2013-04-11 51 views
0

GWT filter never executed

I'm developing a filter to make my GWT application crawlable. The idea is that when it receives an "ugly" URL like this:

http://www.myapp.com/?_escaped_fragment_=v;id=Mv67mC13Yizr

it renders the page for the "pretty" one:

http://www.myapp.com/#!v;id=Mv67mC13Yizr

However, the code never reaches doFilter(). Why?

web.xml

<filter> 
    <filter-name>guiceFilter</filter-name> 
    <filter-class>com.google.inject.servlet.GuiceFilter</filter-class> 
</filter> 

<filter-mapping> 
    <filter-name>guiceFilter</filter-name> 
    <url-pattern>/*</url-pattern> 
</filter-mapping> 

DispatchServletModule.java

public class DispatchServletModule extends ServletModule {

    @Override
    public void configureServlets() {
        serve("/" + ActionImpl.DEFAULT_SERVICE_NAME)
                .with(DispatchServiceImpl.class);
        filter("/").through(CrawlerServiceImpl.class);
    }
}

CrawlerServiceImpl.java

@Singleton
public final class CrawlerServiceImpl implements Filter {
    private static final String ESCAPED_FRAGMENT_FORMAT1 = "_escaped_fragment_=";
    private final int ESCAPED_FRAGMENT_LENGTH1 = ESCAPED_FRAGMENT_FORMAT1.length();
    private static final String ESCAPED_FRAGMENT_FORMAT2 = "&" + ESCAPED_FRAGMENT_FORMAT1;
    private final int ESCAPED_FRAGMENT_LENGTH2 = ESCAPED_FRAGMENT_FORMAT2.length();

    @Inject(optional = true)
    private final Provider<WebClient> webClientProvider = null;

    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
    }

    @Override
    public void destroy() {
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response,
            FilterChain chain) throws IOException, ServletException {
        HttpServletRequest req = (HttpServletRequest) request;
        HttpServletResponse res = (HttpServletResponse) response;
        String queryString = req.getQueryString();

        final String requestURI = req.getRequestURI();
        if ((queryString != null) && (queryString.contains(ESCAPED_FRAGMENT_FORMAT1))) {
            try {
                StringBuilder pageNameSb = new StringBuilder("http://");
                pageNameSb.append(req.getServerName());
                if (req.getServerPort() != 0) {
                    pageNameSb.append(":");
                    pageNameSb.append(req.getServerPort());
                }
                pageNameSb.append(requestURI);
                queryString = rewriteQueryString(queryString);
                pageNameSb.append(queryString);
                String pageName = pageNameSb.toString();
                WebClient webClient;
                if (webClientProvider == null)
                    webClient = new WebClient(BrowserVersion.FIREFOX_3_6);
                else
                    webClient = webClientProvider.get();

                webClient.setThrowExceptionOnScriptError(false);
                webClient.setJavaScriptEnabled(true);
                HtmlPage page = webClient.getPage(pageName);

                res.setContentType("text/html;charset=UTF-8");
                PrintWriter out = res.getWriter();
                out.println("<hr />");
                out.println("<center><h3>You are viewing a non-interactive page that is intended for the crawler. "
                        + "You probably want to see this page: <a href=\""
                        + pageName
                        + "\">"
                        + pageName + "</a></h3></center>");
                out.println("<hr />");
                out.println(page.asXml());
                webClient.closeAllWindows();
                out.println("");
                out.close();
            }
            catch (Exception e) {
            }
        } else {
            chain.doFilter(request, response);
        }
    }

    private String rewriteQueryString(String queryString) throws UnsupportedEncodingException {
        int index = queryString.indexOf(ESCAPED_FRAGMENT_FORMAT2);
        int length = ESCAPED_FRAGMENT_LENGTH2;
        if (index == -1) {
            index = queryString.indexOf(ESCAPED_FRAGMENT_FORMAT1);
            length = ESCAPED_FRAGMENT_LENGTH1;
        }
        if (index != -1) {
            StringBuilder queryStringSb = new StringBuilder();
            if (index > 0) {
                queryStringSb.append("?");
                queryStringSb.append(queryString.substring(0, index));
            }
            queryStringSb.append("#!");
            queryStringSb.append(URLDecoder.decode(queryString.substring(index
                    + length, queryString.length()), "UTF-8"));
            return queryStringSb.toString();
        }
        return queryString;
    }
}

Answers

1

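Your <url-pattern> is invalid: * is only allowed as a /* suffix or as a *. prefix of the pattern. Also, patterns apply only to the path, not to the query string.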
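To make the path versus query string distinction concrete, here is a minimal sketch using only java.net.URI. The class and method names are hypothetical and chosen for illustration. It splits the "ugly" URL from the question into the part a mapping is matched against and the part it never sees.

```java
import java.net.URI;

// Hypothetical demo class; the names are illustrative and not from the question.
final class PathVersusQuery {

    // The path is the only part that servlet-style mappings are matched against.
    static String pathOf(String url) {
        return URI.create(url).getRawPath();
    }

    // The raw query string is not part of the path (null if the URL has none).
    static String queryOf(String url) {
        return URI.create(url).getRawQuery();
    }

    public static void main(String[] args) {
        String ugly = "http://www.myapp.com/?_escaped_fragment_=v;id=Mv67mC13Yizr";
        System.out.println("path  = " + pathOf(ugly));
        System.out.println("query = " + queryOf(ugly));
    }
}
```

Since the path here is just "/", a mapping that mentions _escaped_fragment_ has nothing to match against.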

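You have to map your filter to / and, inside the filter, check for the _escaped_fragment_ parameter (I personally check that getMethod() is "GET" and then use getParameter("_escaped_fragment_")) to decide whether to use the WebClient to fetch and render the page server-side, or just chain to the next filter.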
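The check described above can be sketched as a pair of plain functions. This is only an illustration under assumptions: the class and method names are hypothetical, and in a real filter the two inputs would come from HttpServletRequest.getMethod() and HttpServletRequest.getParameter("_escaped_fragment_").

```java
// Hypothetical helper; class and method names are illustrative, not from the question.
final class CrawlerRequestCheck {

    // True when the request should be rendered server-side for the crawler.
    // A null fragment means the parameter was absent, so the filter should
    // simply call chain.doFilter(...) instead.
    static boolean isCrawlerRequest(String method, String escapedFragment) {
        return "GET".equals(method) && escapedFragment != null;
    }

    // Builds the "pretty" hash-bang URL for the headless browser to load.
    // getParameter() returns an already decoded value, so nothing is decoded here.
    static String prettyUrl(String baseUrl, String escapedFragment) {
        return baseUrl + "#!" + escapedFragment;
    }

    public static void main(String[] args) {
        String fragment = "v;id=Mv67mC13Yizr";
        if (isCrawlerRequest("GET", fragment)) {
            System.out.println(prettyUrl("http://www.myapp.com/", fragment));
        }
    }
}
```

Keeping the decision in a function with no servlet types makes it easy to unit test. The filter itself then only extracts the two values and either renders the page or chains on.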

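Note that a filter declared in your web.xml will not be injected by Guice, so, as Dvd Prd said, you would probably rather declare the filter in a Guice ServletModule. Be aware that, as with standard mappings, only the path is matched, so the above still applies (even filterRegex() won't help).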

+0

Thanks. Following your last paragraph, I removed those filter entries from web.xml and added Dvd Prd's code to my DispatchServletModule.java. It still doesn't work. Please see the new code above. – Arturo 2013-04-11 12:17:54

+1

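As I said, patterns only match on the path, so even when binding in a Guice 'ServletModule' you have to bind to '/' and modify your filter to handle the query string (the first part of my answer) – 2013-04-11 12:55:48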

+0

Thanks. I updated it again: filter("/").through(CrawlerServiceImpl.class); but the code in doFilter() is still never executed. Any ideas? – Arturo 2013-04-11 14:32:40

1

Guice takes over all filters. To add yours, you need to declare the filter in your Guice servlet module:

filter("/?_escaped_fragment_=*").through(CrawlerServiceImpl.class);

+1

Declare and map 'CrawlerServletFilter' before 'GuiceFilter', so that it is executed before 'GuiceFilter'. – 2013-04-11 11:28:03