2010-02-11 55 views

Answers


Here is what I'm doing at the moment:

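import java.net.URL;
import java.util.HashMap;
import java.util.List;

import com.gargoylesoftware.htmlunit.WebRequestSettings;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

// Members of a class with a WebClient field named "client" (HtmlUnit 2.x, pre-2.8 API).

// Accept header values per resource type, mimicking a desktop browser
public static final HashMap<String, String> acceptTypes = new HashMap<String, String>() {{
    put("html", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
    put("img", "image/png,image/*;q=0.8,*/*;q=0.5");
    put("script", "*/*");
    put("style", "text/css,*/*;q=0.1");
    // Lookups use the tag name, so <link> (stylesheet) elements need their own entry
    put("link", "text/css,*/*;q=0.1");
}};

protected void downloadCssAndImages(HtmlPage page) {
    // All <img> elements plus <link> elements with type="text/css"
    String xPathExpression = "//*[name() = 'img' or (name() = 'link' and @type = 'text/css')]";
    List<?> resultList = page.getByXPath(xPathExpression);

    for (Object node : resultList) {
        try {
            HtmlElement el = (HtmlElement) node;

            // <img> has src, <link> has href; getAttribute returns "" if the attribute is missing
            String path = el.getAttribute("src").equals("") ? el.getAttribute("href") : el.getAttribute("src");
            if (path == null || path.equals("")) continue;

            // Resolve a relative src/href against the page
            URL url = page.getFullyQualifiedUrl(path);

            WebRequestSettings wrs = new WebRequestSettings(url);
            // Send the page's URL as Referer, as a browser would
            wrs.setAdditionalHeader("Referer", page.getWebResponse().getRequestSettings().getUrl().toString());

            client.addRequestHeader("Accept", acceptTypes.get(el.getTagName().toLowerCase()));
            client.getPage(wrs);
        } catch (Exception e) {
            // Ignored on purpose: one failed resource should not stop the others
        }
    }

    client.removeRequestHeader("Accept");
}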
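The XPath in the method above relies on operator precedence: in XPath 1.0, `and` binds tighter than `or`, so the expression already means "every img, or every link whose type is text/css". The stdlib-only sketch below evaluates such expressions with plain JAXP over a small hypothetical XHTML snippet (the class, constants and sample page are illustrative, not HtmlUnit API). It also shows that a stylesheet `<link>` without a `type` attribute is not selected.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo: evaluate resource-selecting XPath expressions with plain JAXP.
class ResourceXPathDemo {

    // The answer's expression, with the parentheses that XPath 1.0 precedence already implies
    public static final String EXPR = "//*[name() = 'img' or (name() = 'link' and @type = 'text/css')]";

    // Variant (an assumption, not from the answer) that also accepts rel="stylesheet" links without a type
    public static final String EXPR_WITH_REL =
            "//*[name() = 'img' or (name() = 'link' and (@type = 'text/css' or @rel = 'stylesheet'))]";

    // Small well-formed sample page: a typed stylesheet, an untyped one, a favicon and an image
    public static final String SAMPLE_PAGE = "<html><head>"
            + "<link rel='stylesheet' type='text/css' href='a.css'/>"
            + "<link rel='stylesheet' href='b.css'/>"
            + "<link rel='icon' href='favicon.ico'/>"
            + "</head><body><img src='logo.png'/></body></html>";

    /** Returns "tag:src-or-href" for each selected element, in document order. */
    public static List<String> select(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element el = (Element) nodes.item(i);
                // Same src-else-href rule as the answer; DOM getAttribute returns "" when absent
                String path = el.getAttribute("src").isEmpty() ? el.getAttribute("href") : el.getAttribute("src");
                out.add(el.getTagName() + ":" + path);
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE_PAGE, EXPR));          // b.css is not selected
        System.out.println(select(SAMPLE_PAGE, EXPR_WITH_REL)); // b.css is selected
    }
}
```

If untyped stylesheets matter, matching on `@rel = 'stylesheet'` as in `EXPR_WITH_REL` picks them up; that variant is a suggestion, not part of the original answer.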

HtmlUnit doesn't download CSS or images. They're useless to a headless browser...

The last I heard of it was here, but the ticket is marked private: http://osdir.com/ml/java.htmlunit.devel/2007-01/msg00021.html


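What if the user wants to look at the cookies, CSS or images in a headless browser? That seems to be what the question implies. So CSS and images aren't useless, are they? In fact, that's what led me to this question: it would be nice to use a headless browser to check images by size or hash, or to check a background-color value in the CSS. Just trying to help here... your answer is more argumentative than constructive. – fooMonster 2011-09-15 12:51:50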
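Checking a downloaded image "by size or hash", as the comment suggests, needs only the JDK once the bytes are available (for instance read from `WebResponse.getContentAsStream()`). A minimal sketch; the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: fingerprint downloaded resource bytes so they can be compared by size and hash
class ResourceFingerprint {

    /** Returns "<byte count>:<SHA-256 as lowercase hex>" for the given bytes. */
    public static String fingerprint(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // %x on a Byte prints its unsigned value
            }
            return data.length + ":" + hex;
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fingerprint("abc".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

Two resources are then "the same image" if their fingerprints match; comparing pixel dimensions would additionally need `ImageIO.read`.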


Source: How to get base64 encoded contents for an ImageReader?

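import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import org.apache.commons.codec.binary.Base64OutputStream; // assumed: Apache commons-codec
import com.gargoylesoftware.htmlunit.html.HtmlImage;

HtmlImage img = (HtmlImage) p.getByXPath("//img").get(3); // 4th <img> on the HtmlPage p
ImageReader imageReader = img.getImageReader();
BufferedImage bufferedImage = imageReader.read(0);
String formatName = imageReader.getFormatName();
ByteArrayOutputStream byteaOutput = new ByteArrayOutputStream();
Base64OutputStream base64Output = new Base64OutputStream(byteaOutput);
ImageIO.write(bufferedImage, formatName, base64Output);
base64Output.close(); // flush the final Base64 block (and padding) into byteaOutput
String base64 = new String(byteaOutput.toByteArray(), "US-ASCII");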
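On Java 8 and later, `java.util.Base64` can stand in for the `Base64OutputStream` used in the snippet above (presumably the commons-codec class). A minimal sketch, assuming Java 8+; the class name is hypothetical and a small synthetic `BufferedImage` stands in for `imageReader.read(0)`. Like the snippet, it re-encodes the decoded image, so the result is not guaranteed to be byte-identical to the file the server sent.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import javax.imageio.ImageIO;

// Hypothetical helper: re-encode a BufferedImage and Base64 it with the JDK encoder
class ImageToBase64 {

    /** Writes the image in the given ImageIO format (e.g. "png") and returns the bytes as Base64. */
    public static String toBase64(BufferedImage image, String formatName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            // ImageIO.write returns false when no writer exists for formatName
            if (!ImageIO.write(image, formatName, bytes)) {
                throw new IllegalArgumentException("No ImageIO writer for format " + formatName);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    public static void main(String[] args) {
        // A 2x2 image with one red pixel stands in for imageReader.read(0)
        BufferedImage image = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        image.setRGB(0, 0, 0xFF0000);
        System.out.println(toBase64(image, "png"));
    }
}
```

Unlike the commons-codec stream with its default settings, `Base64.getEncoder()` produces a single line with no line breaks, which is usually what a `data:` URI or a JSON field expects.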

This is what I came up with:

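import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import org.apache.http.auth.AuthScope;
import com.gargoylesoftware.htmlunit.ProxyConfig;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.util.Cookie;

// Fetches a URL through the WebClient's connection without creating a Page (HtmlUnit 2.8+ API)
public InputStream httpGetLowLevel(URL url) throws IOException {
    WebRequest wrq = new WebRequest(url);

    // Use the same proxy (and proxy credentials) as the WebClient
    ProxyConfig config = webClient.getProxyConfig();
    wrq.setProxyHost(config.getProxyHost());
    wrq.setProxyPort(config.getProxyPort());
    wrq.setCredentials(webClient.getCredentialsProvider().getCredentials(
            new AuthScope(config.getProxyHost(), config.getProxyPort())));

    // Send all of the WebClient's cookies for this URL in one "name=value; name2=value2" header;
    // calling setAdditionalHeader once per cookie would keep only the last one
    StringBuilder cookieHeader = new StringBuilder();
    for (Cookie c : webClient.getCookieManager().getCookies(url)) {
        if (cookieHeader.length() > 0) cookieHeader.append("; ");
        cookieHeader.append(c.getName()).append('=').append(c.getValue());
    }
    if (cookieHeader.length() > 0) wrq.setAdditionalHeader("Cookie", cookieHeader.toString());

    WebResponse wr = webClient.getWebConnection().getResponse(wrq);
    return wr.getContentAsStream();
}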

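My tests show that it does support proxies, and that it not only carries the cookies from the WebClient: if the server sends new cookies in the response, the WebClient picks those up as well.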
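The cookie behaviour described in this answer (stored cookies go out with the request, new `Set-Cookie` values from the response are absorbed by the client) is the usual client-side cookie-store round trip. As a stdlib-only illustration, and not HtmlUnit's own `CookieManager`, the JDK's `java.net.CookieManager` can be driven by hand; the class and method names are hypothetical and no network traffic is involved.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo of a client-side cookie store using the JDK's java.net.CookieManager
// (not HtmlUnit's CookieManager). Headers are passed in by hand; nothing is sent over the network.
class CookieRoundTrip {

    /**
     * Feeds the Set-Cookie values of a response from responseUri into a fresh store,
     * then returns the Cookie header values the store would attach to a request for nextUri.
     */
    public static List<String> cookiesForNextRequest(URI responseUri, List<String> setCookieValues, URI nextUri) {
        CookieManager store = new CookieManager(); // default policy: accept cookies from the original server only
        try {
            Map<String, List<String>> responseHeaders = new HashMap<>();
            responseHeaders.put("Set-Cookie", setCookieValues);
            store.put(responseUri, responseHeaders); // the store parses and keeps the cookies

            Map<String, List<String>> requestHeaders =
                    store.get(nextUri, Collections.<String, List<String>>emptyMap());
            List<String> cookies = requestHeaders.get("Cookie");
            return cookies == null ? Collections.<String>emptyList() : cookies;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URI login = URI.create("http://example.com/login");
        List<String> setCookie = Collections.singletonList("session=abc; Path=/");
        // Same host: the stored cookie goes out with the next request
        System.out.println(cookiesForNextRequest(login, setCookie, URI.create("http://example.com/img/logo.png")));
        // Different host: nothing is sent
        System.out.println(cookiesForNextRequest(login, setCookie, URI.create("http://other.org/")));
    }
}
```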
