How do I get the page source with HttpClient when the content encoding is gzip?

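I am using commons-httpclient 3.1 to read the HTML source of pages. It works fine except for pages whose content encoding is gzip; for those I receive incomplete page source. How do I use HttpClient to get the page source when the content is gzip-encoded?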

For this page, Firefox shows the content encoding as gzip.

Details below.

Response headers:

status code: HTTP/1.1 200 OK
Date = Wed, 20 Jul 2011 11:29:38 GMT
Content-Type = text/html; charset=UTF-8
X-Powered-By = JSF/1.2
Set-Cookie = JSESSIONID=Zqq2Tm8V74L1LJdBzB5gQzwcLQFx1khXNvcnZjNFsQtYw41J7JQH!750321853; path=/; HttpOnly
Transfer-Encoding = chunked
Content-Length = -1

My code for reading the response:

HttpClient httpclient = new HttpClient();
httpclient.getParams().setParameter("http.connection.timeout",
        new Integer(50000000));
httpclient.getParams().setParameter("http.socket.timeout",
        new Integer(50000000));

// Create a method instance.
GetMethod method = new GetMethod(url);

// Provide a custom retry handler if necessary.
method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
        new DefaultHttpMethodRetryHandler(3, false));
BufferedReader reader = null;
// Execute the method.
int statusCode = httpclient.executeMethod(method);

if (statusCode != HttpStatus.SC_OK) {
    System.err.println("Method failed: " + method.getStatusLine());
    strHtmlContent = null;
} else {
    // Read the response body line by line.
    InputStream is = method.getResponseBodyAsStream();
    reader = new BufferedReader(new InputStreamReader(is, "ISO8859_8"));
    String line = null;
    StringBuffer sbResponseBody = new StringBuffer();
    while ((line = reader.readLine()) != null) {
        sbResponseBody.append(line).append("\n");
    }
    strHtmlContent = sbResponseBody.toString();
}


2011-07-20 mahesh

Upgrade to HttpClient 4.1. It should support compression seamlessly.


2011-07-20 11:59:01 pap

Thanks for the reply. I tried HttpClient 4.1 and I am getting a "Not in GZIP format" exception. – mahesh

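Curious. The header section you posted in the question does not actually specify gzip encoding. Are you sure the response really is gzip-encoded? – pap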

When I tried it, I got the following response:
----------------------------------------
Response is gzip encoded
----------------------------------------
Date = Fri, 22 Jul 2011 07:58:44 GMT
Content-Encoding = gzip
Content-Length = 5856
Content-Type = text/html; charset=UTF-8
X-Powered-By = JSF/1.2
Set-Cookie = JSESSIONID=9D2hTptKQ1PqKsMvHcYLyFTVlQ6fTNWK3VtcQcVmBHqFb9fSbvYL!750321853; path=/; HttpOnly
Content length = -1
Content encoding = null
Fatal transport error: Not in GZIP format
java.io.IOException: Not in GZIP format
– mahesh
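
A "Not in GZIP format" error means GZIPInputStream did not find the gzip magic bytes (0x1f 0x8b) at the start of the stream. One plausible reading of the log above is that the body had already been decompressed by the time it was wrapped again, but the thread does not confirm this. The following is a minimal sketch using only the JDK, not the HttpClient API: it peeks at the first two bytes and decompresses only when they look like gzip. The class and method names (`GzipSniffer`, `maybeGunzip`, `decode`, `gzip`) are hypothetical, and UTF-8 is assumed because the response headers declare it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: gunzip a body only if it really starts with the gzip magic bytes. */
final class GzipSniffer {

    /** Peeks at the first two bytes, pushes them back, and wraps in GZIPInputStream only for gzip data. */
    static InputStream maybeGunzip(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 2);
        byte[] head = new byte[2];
        int n = 0;
        while (n < 2) {
            int r = in.read(head, n, 2 - n);
            if (r < 0) {
                break; // stream ended before two bytes were available
            }
            n += r;
        }
        if (n > 0) {
            in.unread(head, 0, n); // make the peeked bytes readable again
        }
        boolean looksGzip = n == 2 && (head[0] & 0xFF) == 0x1f && (head[1] & 0xFF) == 0x8b;
        return looksGzip ? new GZIPInputStream(in) : in;
    }

    /** Decodes a complete body to text; UTF-8 is an assumption taken from the response headers. */
    static String decode(byte[] body) {
        try (InputStream in = maybeGunzip(new ByteArrayInputStream(body))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int r;
            while ((r = in.read(buf)) != -1) {
                out.write(buf, 0, r);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Test helper: gzip-compresses a string so the example needs no network access. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String html = "<html><body>hello</body></html>";
        System.out.println(decode(gzip(html)));                            // compressed input
        System.out.println(decode(html.getBytes(StandardCharsets.UTF_8))); // already plain input
    }
}
```

With a real response, the stream returned by the HTTP client would be passed to `maybeGunzip` in place of the byte array used here.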

I just ran into this problem and solved it as follows:

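import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

URL url = new URL("http://www.megadevs.com");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();

// Assumes the server sends a gzip-compressed body; otherwise
// GZIPInputStream throws "Not in GZIP format".
GZIPInputStream gzip = new GZIPInputStream(conn.getInputStream());
// Decode through a Reader instead of casting each byte to a char.
// UTF-8 is an assumption; use the charset the server declares.
Reader reader = new InputStreamReader(gzip, "UTF-8");
StringBuilder page = new StringBuilder();
int value;
while ((value = reader.read()) != -1) {
    page.append((char) value);
}
reader.close();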

Hope this helps.


2012-06-04 14:11:07 Sebastiano
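
Wrapping the connection stream in GZIPInputStream unconditionally fails whenever the server answers without compression. A hedged variation, sketched here with the JDK only, is to request gzip explicitly and unwrap only when the Content-Encoding response header says so. The names `GzipAwareFetch`, `decodeBody`, `fetch`, `decode`, and `gzip` are hypothetical, and UTF-8 is assumed as the page charset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: decompress a response body based on its Content-Encoding header. */
final class GzipAwareFetch {

    /** Wraps the body in GZIPInputStream only when the header value is gzip (or x-gzip). */
    static InputStream decodeBody(InputStream body, String contentEncoding) throws IOException {
        String enc = contentEncoding == null ? "" : contentEncoding.trim();
        boolean gzip = enc.equalsIgnoreCase("gzip") || enc.equalsIgnoreCase("x-gzip");
        return gzip ? new GZIPInputStream(body) : body;
    }

    /** Reads a stream fully as UTF-8 text (charset is an assumption). */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int r;
        while ((r = in.read(buf)) != -1) {
            out.write(buf, 0, r);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Fetches a page, advertising gzip support and decoding according to the response header. */
    static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");
        try (InputStream in = decodeBody(conn.getInputStream(), conn.getContentEncoding())) {
            return readAll(in);
        } finally {
            conn.disconnect();
        }
    }

    /** Offline variant of fetch for an in-memory body and a given header value. */
    static String decode(byte[] body, String contentEncoding) {
        try (InputStream in = decodeBody(new ByteArrayInputStream(body), contentEncoding)) {
            return readAll(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Test helper: gzip-compresses a string so the example runs without a server. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String page = "<html><body>ok</body></html>";
        System.out.println(decode(gzip(page), "gzip"));                          // compressed response
        System.out.println(decode(page.getBytes(StandardCharsets.UTF_8), null)); // uncompressed response
    }
}
```

Calling `GzipAwareFetch.fetch("http://www.megadevs.com")` would exercise the network path against the URL from the answer above; the `main` method sticks to in-memory data so it runs offline.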
