2013-05-30 42 views
-1

How to read HTML source code from an HTTPS URL

I want to read the HTML source of an HTTPS URL in C# with the following code:

WebClient webClient = new WebClient();
string htmlString = webClient.DownloadString("https://www.targetUrl.com");

This does not work for me, because I get an encoded HTML string. I tried using HtmlAgilityPack, but it did not help.

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument(); 
doc.LoadHtml(htmlString); 
+1

What do you mean by 'This does not work for me, because I get an encoded HTML string'? – I4V

+0

It means it does not work for the HTTPS link https://www.targetUrl.com –

+0

'WebClient.DownloadString' does not need anything special to download from an https address. What do you mean by "encoded"? How do you know it is encoded? What does it look like? – Snixtor

Answers

3

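That URL returns a gzip-compressed response. WebClient does not decompress gzip by default, so you need to reach the underlying HttpWebRequest and turn on automatic decompression, as in the subclass below. This is a blatant rip-off of the answer by Feroze over here - Automatically decompress gzip response via WebClient.DownloadData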

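using System;
using System.Net;
class MyWebClient : WebClient
{
    // Enable transparent gzip/deflate decompression on HTTP(S) requests.
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        // Only HTTP(S) requests expose AutomaticDecompression; leave others untouched.
        HttpWebRequest httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        }
        return request;
    }
}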
+0

Yes, it works for http://example.com URLs too, but not for https://example.com –

+0

@kavitaverma, then download the page with 'WebClient.DownloadData' and decompress it yourself. – I4V
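
A minimal sketch of what downloading with 'WebClient.DownloadData' and decompressing by hand might look like. The names GzipDownload, DecompressGzip and DownloadHtml are hypothetical and not from the thread, and the code assumes the page is UTF-8 encoded and that the server marks gzip bodies with a Content-Encoding header.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Text;

// Hypothetical helper class for illustration; not part of the original answers.
public static class GzipDownload
{
    // Inflates a gzip-compressed byte array and decodes it as UTF-8 text.
    // UTF-8 is an assumption; a real page may declare another charset.
    public static string DecompressGzip(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Downloads the raw bytes and decompresses them only if the server
    // reports a gzip body; otherwise decodes the bytes directly.
    public static string DownloadHtml(string url)
    {
        using (var webClient = new WebClient())
        {
            // Tell the server that a gzip response is acceptable.
            webClient.Headers[HttpRequestHeader.AcceptEncoding] = "gzip";
            byte[] raw = webClient.DownloadData(url);

            string contentEncoding = webClient.ResponseHeaders["Content-Encoding"];
            bool isGzip = string.Equals(contentEncoding, "gzip", StringComparison.OrdinalIgnoreCase);

            return isGzip ? DecompressGzip(raw) : Encoding.UTF8.GetString(raw);
        }
    }
}
```

Calling GzipDownload.DownloadHtml("https://www.targetUrl.com") would then return the page text, which can be passed to doc.LoadHtml as in the question. Setting AutomaticDecompression as in the MyWebClient answer does the same work inside the framework, so this manual route mainly helps when subclassing WebClient is not an option.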

0
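// Accepts every server certificate, i.e. turns off SSL certificate validation for the whole process.
ServicePointManager.ServerCertificateValidationCallback = delegate { return true; };
WebClient webClient = new WebClient();
string htmlString = webClient.DownloadString(url);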