2012-09-11
0

I have the following code. Why am I getting a WebException on a URL?

private List<string> webCrawler(string url, int levels)
{
    HtmlAgilityPack.HtmlDocument doc;
    HtmlWeb hw = new HtmlWeb();
    List<string> webSites;
    List<string> csFiles = new List<string>();

    csFiles.Add("temp string to know that something is happening in level = " + levels.ToString());
    csFiles.Add("current site name in this level is : " + url);

    doc = hw.Load(url);
    webSites = getLinks(doc);

    if (levels == 0)
    {
        return csFiles;
    }
    else
    {
        int actual_sites = 0;
        for (int i = 0; i < webSites.Count() && i < 20; i++)
        {
            string t = webSites[i];
            if ((t.StartsWith("http://") == true) || (t.StartsWith("https://") == true))
            {
                actual_sites++;
                csFiles.AddRange(webCrawler(t, levels - 1));
                Texts(richTextBox1, "Level Number " + levels + " " + t + Environment.NewLine, Color.Red);
            }
        }

        return csFiles;
    }
}

And getLinks() is:

private List<string> getLinks(HtmlAgilityPack.HtmlDocument document)
{
    List<string> mainLinks = new List<string>();
    var linkNodes = document.DocumentNode.SelectNodes("//a[@href]");
    if (linkNodes != null)
    {
        foreach (HtmlNode link in linkNodes)
        {
            var href = link.Attributes["href"].Value;
            mainLinks.Add(href);
        }
    }
    return mainLinks;
}

The problem is that when I crawl google.com, for example, after a few levels it reaches this site:

http://picasa.google.co.il/intl/iw/#utm_source=iw-all-more&amp;utm_campaign=iw-pic&amp;utm_medium=et

Then I get an exception on this line:

doc = hw.Load(url); 

The error is: The remote name could not be resolved: 'picasa.google.co.il'

The full exception is:

System.Net.WebException was unhandled 
    Message=The remote name could not be resolved: 'picasa.google.co.il' 
    Source=System 
    StackTrace: 
     at System.Net.HttpWebRequest.GetResponse() 
     at HtmlAgilityPack.HtmlWeb.Get(Uri uri, String method, String path, HtmlDocument doc, IWebProxy proxy, ICredentials creds) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1446 
     at HtmlAgilityPack.HtmlWeb.LoadUrl(Uri uri, String method, WebProxy proxy, NetworkCredential creds) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1563 
     at HtmlAgilityPack.HtmlWeb.Load(String url, String method) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1152 
     at HtmlAgilityPack.HtmlWeb.Load(String url) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1107 
     at GatherLinks.Form1.webCrawler(String url, Int32 levels) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 79 
     at GatherLinks.Form1.webCrawler(String url, Int32 levels) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 108 
     at GatherLinks.Form1.webCrawler(String url, Int32 levels) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 108 
     at GatherLinks.Form1..ctor() in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 31 
     at GatherLinks.Program.Main() in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Program.cs:line 18 
     at System.AppDomain._nExecuteAssembly(Assembly assembly, String[] args) 
     at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args) 
     at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly() 
     at System.Threading.ThreadHelper.ThreadStart_Context(Object state) 
     at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state) 
     at System.Threading.ThreadHelper.ThreadStart() 
    InnerException: 

How can I fix or work around this?

Thanks.

+3

Try opening a Windows command prompt and typing: 'ping picasa.google.co.il'. Then you will see why. – Adam

+0

'Message=The remote name could not be resolved: 'picasa.google.co.il'' says it pretty clearly. – Icarus

+0

codesparkle and Icarus, true. I see now that the site does not exist in this case. Should I use try and catch to handle such cases? –

Answers

3

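The exception is telling you that it could not resolve picasa.google.co.il to an IP address. You probably just need to verify that the name is correct.

Open a command window and type: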

ping picasa.google.co.il 

You will find that your computer cannot reach this server because there is no DNS entry for it.
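The same check can also be made from code instead of the command prompt. The following is a minimal sketch, assuming a hypothetical helper named `DnsCheck.CanResolve` (it is not part of the original code or of HtmlAgilityPack) built on the standard `System.Net.Dns` class:

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper, introduced here for illustration only.
static class DnsCheck
{
    // Returns true if the host name resolves to at least one IP address.
    public static bool CanResolve(string host)
    {
        try
        {
            return Dns.GetHostAddresses(host).Length > 0;
        }
        catch (SocketException)
        {
            // Thrown when the name cannot be resolved (no DNS entry).
            return false;
        }
        catch (ArgumentException)
        {
            // Thrown for a null, overlong, or otherwise invalid host name.
            return false;
        }
    }
}
```

Inside `webCrawler` this could be called as `DnsCheck.CanResolve(new Uri(t).Host)` before recursing into `t`. It is only a pre-filter: a name that resolves now can still fail when the page is actually loaded, so the load itself should still be guarded.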

+0

Davisoa, right, I see now that this site does not exist. Is the best way to handle such cases to use try and catch around the line doc = hw.Load(url); ? –

+0

Yes, I would put a 'try ... catch (WebException wex)' around the 'Load' call. – davisoa
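A minimal sketch of that suggestion follows. The helper name `SafeLoader.TryLoad` is hypothetical, and it takes the load call as a delegate so that the sketch compiles without a reference to HtmlAgilityPack. It assumes the loader surfaces failures as `WebException`, as the stack trace in the question shows `HtmlWeb.Load` doing:

```csharp
using System;
using System.Net;

// Hypothetical helper, introduced here for illustration only.
static class SafeLoader
{
    // Runs the given load function and returns its result,
    // or null when the request fails with a WebException.
    public static T TryLoad<T>(Func<string, T> load, string url) where T : class
    {
        try
        {
            return load(url);
        }
        catch (WebException wex)
        {
            // For an unresolvable host name, wex.Status is
            // WebExceptionStatus.NameResolutionFailure.
            Console.WriteLine("Skipping " + url + ": " + wex.Status + " (" + wex.Message + ")");
            return null;
        }
    }
}
```

In `webCrawler`, the line `doc = hw.Load(url);` could then become `doc = SafeLoader.TryLoad(u => hw.Load(u), url);` followed by `if (doc == null) return csFiles;`, so that one dead link is skipped and the rest of the crawl continues.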
