2014-06-20 54 views
2

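I am trying to extract a single value from
http://www.dsebd.org/displayCompany.php?name=NBL
The field I need is shown in the attached screenshot. Its XPath on the page is: /html/body/table[2]/tbody/tr/td[2]/table/tbody/tr[3]/td[1]/p[1]/table[1]/tbody/tr/td[1]/table/tbody/tr[2]/td[2]/font. I am trying to extract it using HtmlAgilityPack.

Error: an exception is thrown, and no data is found with that XPath. "An unhandled exception of type 'System.Net.WebException' occurred in HtmlAgilityPack.dll"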

Source code:

static void Main(string[] args)
{
    /************************************************************************/
    string tickerid = "Bse_Prc_tick";
    HtmlAgilityPack.HtmlDocument doc = new HtmlWeb().Load(@"http://www.dsebd.org/displayCompany.php?name=NBL", "GET");

    if (doc != null)
    {
        // Fetch the stock price from the Web page
        string stockprice = doc.DocumentNode.SelectSingleNode(string.Format("./html/body/table[2]/tbody/tr/td[2]/table/tbody/tr[3]/td[1]/p[1]/table[1]/tbody/tr/td[1]/table/tbody/tr[2]/td[2]/font", tickerid)).InnerText;
        Console.WriteLine(stockprice);
    }
    Console.WriteLine("ReadKey Starts........");
    Console.ReadKey();
}
+1

Are you sure the XPath is correct? Chrome's F12 tools show a different path for the field you marked. – PTwr

+0

I got the XPath from an extension called "XPath Helper". It shouldn't be wrong. Anyway, I am checking it. Hopefully I will find the correct one. @PTwr – Leon

Answers

2

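Well, I checked. The XPath we are using is not correct. The real fun begins when you try to find where the error is.

Just look at the source code of the page you are using: aside from numerous errors that trip up XPath, it even contains multiple html tags.

Chrome Dev Tools, and the tool you used, work on a DOM tree that the browser has already corrected (everything packed into a single html node, some tbody elements added, and so on).

Since the HTML structure is simply broken, so is the result of HtmlAgilityPack parsing it.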
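One part of that browser-side correction can be undone mechanically: an XPath copied from the browser contains tbody steps that may not exist in the raw source that HtmlAgilityPack parses. The sketch below strips them. The class and method names are hypothetical, and it assumes the raw markup places tr directly under table. On a page as broken as this one the cleaned path may still fail to match, so treat it as a first thing to try rather than a fix.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class XPathCleanup
{
    // Removes "/tbody" steps (optionally indexed, e.g. "/tbody[1]") from an
    // XPath copied out of a browser's corrected DOM tree.
    public static string StripTbody(string xpath)
    {
        // The lookahead ensures only whole "tbody" steps are removed.
        return Regex.Replace(xpath, @"/tbody(\[\d+\])?(?=/|$)", string.Empty);
    }
}
```

The cleaned string can then be passed to SelectSingleNode, whose result should be checked for null before reading InnerText.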

Depending on the situation, you can either use RegExp or simply search the source for known elements (faster, but less flexible).

For example:

... 
using System.Net; //required for Webclient 
... 
class Program
{
    // entry point of the console app
    static void Main(string[] args)
    {
        // URL to download
        // "var" lets the compiler infer the type ("string" here)
        var url = @"http://www.dsebd.org/displayCompany.php?name=NBL";

        // "using" calls Dispose() on the object as soon as the block ends,
        // instead of leaving cleanup to the garbage collector at some later point
        using (WebClient client = new WebClient()) // WebClient implements IDisposable
        {
            // download the response as a string; in this case it is the HTML source
            string htmlCode = client.DownloadString(url);
            // cut off everything before "Last Trade:"
            // searching from the beginning of a string is easier/faster than searching in the middle
            htmlCode = htmlCode.Substring(
                htmlCode.IndexOf("Last Trade:")
            );
            // take the text between the two anchors, then trim leading and trailing whitespace
            htmlCode = htmlCode.Substring("2\">", "</font></td>").Trim();
            Console.WriteLine(htmlCode);
        }
        Console.ReadLine();
    }
}
// http://stackoverflow.com/a/17253735/3147740 <- copied from here
// extension class that adds the Substring() overload used above; it does what its comments say
public static class StringExtensions
{
    /// <summary>
    /// takes a substring between two anchor strings (or the end of the string if that anchor is null)
    /// </summary>
    /// <param name="this">a string</param>
    /// <param name="from">an optional string to search after</param>
    /// <param name="until">an optional string to search before</param>
    /// <param name="comparison">an optional comparison for the search</param>
    /// <returns>a substring based on the search</returns>
    public static string Substring(this string @this, string from = null, string until = null, StringComparison comparison = StringComparison.InvariantCulture)
    {
        var fromLength = (from ?? string.Empty).Length;
        var startIndex = !string.IsNullOrEmpty(from)
            ? @this.IndexOf(from, comparison) + fromLength
            : 0;

        if (startIndex < fromLength) { throw new ArgumentException("from: Failed to find an instance of the first anchor"); }

        var endIndex = !string.IsNullOrEmpty(until)
            ? @this.IndexOf(until, startIndex, comparison)
            : @this.Length;

        if (endIndex < 0) { throw new ArgumentException("until: Failed to find an instance of the last anchor"); }

        var subString = @this.Substring(startIndex, endIndex - startIndex);
        return subString;
    }
}
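For comparison, the RegExp route mentioned above might look roughly like this. It is only a sketch: the class name is hypothetical, and the pattern assumes the same markup the string search relies on, namely that the value follows "Last Trade:" inside a font tag whose opening ends in 2"> and is closed by </font>. The real page may differ.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the regular-expression alternative.
public static class LastTradeRegex
{
    // Singleline lets "." match newlines, because the label and the value
    // may sit on different lines of the page source.
    private static readonly Regex Pattern = new Regex(
        @"Last Trade:.*?2""\s*>\s*([^<]+?)\s*</font>",
        RegexOptions.Singleline);

    // Returns the captured value, or null when the pattern does not match.
    public static string Extract(string html)
    {
        Match match = Pattern.Match(html);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Returning null instead of throwing leaves it to the caller to decide what a missing value means, for example when the page layout changes.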
+0

I edited the XPath, but it is not working... – Leon

+0

@Leon: I "fixed" the problem with XPath, see the edited post. – PTwr

+0

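It works perfectly. Thank you for your valuable time. Forgive my ignorance, but the code is a bit complicated for me, as I am a new learner. I found XPath a bit simpler. Thanks, by the way. It works, and I will learn this method. @PTwr – Leon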

0

Wrap your code in a try-catch to get more information about the exception.
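A minimal sketch of that suggestion follows. The helper name and its shape are assumptions made for illustration; the download step is passed in as a delegate so that the same wrapper works with WebClient, HtmlWeb, or anything else that may throw a WebException.

```csharp
using System;
using System.Net;

// Hypothetical wrapper illustrating the try-catch suggestion.
public static class SafeFetch
{
    // Runs the download step. On success returns true and sets html;
    // on a WebException returns false and sets error to a diagnostic message.
    public static bool TryRun(Func<string> download, out string html, out string error)
    {
        html = null;
        error = null;
        try
        {
            html = download();
            return true;
        }
        catch (WebException ex)
        {
            // Status separates network failures (DNS, timeout) from HTTP errors.
            error = "WebException: " + ex.Status + " - " + ex.Message;

            // For HTTP-level failures the response carries the status code.
            var response = ex.Response as HttpWebResponse;
            if (response != null)
            {
                error += " (HTTP " + (int)response.StatusCode + ")";
            }
            return false;
        }
    }
}
```

A call could look like SafeFetch.TryRun(() => new WebClient().DownloadString(url), out html, out error), printing error to the console when the method returns false.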
