2013-08-24

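LIKE statement, or removing trailing whitespace, in HTML Agility Pack?

I am trying to download data from a website into a DataTable. The problem is that I cannot reach the right node, apparently because the cell text is padded with whitespace. This is my code so far: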

public static DataTable downloadtable()
{
    DataTable dt = new DataTable();
    string htmlCode = "";
    using (WebClient client = new WebClient())
    {
        client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
        htmlCode = client.DownloadString("https://www.eex.com/en/Market%20Data/Trading%20Data/Power/Hour%20Contracts%20%7C%20Spot%20Hourly%20Auction/Area%20Prices/spot-hours-area-table/2013-08-22");
    }
    // This is just to check the page structure in a text file
    System.IO.StreamWriter file = new System.IO.StreamWriter("c:\\temp\\test.txt");
    file.WriteLine(htmlCode);

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

    doc.LoadHtml(htmlCode);

    dt = new DataTable();

    foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table[@class='list electricity']/tr/th[@class='title'][.='Market Area']"))
    {
        // This is the line where I get the error
        foreach (HtmlNode row in table.SelectNodes("//td[@class='title'][.='   00-01   ']"))
        {
            foreach (var cell in row.SelectNodes("//td"))
            {
                // This is to check for the correct result; the final goal is to dump it into the DataTable
                Console.WriteLine(cell.InnerText);
            }
        }
    }
    return dt;
}

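I am trying to download the hourly prices from the link in the code, but it seems to fail because of the trailing whitespace (I think). Is there something like a LIKE statement for matching node text? Or can the trailing whitespace be removed?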

Answers

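I believe your problem is that you are trying to retrieve td nodes from inside a td node, which obviously does not contain any further td nodes.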

<tr> 
<td class="title">   00-01   </td> 
<td class="spacer"></td> 
<td class="r">€/MWh</td> 
<td class="spacer"></td> 
<td>35.34</td> 
<td class="spacer"></td> 
<td>34.02</td> 
<td class="spacer"></td> 
<td>34.02</td> 
</tr> 

So if you iterate over the result of table.SelectNodes("//td[@class='title'][.=' 00-01 ']"), the node you get back has no td elements inside it.
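The whitespace part of the question can be checked in isolation. The sketch below assumes the HtmlAgilityPack NuGet package is referenced and that the real page pads the title cell with something other than exactly three spaces (a tab and line breaks here, which is a guess). `WhitespaceDemo`, `SampleHtml` and `CountMatches` are made-up names for illustration. It also guards against `SelectNodes` returning `null` when nothing matches, since a `foreach` over that result would otherwise throw.

```csharp
using System;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack" (assumed to be installed)

public static class WhitespaceDemo
{
    // Sample markup: the title cell is padded with a tab and line breaks.
    // This padding is an assumption about what the real page might contain.
    public const string SampleHtml =
        "<table><tr><td class=\"title\">\n\t00-01\n</td><td>35.34</td></tr></table>";

    // Returns how many nodes the XPath matches.
    // SelectNodes returns null (not an empty collection) when there is no match.
    public static int CountMatches(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    // Exact comparison: the padding must match character for character.
    public const string ExactXPath = "//td[@class='title'][.='   00-01   ']";

    // normalize-space() trims and collapses whitespace before comparing.
    public const string NormalizedXPath = "//td[@class='title'][normalize-space(.)='00-01']";

    // contains() is the closest XPath 1.0 analogue of SQL's LIKE '%00-01%'.
    public const string ContainsXPath = "//td[@class='title'][contains(., '00-01')]";
}
```

Calling `WhitespaceDemo.CountMatches(WhitespaceDemo.SampleHtml, ...)` with each of the three expressions should show the exact comparison finding nothing while the other two find the cell. `starts-with()` is another XPath 1.0 function that can play the role of a LIKE pattern anchored at the start.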

If you want all the rows, starting from 00-01, you can use this one:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
// Find the 00-01 title cell (ignoring surrounding whitespace), then step up to its table
foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//td[@class='title'][(normalize-space(.)='00-01')]/ancestor::table"))
{
    // Relative XPath: only the cells of this table's rows
    foreach (var cell in table.SelectNodes("./tr/td"))
    {
        // Skip the empty spacer cells
        if (string.IsNullOrEmpty(cell.InnerText.Trim()))
            continue;
        Console.WriteLine(cell.InnerText.Trim());
    }
}

If you only want the 00-01 row, you can use this one:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//td[@class='title']"))
{
    if (row.InnerText.Trim() == "00-01")
    {
        foreach (var cell in row.ParentNode.ChildNodes)
        {
            if (string.IsNullOrEmpty(cell.InnerText.Trim()))
                continue;
            Console.WriteLine(cell.InnerText.Trim());
        }
    }
}

Or you can do it like this:

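HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
// Let XPath do the whitespace-insensitive match instead of filtering in C#
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//td[@class='title'][(normalize-space(.)='00-01')]"))
{
    // row is the title cell; its parent is the <tr> holding all the values
    foreach (var cell in row.ParentNode.ChildNodes)
    {
        // Skip spacer cells and the whitespace text nodes between cells
        if (string.IsNullOrEmpty(cell.InnerText.Trim()))
            continue;
        Console.WriteLine(cell.InnerText.Trim());
    }
}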
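
Since the stated goal in the question is to end up with a DataTable rather than console output, the last variant could be adapted roughly as follows. This is only a sketch: it assumes the HtmlAgilityPack NuGet package, and `RowToDataTable`, `ExtractRow` and the placeholder column names `Col0`, `Col1`, ... are invented here, not taken from the page or the question.

```csharp
using System.Data;
using System.Linq;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack" (assumed to be installed)

public static class RowToDataTable
{
    // Builds a one-row DataTable from the <tr> whose title cell reads hourLabel.
    // Column names ("Col0", "Col1", ...) are placeholders for illustration.
    public static DataTable ExtractRow(string html, string hourLabel)
    {
        var dt = new DataTable();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // hourLabel is concatenated into the XPath, so it must not contain a quote.
        HtmlNode titleCell = doc.DocumentNode.SelectSingleNode(
            "//td[@class='title'][normalize-space(.)='" + hourLabel + "']");
        if (titleCell == null)
            return dt; // no such row: return an empty table instead of throwing

        // Elements("td") yields only the direct <td> children of the row,
        // so the whitespace text nodes between the cells are never visited.
        var values = titleCell.ParentNode.Elements("td")
            .Select(td => td.InnerText.Trim())
            .Where(text => text.Length > 0) // drop the empty "spacer" cells
            .ToList();

        for (int i = 0; i < values.Count; i++)
            dt.Columns.Add("Col" + i, typeof(string));
        dt.Rows.Add(values.Cast<object>().ToArray());
        return dt;
    }
}
```

A call such as `RowToDataTable.ExtractRow(htmlCode, "00-01")` would then replace the inner loops of `downloadtable()`. Using `SelectSingleNode` with a null check avoids the exception that a `foreach` over a null `SelectNodes` result would raise when the row is missing.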