2013-03-15

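HtmlAgilityPack: reading data from an HTML page

I'm trying to read data from an HTML table with the HTML Agility Pack, but I keep getting the data from the first table row.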

The HTML I'm reading from is:

<div id="mainDiv"> 
    <table id="tbl"> 
     <thead> 
      <tr> 
       <th class="tbl_col1">UserName</th> 
       <th class="tbl_col2">Points</th> 
      </tr> 
     </thead> 
     <tbody>  
      <tr data-source="provider1">
       <td class="tbl_col1">
        <a href="/Users/1090" id="UserLink" target="_blank">UserName1</a>
       </td>
       <td class="tbl_col2">
        <a href="/UserPoints/1090" id="PointLink" target="_blank">1892 <span class="up_arrow">&nbsp;</span></a>
       </td>
      </tr>
      <tr data-source="provider2">
       <td class="tbl_col1">
        <a href="/Users/1090" id="UserLink" target="_blank">UserName2</a>
       </td>
       <td class="tbl_col2">
        <a href="/UserPoints/1090" id="PointLink" target="_blank">3217 <span class="down_arrow">&nbsp;</span></a>
       </td>
      </tr>
     </tbody> 
    </table> 
</div> 

I'm using this code:

var UserTable = htmlDocument.DocumentNode.SelectSingleNode("//div[@id='mainDiv']").SelectSingleNode("//table[@id='tbl']").SelectSingleNode("//tbody").SelectNodes("//tr"); 
foreach (var row in UserTable) 
{ 
    if (row.Attributes["data-source"] != null) 
    { 
     string Source = row.Attributes["data-source"].Value; 
     string UserName = row.SelectSingleNode("td[@class='tbl_col1']").SelectSingleNode("//a[@id='UserLink']/text()").InnerText; 
     string Points = row.SelectSingleNode("td[@class='tbl_col2']").SelectSingleNode("//a[@id='PointLink']/text()").InnerText; 
     Console.WriteLine(Source + "\t" + UserName + "\t" + Points); 
    } 
} 

But I keep getting this output:

provider1  UserName1  1892 
provider2  UserName1  1892 

Answer

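You are making a wrong assumption: //a[@id='UserLink']/text() and //a[@id='PointLink']/text() do not search inside the current row. An XPath expression that starts with // is evaluated from the document root, whatever node you call SelectSingleNode on, so it always returns the first match in the whole document. That's why you always get the values from the first tr node. The same applies to the chained calls in your first line: each // restarts from the root, which is also why the header row from thead ends up in UserTable and has to be filtered out by your data-source check. Use paths relative to the row instead: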

string UserName = row.SelectSingleNode("td[@class='tbl_col1']/a[@id='UserLink']/text()").InnerText; 
string Points = row.SelectSingleNode("td[@class='tbl_col2']/a[@id='PointLink']/text()").InnerText; 
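The difference between an absolute and a row-relative path can be seen in isolation. The sketch below uses System.Xml's XmlDocument instead of HtmlAgilityPack so that it runs without a NuGet package, assuming HtmlAgilityPack's SelectSingleNode follows the same XPath context rules (which the fix above relies on). The class and method names (XPathContextDemo, UserNameFor) are hypothetical, and the markup is a trimmed copy of the table with &nbsp; replaced by &#160;, because XmlDocument only understands XML entities.

```csharp
using System;
using System.Xml;

// Hypothetical demo: absolute ("//...") vs. row-relative XPath from a context node.
// Uses System.Xml rather than HtmlAgilityPack so it needs no external package.
public static class XPathContextDemo
{
    // Trimmed, well-formed copy of the question's table.
    // &nbsp; is written as &#160; because XmlDocument does not know HTML entities.
    public const string Markup =
        "<div id='mainDiv'><table id='tbl'><tbody>" +
        "<tr data-source='provider1'>" +
        "<td class='tbl_col1'><a id='UserLink'>UserName1</a></td>" +
        "<td class='tbl_col2'><a id='PointLink'>1892 <span class='up_arrow'>&#160;</span></a></td>" +
        "</tr>" +
        "<tr data-source='provider2'>" +
        "<td class='tbl_col1'><a id='UserLink'>UserName2</a></td>" +
        "<td class='tbl_col2'><a id='PointLink'>3217 <span class='down_arrow'>&#160;</span></a></td>" +
        "</tr>" +
        "</tbody></table></div>";

    // Returns the user name found by both kinds of path, using `row` as the context node.
    public static (string Absolute, string Relative) UserNameFor(XmlNode row)
    {
        // Leading "//": evaluated from the document root, so the first match
        // in document order is returned no matter which row is the context.
        string absolute = row.SelectSingleNode("//a[@id='UserLink']/text()").Value;

        // No leading slash: evaluated relative to `row`, so each row yields its own value.
        string relative = row.SelectSingleNode("td[@class='tbl_col1']/a[@id='UserLink']/text()").Value;

        return (absolute, relative);
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Markup);

        foreach (XmlNode row in doc.SelectNodes("/div[@id='mainDiv']/table[@id='tbl']/tbody/tr"))
        {
            var (absolute, relative) = UserNameFor(row);
            Console.WriteLine($"{row.Attributes["data-source"].Value}\t{absolute}\t{relative}");
        }
    }
}
```

Running it prints UserName1 in the "absolute" column for both rows, reproducing the question's symptom, while the "relative" column shows UserName1 and then UserName2. A path starting with .// (for example .//a[@id='UserLink']) is another way to search only below the current row.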

You can also simplify the rest of your code considerably:

var UserTable = htmlDocument.DocumentNode.SelectNodes("//div[@id='mainDiv']/table[@id='tbl']/tbody/tr");
foreach (var row in UserTable)
{
    if (row.Attributes["data-source"] != null)
    {
        string Source = row.Attributes["data-source"].Value;
        string UserName = row.SelectSingleNode("td[@class='tbl_col1']/a[@id='UserLink']/text()").InnerText;
        string Points = row.SelectSingleNode("td[@class='tbl_col2']/a[@id='PointLink']/text()").InnerText;
        Console.WriteLine(Source + "\t" + UserName + "\t" + Points);
    }
}

Thanks a lot! That solved it for me. – yqit 2013-03-15 21:50:52