2015-07-12 112 views
1

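HtmlAgilityPack: parsing attributes

I want to parse HTML, but I don't know how to filter by a condition (for example, the class name must be X). I know there are many threads about HtmlAgilityPack, but I couldn't find anything that helped.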

<div class="main-class"> 
<a href="LINK"> 
<img src="IMAGELINK" alt="SOMETEXT" class="image-class"> 
</a> 
</div> 

<p> bla bla </p> 

<div class="main-class"> 
<a href="LINK"> 
<img src="IMAGELINK" alt="SOMETEXT" class="image-class"> 
</a> 
</div> 

<div class="main-class"> 
<a href="LINK"> 
<img src="IMAGELINK" alt="SOMETEXT" class="image-class"> 
</a> 
<p> asd sadh awww </p> 
</div> 

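I want the href, src, and alt values for every div whose class name is "main-class". This is my code, but it only prints the "p" elements, because that is the only thing I know how to do.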

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(dataString);
foreach (HtmlNode nodeItem in doc.DocumentNode.Descendants("p").ToArray())
{
    Debug.WriteLine(nodeItem.InnerText);
}

I'm working on a Windows Phone (WP) app, where "SelectNodes" is not supported.

Answers

0

You can do it the traditional, non-XPath way.

Note: null checks are omitted.

string dataString = "<div class=\"main-class\"><a href=\"LINK\"><img src=\"IMAGELINK\" alt=\"SOMETEXT\" class=\"image-class\"></a></div><p> bla bla </p><div class=\"main-class\"><a href=\"LINK\"><img src=\"IMAGELINK\" alt=\"SOMETEXT\" class=\"image-class\"></a></div><div class=\"main-class\"><a href=\"LINK\"><img src=\"IMAGELINK\" alt=\"SOMETEXT\" class=\"image-class\"></a><p> asd sadh awww </p></div>"; 

var doc = new HtmlDocument(); 
doc.LoadHtml(dataString); 

var elements = doc.DocumentNode.Descendants("div").Where(o => o.GetAttributeValue("class", "") == "main-class"); 
foreach (var nodeItem in elements) 
{ 
    var aTag = nodeItem.Descendants("a").First(); 
    var aTagHrefValue = aTag.Attributes["href"]; 

    var imgTag = nodeItem.Descendants("img").First(); 
    var imgTagSrcValue = imgTag.Attributes["src"]; 
    var imgTagAltValue = imgTag.Attributes["alt"]; 

    Console.WriteLine("a href value: {0}", aTagHrefValue.Value); 
    Console.WriteLine("img src value: {0}", imgTagSrcValue.Value); 
    Console.WriteLine("img alt value: {0}", imgTagAltValue.Value); 
    Console.WriteLine(); 
} 
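
Because the null checks are omitted, `First()` throws when a matching div has no `a` or `img` descendant, and reading `.Value` fails when the attribute itself is missing. A minimal sketch of a more tolerant variant follows, assuming the same HtmlAgilityPack calls used above. The class name `NullSafeExample` and the shortened sample string are made up for illustration.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class NullSafeExample // hypothetical name, for illustration only
{
    static void Main()
    {
        // Shortened sample: the second div has no <a> or <img> at all.
        string dataString =
            "<div class=\"main-class\"><a href=\"LINK\"><img src=\"IMAGELINK\" alt=\"SOMETEXT\"></a></div>" +
            "<div class=\"main-class\"><p>no link here</p></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(dataString);

        var divs = doc.DocumentNode.Descendants("div")
            .Where(d => d.GetAttributeValue("class", "") == "main-class");

        foreach (var div in divs)
        {
            // FirstOrDefault returns null instead of throwing when nothing matches.
            var aTag = div.Descendants("a").FirstOrDefault();
            var imgTag = div.Descendants("img").FirstOrDefault();

            // GetAttributeValue falls back to the supplied default if the attribute is absent.
            string href = aTag != null ? aTag.GetAttributeValue("href", "") : "";
            string src = imgTag != null ? imgTag.GetAttributeValue("src", "") : "";
            string alt = imgTag != null ? imgTag.GetAttributeValue("alt", "") : "";

            Console.WriteLine("href: {0}, src: {1}, alt: {2}", href, src, alt);
        }
    }
}
```

On a Windows Phone project, `Debug.WriteLine` can stand in for `Console.WriteLine`, as in the question.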
0

@Orel Eraki - Thanks. I worked it out myself 3 minutes ago, but I'll use your solution because it needs only one foreach loop. Anyway, here is my solution:

foreach (HtmlNode nodeItem in doc.DocumentNode.Descendants("div").Where(p => p.GetAttributeValue("class", "def").Equals("main-class")))
{
    foreach (HtmlNode nodeAItem in nodeItem.Descendants("a"))
    {
        Debug.WriteLine(nodeAItem.GetAttributeValue("href", "def"));
        foreach (HtmlNode nodeIMAGEitem in nodeAItem.Descendants("img"))
        {
            Debug.WriteLine(nodeIMAGEitem.GetAttributeValue("src", "def"));
            Debug.WriteLine(nodeIMAGEitem.GetAttributeValue("alt", "def"));
        }
    }
}
0

You can use LINQ like this:

var attrs = doc.DocumentNode
    .Descendants("div")
    .Where(d => d.Attributes != null &&
                d.Attributes.Contains("class") &&
                d.Attributes["class"].Value.Contains("main-class"))
    .Select(d => new
    {
        // Element() returns the first matching child element, or null.
        // It replaces the XPath-based SelectSingleNode, since the question's
        // platform lacks XPath methods such as SelectNodes.
        anchor = d.Element("a"),
        img = d.Element("a") != null
            ? d.Element("a").Element("img")
            : null
    })
    .Select(d => new
    {
        href = d.anchor != null
            ? d.anchor.GetAttributeValue("href", string.Empty)
            : string.Empty,
        imgsrc = d.img != null
            ? d.img.GetAttributeValue("src", string.Empty)
            : string.Empty,
        imgalt = d.img != null
            ? d.img.GetAttributeValue("alt", string.Empty)
            : string.Empty
    })
    .ToList();
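
One caveat about the class filters in this thread: a `Contains("main-class")` test also matches values such as `main-class-wide`, while the `==` comparison in the first answer misses `class="main-class highlighted"`. A sketch of a helper that compares whole class tokens instead is shown below. `ClassHelper` and `HasClassToken` are hypothetical names introduced here, not part of HtmlAgilityPack.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

static class ClassHelper // hypothetical helper, for illustration only
{
    // True when the node's class attribute contains className as a whole,
    // whitespace-separated token.
    public static bool HasClassToken(this HtmlNode node, string className)
    {
        return node.GetAttributeValue("class", "")
            .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Contains(className);
    }
}
```

With this in place, the filter could be written as `doc.DocumentNode.Descendants("div").Where(d => d.HasClassToken("main-class"))`.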