Parsing HTML with regular expressions


I have HTML from Bing and I want to parse the results out of it:

string BingRegex = "<div class=\"sb_tlst\"><h3><a href=\"(.*?)\"";
string[] results = Regex.Matches(responseStr, BingRegex).Cast<Match>().Select(m => m.Value).ToArray();

I get the results as an array, but the whole pattern is included in each result, like this:

<div class=\"sb_tlst\"><h3><a href=\"www.cnn.com\" 
<div class=\"sb_tlst\"><h3><a href=\"www.google.com\" 
<div class=\"sb_tlst\"><h3><a href=\"www.gmail.com\"

Any idea how to fix this and get only the URLs?

asked 2013-12-18 by MTA

You should not use regular expressions to parse HTML. – gleng

See http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

OK, it can be done, but it will cause you problems very quickly.

Apart from doing this with an HTML parser (which is a good idea), replace:

Select(m => m.Value)

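with:

Select(m => m.Groups[1].Value)

Here m.Value is the full matched text, while m.Groups[1] is the first capture group, i.e. the (.*?) that holds the href. You might also want to throw in a little error handling to check that the group was actually populated.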
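As a minimal sketch of that fix with the check included: the pattern is the one from the question, while the class name BingLinkExtractor and the method name ExtractUrls are hypothetical, chosen only for illustration. With this particular pattern the group is always populated whenever the match succeeds, so the Success filter only matters if the group later becomes optional.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original post.
public static class BingLinkExtractor
{
    // Same pattern as in the question; group 1 captures the href value.
    private const string BingRegex = "<div class=\"sb_tlst\"><h3><a href=\"(.*?)\"";

    public static string[] ExtractUrls(string responseStr)
    {
        return Regex.Matches(responseStr, BingRegex)
            .Cast<Match>()
            // Skip matches whose capture group did not participate.
            .Where(m => m.Groups[1].Success)
            // Take only the captured href, not the whole matched text.
            .Select(m => m.Groups[1].Value)
            .ToArray();
    }
}
```

Calling BingLinkExtractor.ExtractUrls(responseStr) in place of the original one-liner should return only the href values, without the surrounding markup.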

But the best solution is to use neither a regex nor an HTML parser, and to use the Bing search API instead, because this is exactly what it is for.

answered 2013-12-18 15:22:35

Thanks, man. It works perfectly! – MTA

I would advise against using regular expressions to parse HTML. I suggest using HtmlAgilityPack instead. Then just use XPath to get the attribute value you need.

The XPath for your sample div

<div class="sb_tlst"> 
    <h3> 
     <a href="www.gmail.com"/> 
    </h3> 
</div>

would be

/div[@class='sb_tlst']/h3/a/@href
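A rough sketch of how this might look with HtmlAgilityPack, assuming the NuGet package is referenced. The class name BingLinkParser and the method name ExtractHrefs are hypothetical. The sketch selects the a elements and reads href from each node, and it starts the path with // so the div is found anywhere in a full page rather than only at the document root.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

// Hypothetical helper, not part of the original post.
public static class BingLinkParser
{
    public static List<string> ExtractHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Anchors that carry an href, inside <h3> under <div class="sb_tlst">.
        var anchors = doc.DocumentNode.SelectNodes("//div[@class='sb_tlst']/h3/a[@href]");

        // SelectNodes returns null, not an empty collection, when nothing matches.
        if (anchors == null)
        {
            return new List<string>();
        }

        return anchors
            .Select(a => a.GetAttributeValue("href", string.Empty))
            .ToList();
    }
}
```

Note that @class='sb_tlst' compares the whole attribute value, so a div carrying additional classes would not match this expression.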

answered 2013-12-18 15:17:53
