2013-01-08 37 views

HTML Agility Pack and string parsing?

I have an HTML string like this (the description element from the Yahoo XML feed):

<img src="http://l.yimg.com/a/i/us/we/52/26.gif"/><br /> 
<b>Current Conditions:</b><br /> Cloudy, 1 C<BR /> <BR /> 
<b>Forecast:</b><BR /> Mon - Snow. High: -5 Low: -14<br /> Tue - Light Snow. High: -8 Low: -16<br /> <br /> 
.... 

I only want the High and Low values (for the example above: -5, -14, -8, -16).

I tried to get them with HtmlAgilityPack like this:

HtmlDocument htmlDoc = new HtmlDocument(); 
htmlDoc.LoadHtml(rssDescriptionElement); 
List<string> elements = new List<string>(); 

foreach (HtmlNode element in htmlDoc.DocumentNode.SelectNodes("//br")) 
{ 
    elements.Add(element.NextSibling.InnerText); 
} 

For the HTML string above, the elements list contains:

"\n" 
"\nCloudy, 1 C" 
"\n" 
"Forecast:" 
"\nMon - Snow. High: -5 Low: -14" 
"\nTue - Light Snow. High: -8 Low: -16" 
"\n" 
"\n" 
"" 
"\n(provided by " 
"\n" 

How can I get only the High and Low values (-5, -14, -8, -16) from this list, or is there a different solution altogether?

Answer


Use a regular expression:

(?:High|Low)\s*:\s*(?<num>-?\d+) 

and read the value of the group named num. Sample code:

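// Requires: using System.Collections.Generic; using System.Text.RegularExpressions; using HtmlAgilityPack;
List<string> elements = new List<string>();
var pattern = @"(?:High|Low)\s*:\s*(?<num>-?\d+)";

foreach (HtmlNode element in htmlDoc.DocumentNode.SelectNodes("//br"))
{
    // A <br> at the very end of the markup has no following sibling, so guard against null.
    if (element.NextSibling == null) continue;

    foreach (Match mc in Regex.Matches(element.NextSibling.InnerText, pattern))
    {
        elements.Add(mc.Groups["num"].Value); // captured number as text, e.g. "-5"
    }
}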
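
As for the "different solution" part of the question: the pattern above only looks at the text after "High" or "Low", so it may be possible to skip the HTML parsing step and run it over the raw description string. The following is a minimal sketch under that assumption; `HighLowExtractor` and `Extract` are hypothetical names made up for illustration and are not part of HtmlAgilityPack or any other library.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class HighLowExtractor
{
    // Same pattern as in the answer: "High" or "Low", a colon,
    // then an optionally negative integer captured as "num".
    private static readonly Regex Pattern =
        new Regex(@"(?:High|Low)\s*:\s*(?<num>-?\d+)");

    // Hypothetical helper: scans the raw description string and returns
    // the captured temperatures as integers, in the order they appear.
    public static List<int> Extract(string description)
    {
        var values = new List<int>();
        foreach (Match m in Pattern.Matches(description ?? string.Empty))
        {
            values.Add(int.Parse(m.Groups["num"].Value, CultureInfo.InvariantCulture));
        }
        return values;
    }
}
```

Called as `HighLowExtractor.Extract(rssDescriptionElement)`, this should return the High and Low values in document order. Note that the pattern matches any "High:" or "Low:" followed by a number, so if the feed ever includes such text outside the forecast lines, parsing the HTML first and matching per line remains the safer route.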