Parse HTML (get the text/string between tags)

I have HTML like this:

<div class="ResultItem"> 
<table border="0" cellpadding="0" cellspacing="0" style="top: 0; left: 0; width: 100%;"> 
    <tr> 
     <td class="result"> 
      <a href="http://msdn.microsoft.com/en-us/library/system.windows.uielement.aspx" onclick="trackClick(this, '117', 'http\x3a\x2f\x2fmsdn.microsoft.com\x2fen-us\x2flibrary\x2fsystem.windows.uielement.aspx', '1');"><b>UIElement</b> Class &#40;System.Windows&#41;</a>&nbsp; 
      <div class="ResultDescription"><b>UIElement</b> is a base class for WPF core level implementations building on Windows Presentation Foundation &#40;WPF&#41; elements and basic presentation characteristics.</div> 
      <div class="ResultUrl">msdn.microsoft.com&#47;en-us&#47;library&#47;sy<wbr><a class="wbr"></a>stem.windows.<b>uielement</b>.aspx</div> 
     </td> 
    </tr> 
</table> 
</div>

I want to extract the data from <a>(grab this string)</a> and from <div class="ResultDescription">(grab data)</div>. How would I do this?

Source

2011-05-05 JJKio

If your goal is to read the MSDN site, they have an actual web service API at

http://services.msdn.microsoft.com/ContentServices/ContentService.asmx

so screen scraping isn't necessary. Just add a service reference to that URL.

Source

2011-05-05 00:58:14 vcsjones

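Your best option in the long term is to use a dedicated HTML parsing library rather than custom string manipulation. There is a trunk version of HtmlAgilityPack, HAPPhone, that runs on Windows Phone 7. You have to download it manually from CodePlex, but that still beats having to write a parser yourself.

Source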

2011-05-05 02:51:47 BrokenGlass

If (and only if!) your HTML is valid XHTML, you can use any XML parser to get what you want.

Source

2011-05-05 02:58:27

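To reiterate what BrokenGlass mentioned, the overwhelming answer to "What is the best way to parse html in C#?" is to use a library like HtmlAgilityPack, which for the phone would mean something like HAPPhone.

Source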

2011-05-05 02:58:44

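If your parsing task only involves short strings, you can process the HTML content as a string in JavaScript. The line below uses a regular expression to remove the HTML tags and leave plain text.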

//Javascript 
var normal_text = html_string.replace(/(<.*?>)/ig,"");
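
Building on the tag-stripping line above, here is a minimal sketch of how the two pieces the question asks for might be pulled out of a short string. The helper names (`decodeEntities`, `stripTags`, `firstInnerText`) are hypothetical and written just for this illustration, and the sample markup is a shortened copy of the question's snippet. It assumes the target elements are not nested inside elements of the same name and that attributes contain no `>` character; for anything less predictable, a real HTML parser is the safer choice.

```javascript
// Hypothetical helpers for illustration only; not part of any library.

// Decode numeric entities such as &#40; plus the few named ones handled here.
function decodeEntities(text) {
  return text
    .replace(/&#(\d+);/g, (_, code) => String.fromCharCode(Number(code)))
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");
}

// Remove every tag, decode entities, and trim surrounding whitespace.
function stripTags(html) {
  return decodeEntities(html.replace(/<[^>]*>/g, "")).trim();
}

// Return the plain text captured by group 1 of the first match, or null.
function firstInnerText(html, pattern) {
  const match = pattern.exec(html);
  return match ? stripTags(match[1]) : null;
}

// Shortened version of the markup from the question (assumed input).
const html_string = `
<td class="result">
  <a href="http://msdn.microsoft.com/en-us/library/system.windows.uielement.aspx"><b>UIElement</b> Class &#40;System.Windows&#41;</a>&nbsp;
  <div class="ResultDescription"><b>UIElement</b> is a base class.</div>
</td>`;

// Text between the first <a ...> and its closing </a>.
const linkText = firstInnerText(html_string, /<a\b[^>]*>([\s\S]*?)<\/a>/i);

// Text inside the first <div class="ResultDescription">.
const description = firstInnerText(
  html_string,
  /<div\b[^>]*class="ResultDescription"[^>]*>([\s\S]*?)<\/div>/i
);

console.log(linkText);    // UIElement Class (System.Windows)
console.log(description); // UIElement is a base class.
```

The lazy `[\s\S]*?` capture stops at the first closing tag, which is why nested elements of the same name would cut the result short.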

Source

2011-05-13 09:51:19 lucentmind
