Scraping HTML attributes

<tr valign="middle" align="center"> 
<td><b>someNumbers</b></td> 
<td width="22" height="22" background="..." class="SomeIntrestingClass">xxxxx</td> 
<td width="22" height="22" background="..." class="SomeIntrestingClass">xgdsx</td> 
<td width="22" height="22" background="..." class="SomeIntrestingClass">xyzzx</td> 
<td width="22">&nbsp;</td></tr>

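I'm building an application that needs data from a website. I need to extract the value in 'someNumbers' as well as the values in the td cells, e.g. 'xyzzx'...
The problem I'm running into is that 'someNumbers' has no class, so I tried using
doc.getElementsByAttributeValue(key, value)
but other parts of the document have the same attributes. How can I extract these values using JSoup, or does anyone have another sensible idea? Thanks for any suggestions.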

Source

2012-12-22 wtsang02

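Can you select all the 'td' elements and just get their text content? – nhahtdh

I can select the td tags. But that would give about 1k results, of which I only use 30%, and it would be hard to pick 'someNumbers' out of them. But I'll try. – wtsang02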

Document.select(...) is the method that does this. It accepts CSS selectors such as td.class or tr td #id; just use them as described in this article on Jsoup CSS selectors.
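
As a rough sketch of how that could look for the row in the question: since 'someNumbers' has no class of its own, one option is to select only the rows that contain a td.SomeIntrestingClass cell and then read the label and the cells relative to each row. The class name RowScraper, the method extractRows, and the sample markup (shortened attributes, an added unrelated row, and a wrapping table) are assumptions made for illustration, not part of the original page.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RowScraper {

    // Hypothetical sample modelled on the row in the question, plus one
    // unrelated row to show that it is filtered out.
    public static final String SAMPLE_HTML =
          "<table>"
        + "<tr valign=\"middle\" align=\"center\">"
        + "<td><b>someNumbers</b></td>"
        + "<td width=\"22\" class=\"SomeIntrestingClass\">xxxxx</td>"
        + "<td width=\"22\" class=\"SomeIntrestingClass\">xgdsx</td>"
        + "<td width=\"22\" class=\"SomeIntrestingClass\">xyzzx</td>"
        + "<td width=\"22\">&nbsp;</td></tr>"
        + "<tr><td><b>unrelated</b></td><td width=\"22\">noise</td></tr>"
        + "</table>";

    // Returns one list per matching row: the bold label first,
    // followed by the text of each classed cell.
    public static List<List<String>> extractRows(String html) {
        Document doc = Jsoup.parse(html);
        List<List<String>> rows = new ArrayList<>();

        // Keep only rows that contain at least one cell with the class.
        for (Element row : doc.select("tr:has(td.SomeIntrestingClass)")) {
            List<String> values = new ArrayList<>();

            // The label has no class, so locate it by position inside the row.
            Element label = row.select("td > b").first();
            values.add(label == null ? "" : label.text());

            for (Element cell : row.select("td.SomeIntrestingClass")) {
                values.add(cell.text());
            }
            rows.add(values);
        }
        return rows;
    }

    public static void main(String[] args) {
        for (List<String> row : extractRows(SAMPLE_HTML)) {
            System.out.println(row);
        }
    }
}
```

For the sample above this should return a single row, [someNumbers, xxxxx, xgdsx, xyzzx]. On the real page the selector would probably need adjusting if other tables reuse the same class.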

Source

2012-12-22 19:00:10 wtsang02

-1

Use <td[^<]+?>*</[^<]+?> as a regular expression and store all the matches in an array.

Then strip <td[^<]+?> and then </[^<]+?> from each match.

Source

2012-12-22 18:33:13

-1. The OP is already using a proper HTML parser. – nhahtdh

Please read [this](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – wtsang02
