Removing HTML elements and their content

I have to remove HTML elements and their content, starting from Document doc = Jsoup.connect(someUrl).get() and Elements body = doc.select("div.chapter").

String myHtml =
      "<div class=\"chapter\">"
    + "<h1>Hello this is my example</h1>"
    + "<p>This is paragraph one</p>"
    + "<p>This is paragraph two <sup class=\"num\">Nuisance 1</sup><span class=\"notes\">Nuisance 2</span></p>"
    + "<p>This is paragraph three</p>"
    + "</div>";

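I want to remove the <sup>...</sup> and <span>...</span> elements, together with their content, from the HTML fragment extracted with jsoup. I have read that using regular expressions for this is a bad idea. Most examples and answers deal with stripping the tags while keeping the content. What I want to obtain is: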

String newHtml =
      "<div class=\"chapter\">"
    + "<h1>Hello this is my example</h1>"
    + "<p>This is paragraph one</p>"
    + "<p>This is paragraph two</p>"
    + "<p>This is paragraph three</p>"
    + "</div>";

I have tried jsoup without satisfactory results (it keeps the sup and span elements/tags).

2013-08-06 Rod

`not` removes the elements that match the given query from the selected set. It does not descend *into* each element.
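
A minimal sketch of what that comment describes, assuming it refers to jsoup's `Elements.not(String)`. The class name `NotDemo` and its helper methods are hypothetical, and the markup is a shortened version of the question's example. Filtering the selected list with `not` leaves the `div` and its nested `sup` untouched, whereas selecting the nested elements and calling `remove()` takes them out of the document.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

class NotDemo {
    // Filters the selected list only: the div itself does not match "sup",
    // so it stays in the list with its children unchanged.
    static String filterWithNot(String html) {
        Document doc = Jsoup.parse(html);
        Elements kept = doc.select("div.chapter").not("sup");
        return kept.outerHtml();
    }

    // Selects the nested sup elements and removes them from the DOM.
    static String removeNested(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("div.chapter sup").remove();
        return doc.select("div.chapter").outerHtml();
    }

    public static void main(String[] args) {
        String html = "<div class=\"chapter\">"
                + "<p>Paragraph two <sup class=\"num\">Nuisance 1</sup></p>"
                + "</div>";
        System.out.println(filterWithNot(html)); // still contains the sup element
        System.out.println(removeNested(html));  // sup element is gone
    }
}
```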

Please show us some effort! – Niranjan

After reading more (way more!) and trying different selectors, I adapted a solution to my own example:

doc.getElementsByClass("notes").remove(); 
doc.getElementsByClass("num").remove(); 
Elements newElement = doc.select("div.chapter"); 
String newHtml=newElement.toString();
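
For reference, the same approach as a self-contained sketch. It assumes the HTML is parsed from a string with `Jsoup.parse` rather than fetched with `Jsoup.connect`, so that it runs offline. The class name `RemoveByClass` and the method `clean` are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class RemoveByClass {
    // Removes every element with class "notes" or "num", content included,
    // and returns the HTML of the remaining div.chapter element.
    static String clean(String html) {
        Document doc = Jsoup.parse(html);
        doc.getElementsByClass("notes").remove();
        doc.getElementsByClass("num").remove();
        return doc.select("div.chapter").outerHtml();
    }

    public static void main(String[] args) {
        String html = "<div class=\"chapter\">"
                + "<h1>Hello this is my example</h1>"
                + "<p>This is paragraph two <sup class=\"num\">Nuisance 1</sup>"
                + "<span class=\"notes\">Nuisance 2</span></p>"
                + "</div>";
        System.out.println(clean(html));
    }
}
```

Note that `getElementsByClass` matches any element carrying the class, whatever its tag, so this also removes, say, a `<div class="notes">` elsewhere in the document.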

2013-08-07 01:01:38 Rod

Maybe use remove after selecting the sup elements:

doc.select("div > sup").remove();

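Here I used the child combinator, which only matches sup elements that are direct children of the div. If they are nested inside child elements of the div, as with the <p> tags in your example, the selector has to be adjusted (for instance to "div.chapter sup").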
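
A small sketch of how the combinators differ on markup shaped like the question's example. The class name `CombinatorDemo` and the helper `count` are hypothetical; the selectors use standard jsoup selector syntax.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class CombinatorDemo {
    // Returns how many elements in the parsed HTML match the selector.
    static int count(String html, String selector) {
        Document doc = Jsoup.parse(html);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        String html = "<div class=\"chapter\">"
                + "<p>Paragraph two <sup class=\"num\">Nuisance 1</sup></p>"
                + "</div>";
        // Child combinator: sup must be a direct child of div.
        System.out.println(count(html, "div > sup"));     // 0
        // Descendant combinator: sup anywhere inside div.
        System.out.println(count(html, "div sup"));       // 1
        // Explicit chain of direct children.
        System.out.println(count(html, "div > p > sup")); // 1
    }
}
```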

2013-08-06 16:30:19

body.select("p > sup.num, p > span.notes").remove(); 
System.out.println(body.html());

This should work perfectly for your case.
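
As a sketch of that snippet in context, assuming `body` is the `Elements` obtained from `doc.select("div.chapter")` as in the question, and that the HTML is parsed from a string instead of fetched from a URL. The class name `RemoveWithSelector` and the method `clean` are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

class RemoveWithSelector {
    // Removes sup.num and span.notes elements that are direct children of a <p>
    // inside div.chapter, then returns the inner HTML of the div.
    static String clean(String html) {
        Document doc = Jsoup.parse(html);
        Elements body = doc.select("div.chapter");
        // The comma groups two selectors; remove() detaches every match from the DOM.
        body.select("p > sup.num, p > span.notes").remove();
        return body.html();
    }

    public static void main(String[] args) {
        String html = "<div class=\"chapter\">"
                + "<p>This is paragraph one</p>"
                + "<p>This is paragraph two <sup class=\"num\">Nuisance 1</sup>"
                + "<span class=\"notes\">Nuisance 2</span></p>"
                + "</div>";
        System.out.println(clean(html));
    }
}
```

Because the selector names both the tag and the class, it is narrower than removing by class alone: a `span` without the `notes` class, or a `sup` outside a `<p>`, is left in place.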

2013-08-06 19:52:12 Niranjan
