Getting the content between tags with BeautifulSoup

I'm trying to get everything that sits between one </h2> and the next <h2>. Given markup like this:

<h2> Header 1 </h2> 
This is an example text for <a href="https://example.com">site</a> 
Any HTML-Code can appear 
<br /> 
<p> 
<h2> Header 2 </h2> 
Some other text with no tags 
<h2> Header 3 </h2>

the result should be:

This is an example text for <a href="https://example.com">site</a> 
Any HTML-Code can appear 
<br> 
<p>

and:

Some other text with no tags

Can anyone point me in the right direction?

Source

2017-03-02 houdini2

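Could you get the whole text and then use decompose() to strip out the h2 tags? – RoundFour

What have you tried so far? – Nobita

Given what you state below, your question is a bit unclear. I can't tell whether you are looking for only the text between the two tags, or whether you want to keep the HTML tags along with the text. – DMPierre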

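Thanks for the hints, but that's not exactly what I need. I may have given you too little information.

There is a lot of content before and after this text, and I only want to grab what lies between </h2> and <h2>.

If I use decompose(), it only removes the h2 tags, but everything else is still there. My problem is similar to this one: Extracting text without tags of HTML with Beautifulsoup Python

I found a possible solution: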

content = str(soup.find_all("div", class_="class"))
marker = "Header 1</h2>"
begin = content.find(marker) + len(marker)  # start after the closing </h2>, not at the header text
end = content.find("<h2>Header 2")
print(content[begin:end])
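
The slice above only works when the header texts are known in advance. A minimal sketch of a more general string-based variant, using only the standard library: it splits the raw markup on every <h2>...</h2> element and keeps what lies in between. The names H2_ELEMENT and split_on_h2 are made up for this illustration, and matching HTML with a regular expression is an assumption that holds for simple markup like the sample, not for arbitrary pages (an h2 inside a comment or script would confuse it).

```python
import re

html = """<h2> Header 1 </h2>
This is an example text for <a href="https://example.com">site</a>
Any HTML-Code can appear
<br />
<p>
<h2> Header 2 </h2>
Some other text with no tags
<h2> Header 3 </h2>"""

# Match a whole <h2>...</h2> element, attributes and inner text included.
H2_ELEMENT = re.compile(r"<h2\b[^>]*>.*?</h2>", re.IGNORECASE | re.DOTALL)


def split_on_h2(markup):
    """Return the raw markup found between consecutive <h2> elements."""
    pieces = H2_ELEMENT.split(markup)
    # pieces[0] precedes the first header and pieces[-1] follows the last one;
    # only the pieces in between lie "between" two headers.
    return [piece.strip() for piece in pieces[1:-1]]


sections = split_on_h2(html)
for section in sections:
    print(section)
    print("---")
```

Because the markup is never parsed, an unclosed tag such as the <p> in the sample is returned exactly as written.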

Source

2017-03-02 14:20:40 houdini2

I would go with decompose().

while soup.find("h2") is not None:  # find() returns the matching element, or None when nothing is left
    soup.h2.decompose() 

>>> \nThis is an example text for <a href="https://example.com">site</a>\nAny HTML-Code can appear \n<br>\n<p>\n\nSome other text with no tags\n</p></br>

Or, a bit more subtly:

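soup.h2.decompose()  # drop the first header
second_text = soup.h2.next_sibling  # node right after what is now the first remaining header
while soup.find("h2") is not None:
    soup.h2.decompose()  # drop the remaining headers

print(soup, second_text)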


>>> This is an example text for <a href="https://example.com">site</a> 
    Any HTML-Code can appear 
    <br> 
    <p> 

    Some other text with no tags 
    </p></br> 
    Some other text with no tags
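
If the sections are needed one by one rather than as a single merged document, one option is to leave the headers in place and walk each header's siblings until the next <h2> appears. The following is only a sketch under stated assumptions: the sample markup is a simplified, well-formed version of the question's snippet in which every <h2> is a direct child of the same <div>, and content_after is a hypothetical helper name. With the question's original snippet, the unclosed <p> may, depending on the parser, end up wrapping the later headers, in which case a sibling walk would not stop at them.

```python
from bs4 import BeautifulSoup, Tag

# Assumed, well-formed sample: all <h2> tags are siblings inside one <div>.
html = """<div>
<h2>Header 1</h2>
This is an example text for <a href="https://example.com">site</a>
<br/>
<h2>Header 2</h2>
Some other text with no tags
<h2>Header 3</h2>
</div>"""

soup = BeautifulSoup(html, "html.parser")


def content_after(header):
    """Collect the markup that follows `header` up to the next <h2> sibling."""
    parts = []
    for node in header.next_siblings:
        if isinstance(node, Tag) and node.name == "h2":
            break  # reached the next header, so this section is complete
        parts.append(str(node))  # keeps tags such as <a> as markup
    return "".join(parts).strip()


sections = [content_after(h2) for h2 in soup.find_all("h2")]
for section in sections:
    print(repr(section))
```

The last header has nothing after it, so its entry is an empty string; filter those out if they are not wanted.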

Source

2017-03-02 10:48:16 DMPierre
