Extracting text with <p></p> using BeautifulSoup
I want to get the news article from this link. My code is:
import requests
from bs4 import BeautifulSoup

def get_news_details(news_url):
    source = requests.get(news_url)
    plain_text = source.text
    soup = BeautifulSoup(plain_text, "html.parser")
    # The article body sits in the div with class "big-img-box".
    content = soup.find_all('div', {'class': 'big-img-box'})
    print(content[0].find_all('p'))
The output is:
[<p></p>, <p></p>, <p></p>, <p></p>, <p></p>, <p></p>]
And the value of content is:
<div class="big-img-box">
<div class="left-imgs">
<figure>
<img alt="iOS developer hints possibility of 4K Apple TV" class="img-responsive" src="http://www.aninews.in/contentimages/detail/appletv.jpg"/>
<figcaption><span class="heading-inner-span"></span></figcaption>
</figure>
<div class="mb10"></div>
</div>
<p></p> New York [USA], August 6 <a class="highlights" href="http://aninews.in/" target="_blank">(ANI)</a>: The latest designs from Apple's HomePod firmware revealed that the tech giant is hinting the launch of a <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/4k-apple-tv.html"> 4K Apple TV</a></span> with high dynamic range (HDR) support for both <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/hdr10.html"> HDR10 </a></span> and <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/dolby-vision.html"> Dolby Vision</a></span>.<p></p> While the current range of Apple's TV set-top box is incompatible to 4K technology, <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/ios.html">iOS</a></span> developer <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/guilherme-rambo.html"> Guilherme Rambo</a></span> revealed that the company is hinting an adoption of the ultra high-definition format, reports <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/the-verge.html">The Verge</a></span>.<p></p> Reports of the new range of Apple TV have surfaced time and again over the past few months, starting February this year.<p></p> It is said that implementing the HDR and 4K content will prove to b beneficial for the company, rather than a simpler resolution, since popular online movie and television platforms like <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/netflix.html"> Netflix</a></span> and <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/amazon.html"> Amazon</a></span> support the two high-definition formats.<p></p> Last month, iTunes started listing movies as supporting 4K and <span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/hdr.html"> HDR</a></span> in users' purchase histories, thus providing more thrust to the speculations of the 4K 
<span class="highlights"><a href="http://aninews.in/keysearch/keyword-search/apple.html"> Apple</a></span> TV. <a class="highlights" href="http://aninews.in/" target="_blank">(ANI)</a><p></p>
</div>
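I can use content[0].text, but that gives me a somewhat clumsy version of the article that I cannot format.
When I inspect the page in Chrome, the article text appears to be written inside <p>article_text</p> tags, whereas in content it shows up as <p></p>article_text. If the former version were present in soup, I could get the output I want. What should I do?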
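For illustration only, here is a minimal sketch of one way to recover the paragraphs from the tree as BeautifulSoup built it: treat each empty <p> as a separator and collect the sibling nodes that follow it until the next <p>. The SAMPLE_HTML string is a shortened stand-in for the content shown above, and the helper name paragraphs_after_empty_p is hypothetical, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup, Tag

# Shortened, made-up stand-in for the "big-img-box" div shown above.
SAMPLE_HTML = (
    '<div class="big-img-box">'
    '<div class="left-imgs"></div>'
    '<p></p> New York <a href="#">(ANI)</a>: first paragraph.'
    '<p></p> Second <span><a href="#"> paragraph</a></span>.'
    '<p></p>'
    '</div>'
)


def paragraphs_after_empty_p(container):
    """Return the text that follows each <p> in container, one string per <p>."""
    paragraphs = []
    for p in container.find_all("p"):
        parts = []
        for sibling in p.next_siblings:
            # Stop when the next paragraph marker is reached.
            if isinstance(sibling, Tag) and sibling.name == "p":
                break
            # Tags contribute their inner text; strings are taken as they are.
            parts.append(sibling.get_text() if isinstance(sibling, Tag) else str(sibling))
        # Collapse runs of whitespace left over from the markup.
        text = " ".join("".join(parts).split())
        if text:
            paragraphs.append(text)
    return paragraphs


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
paragraphs = paragraphs_after_empty_p(soup.find("div", class_="big-img-box"))
for paragraph in paragraphs:
    print(paragraph)
```

In the question's function, the same helper could be applied to content[0] in place of the print call. It assumes the <p> markers are direct children of the div, as they are in the output above.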
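This works for me (I meant "tidy", thanks for the clarification). But I wonder why there is a difference between Chrome's page inspection ('<p>text</p>') and BeautifulSoup's version ('<p></p>text')? – Aroonalok
I'm not sure. But I would say that when browser software or BeautifulSoup encounters a page that was not coded to conform to the standards, it has to do something with that code in order to display it. Chrome's designers may have gone in one direction when they met this kind of problem, and BeautifulSoup's in another. The results in this case are a little different.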
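@BillBell Hey Bill, I just wanted to show you some appreciation for your great support of this StackOverflow tag and of the community. Thank you, you are a good guy. All the best; I just wanted you to know how much we appreciate your help.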
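As a hedged illustration of the parser difference discussed in the comments: one possible cause, which is an assumption and not confirmed for this page, is that the raw markup uses <p/> as a paragraph separator. Python's html.parser treats <p/> as an empty element, while the HTML5 parsing rules that browsers follow ignore the trailing slash on non-void elements such as p, so the following text ends up inside the paragraph. The RAW_HTML string below is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical raw markup: "<p/>" used as a paragraph separator.
RAW_HTML = (
    '<div class="big-img-box">'
    '<p/> First paragraph.'
    '<p/> Second paragraph.'
    '</div>'
)

soup = BeautifulSoup(RAW_HTML, "html.parser")
div = soup.find("div", class_="big-img-box")

# With html.parser each "<p/>" becomes an empty <p></p>,
# and the article text stays outside it as a sibling string.
empty_paragraphs = div.find_all("p")
print(empty_paragraphs)
print(div)
```

If this is what is happening, passing "html5lib" instead of "html.parser" to BeautifulSoup (this needs the separate html5lib package) should build a tree closer to what Chrome shows, since html5lib follows the HTML5 parsing algorithm.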