How do I extract the href link between <h1></h1> in Python?

-2

I'm new to Python and trying to learn web scraping. How do I extract the href link between <h1></h1> in Python?

I have the markup below and would like to know how to get/print the href, i.e. the link:

<h1><a href="https://www.nytimes.com/tips">Got a confidential news tip?</a></h1>

2017-02-25 Andrew Ong

Similar to http://stackoverflow.com/questions/42173719/how-to-use-regular-expression-to-retrieve-data-in-python/42173798#42173798

Another similar one: https://stackoverflow.com/questions/3075550/how-can-i-get-href-links-from-html-using-python – Tudor

You can use BeautifulSoup to get this done:

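from urllib.request import urlopen
from bs4 import BeautifulSoup
import re  # used for the extraction step described below

# Download the page and parse it with Python's built-in HTML parser
response = urlopen("http://someurl.com")
page_source = response.read()
soup = BeautifulSoup(page_source, 'html.parser')

# find_all returns a list-like collection of every <h1> tag in the document
x = soup.find_all('h1')
print(x)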

Then all you need to do is use the re module to extract the data from that output.
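As a rough sketch of that last step, the snippet below pulls the href out of each <h1> in two ways: by reading the attribute from the nested <a> tag, and by running the re module over the serialized tag as suggested above. The sample markup is modelled on the snippet in the question rather than fetched from a live page, and the helper names `hrefs_in_h1` and `hrefs_in_h1_regex` are made up for illustration.

```python
import re

from bs4 import BeautifulSoup

# Sample markup modelled on the question's snippet (an assumption, not a live page).
html = '<h1><a href="https://www.nytimes.com/tips">Got a confidential news tip?</a></h1>'
soup = BeautifulSoup(html, "html.parser")


def hrefs_in_h1(soup):
    """Return the href of every <a> nested inside an <h1> (hypothetical helper)."""
    links = []
    for h1 in soup.find_all("h1"):
        # href=True skips anchors that have no href attribute.
        for a in h1.find_all("a", href=True):
            links.append(a["href"])
    return links


def hrefs_in_h1_regex(soup):
    """Collect href values by applying a regex to each serialized <h1> tag."""
    pattern = re.compile(r'href="([^"]*)"')
    links = []
    for h1 in soup.find_all("h1"):
        # str(h1) is the tag's HTML text; the pattern assumes double-quoted values.
        links.extend(pattern.findall(str(h1)))
    return links


print(hrefs_in_h1(soup))
print(hrefs_in_h1_regex(soup))
```

Reading the attribute directly is probably the less fragile of the two, since the regex version depends on how the tag happens to be serialized and will also pick up an href on any non-anchor element inside the <h1>.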

2017-02-25 09:27:58
