Web scraping with Python BeautifulSoup

-1

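How do I extract the data in the <p> paragraph tags and the <li> tags that sit inside a <div> with a given class name? I am scraping web pages with Python BeautifulSoup.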

2016-03-04 pKa

Please post a sample input. –

post example html/xml –

Use the find() and find_all() functions:

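import requests
from bs4 import BeautifulSoup

url = '...'

# fetch the page and parse the HTML
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, 'html.parser')

# find() returns the first matching <div>, or None if nothing matches
div = soup.find('div', {'class': 'class-name'})
if div is None:
    raise SystemExit('no <div> with class "class-name" found')

# find_all() returns every matching tag inside that <div>
ps = div.find_all('p')
lis = div.find_all('li')

# print the content of all <p> tags
for p in ps:
    print(p.text)

# print the content of all <li> tags
for li in lis:
    print(li.text)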
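
To try the same approach without a network request, here is a minimal sketch that runs against an inline HTML string. The sample markup, the class name "content", and the helper extract_texts() are all made up for illustration, since the question did not include real input; adjust them to your page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the class names are assumptions for illustration.
SAMPLE_HTML = """
<html><body>
  <div class="content">
    <p>First paragraph</p>
    <ul><li>Item one</li><li>Item two</li></ul>
    <p>Second paragraph</p>
  </div>
  <div class="sidebar"><p>Unrelated paragraph</p></div>
</body></html>
"""


def extract_texts(html, class_name):
    """Return (paragraph_texts, list_item_texts) from the first <div> with class_name."""
    soup = BeautifulSoup(html, "html.parser")
    # find() returns None when no <div> carries the requested class
    div = soup.find("div", class_=class_name)
    if div is None:
        return [], []
    # find_all() searches all descendants of the <div>, not just direct children
    paragraphs = [p.get_text(strip=True) for p in div.find_all("p")]
    items = [li.get_text(strip=True) for li in div.find_all("li")]
    return paragraphs, items


paragraphs, items = extract_texts(SAMPLE_HTML, "content")
print(paragraphs)  # ['First paragraph', 'Second paragraph']
print(items)       # ['Item one', 'Item two']
```

The paragraph in the "sidebar" div is not returned, because the search starts from the matched <div> rather than from the whole document.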


2016-03-04 08:50:38

Awesome.. thanks a ton :-) – pKa
