BeautifulSoup: retrieving a list of hrefs

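Thanks for your attention! I'm trying to retrieve the hrefs of the products in a set of search results, for example on the search results page whose URL appears in the code below.

However, when I narrow the search down to the product-image class, the href I retrieve is the image link. Can anyone help me solve this? Thanks in advance!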

url = 'http://www.homedepot.com/b/Husky/N-5yc1vZrd/Ntk-All/Ntt-chest%2Band%2Bcabinet?Ntx=mode+matchall&NCNI-5' 
content = urllib2.urlopen(url).read() 
content = preprocess_yelp_page(content) 
soup = BeautifulSoup(content) 

content = soup.findAll('div',{'class':'content dynamic'}) 
draft = str(content) 
soup = BeautifulSoup(draft) 
items = soup.findAll('div',{'class':'cell_section1'}) 
draft = str(items) 
soup = BeautifulSoup(draft) 
content = soup.findAll('div',{'class':'product-image'}) 
draft = str(content) 
soup = BeautifulSoup(draft)

Source

2014-12-31 plain vanilla

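You don't need to load the contents of each tag you find back into BeautifulSoup over and over again.

Use CSS selectors to get all the product links (the a tags directly under a div with class="product-image"):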

import urllib2 
from bs4 import BeautifulSoup 

url = 'http://www.homedepot.com/b/Husky/N-5yc1vZrd/Ntk-All/Ntt-chest%2Band%2Bcabinet?Ntx=mode+matchall&NCNI-5' 
soup = BeautifulSoup(urllib2.urlopen(url)) 

for link in soup.select('div.product-image > a:nth-of-type(1)'): 
    print link.get('href')

Prints:

http://www.homedepot.com/p/Husky-41-in-16-Drawer-Tool-Chest-and-Cabinet-Set-HOTC4016B1QES/205080371 
http://www.homedepot.com/p/Husky-26-in-6-Drawer-Chest-and-Cabinet-Combo-Black-C-296BF16/203420937 
http://www.homedepot.com/p/Husky-52-in-18-Drawer-Tool-Chest-and-Cabinet-Set-Black-HOTC5218B1QES/204825971 
http://www.homedepot.com/p/Husky-26-in-4-Drawer-All-Black-Tool-Cabinet-H4TR2R/204648170 
...

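The div.product-image > a:nth-of-type(1) CSS selector matches the first a tag that is a direct child of each div with the class product-image.

To save the links to a list, use a list comprehension: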
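To see why this selector returns the product link rather than the image link, here is a minimal sketch run against a small inline snippet. The markup is hypothetical and made up for illustration (the real page may be structured differently), and the code is written for Python 3 with the built-in html.parser rather than the Python 2 setup used above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration; the real page may differ.
SAMPLE_HTML = """
<div class="content dynamic">
  <div class="product-image">
    <a href="/p/item-one/111"><img src="/images/item-one.jpg"></a>
    <a href="/p/item-one/111#reviews">Reviews</a>
  </div>
  <div class="product-image">
    <a href="/p/item-two/222"><img src="/images/item-two.jpg"></a>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The first <a> that is a direct child of each div.product-image.
first_links = soup.select("div.product-image > a:nth-of-type(1)")
hrefs = [a.get("href") for a in first_links]

# For contrast: the image URLs live in the src of the nested <img> tags.
image_srcs = [img.get("src") for img in soup.select("div.product-image img")]

print(hrefs)
print(image_srcs)
```

The href attribute is read from the a tag itself, so the nested img tag and its src never come into play.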

links = [link.get('href') for link in soup.select('div.product-image > a:nth-of-type(1)')]
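If the page might contain relative or repeated links, the list can be tidied up before it is output. The following is only a sketch under assumptions: extract_product_links is a hypothetical helper name, the sample markup is invented, and it targets Python 3 (urllib.parse) rather than the Python 2 code above.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_product_links(html, base_url):
    """Return absolute product links, in page order, without duplicates.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.select("div.product-image > a:nth-of-type(1)"):
        href = a.get("href")
        if not href:
            continue  # skip <a> tags that have no href attribute
        absolute = urljoin(base_url, href)  # resolve relative paths
        if absolute not in links:
            links.append(absolute)
    return links


# Invented sample markup; the real page may differ.
sample_html = """
<div class="product-image"><a href="/p/a/1"><img src="a.jpg"></a></div>
<div class="product-image"><a href="http://www.homedepot.com/p/b/2"><img src="b.jpg"></a></div>
<div class="product-image"><a href="/p/a/1"><img src="a.jpg"></a></div>
<div class="product-image"><a><img src="c.jpg"></a></div>
"""

links = extract_product_links(sample_html, "http://www.homedepot.com/")
print("\n".join(links))
```

Once the links are in a plain list, they can be printed one per line as here, or written to a file in the same way.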

Source

2014-12-31 03:04:24 alecxe

Awesome! Could you tell me how to save them to a list, so that I can actually output them? –

@plainvanilla Sure, I've updated the answer. – alecxe
