BeautifulSoup contents returns "list index out of range"

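I am following these tutorials, http://importpython.blogspot.com/2009/12/how-to-get-beautifulsoup-to-filter.html and http://importpython.blogspot.com/2009/12/how-to-screen-scrape-craigslist-using.html, and even with the code copied and pasted I cannot get the link titles to print: I get a "list index out of range" error at line 11 and line 8 respectively. If I copied the code exactly, what am I doing wrong? I have tried other variations, such as returning only the links, and those work perfectly fine, so I don't think it is a local problem.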

Edit

The problem is the following code (from http://importpython.blogspot.com/2009/12/how-to-screen-scrape-craigslist-using.html):

from BeautifulSoup import BeautifulSoup #1 
from urllib2 import urlopen    #2 

site = "http://sfbay.craigslist.org/rea/" #3 
html = urlopen(site)      #4 
soup = BeautifulSoup(html)    #5 
postings = soup('p')      #6 

for post in postings:      #7 
    print post('a')[0].contents[0]  #8 
    print post('a')[0]['href']   #9

It gives the error:

Traceback (most recent call last): 
    File "<stdin>", line 2, in <module> 
IndexError: list index out of range

Source

2014-06-10 omriki

Please include a [minimal example](http://sscce.org) of code that demonstrates the problem in the question itself, not just off-site links. – jonrsharpe

This code relies on the HTML structure of the Craigslist site, which has since changed. You will get the "correct" result from the second 'a' tag:

print post('a')[1].contents[0] 
print post('a')[1]['href']
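
Since any fixed index breaks again as soon as the page layout changes, a guarded version may be more robust. The following is only a minimal sketch, assuming Python 3 with the bs4 package installed; the sample markup and the helper name link_titles are invented for illustration and are not the real Craigslist page. It skips any paragraph that has too few 'a' tags, or whose chosen 'a' tag is empty, which are the two cases where indexing would raise IndexError:

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 (bs4) is installed

# Hypothetical markup, invented for illustration; not the real Craigslist page.
SAMPLE_HTML = """
<p class="row"><a href="/rea/1.html" class="i"></a>
  <span class="pl"><a href="/rea/1.html">Sunny 2BR condo</a></span></p>
<p>No links in this paragraph</p>
"""

def link_titles(html, index=1):
    """Return (title, href) pairs for the index-th <a> of each <p>."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for post in soup("p"):  # calling the soup is shorthand for find_all
        anchors = post("a")
        if len(anchors) <= index:
            continue  # too few <a> tags: anchors[index] would raise IndexError
        anchor = anchors[index]
        if not anchor.contents:
            continue  # empty <a></a>: anchor.contents[0] would raise IndexError
        results.append((anchor.get_text(strip=True), anchor.get("href")))
    return results

print(link_titles(SAMPLE_HTML))           # second <a> of each paragraph
print(link_titles(SAMPLE_HTML, index=0))  # first <a>, which is empty in the sample
```

In this sample the first 'a' tag of the listing is empty, so index 0 yields nothing while index 1 yields the title, which would be consistent with the behaviour described above.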

Source

2014-06-10 08:33:16 cchristelis

BeautifulSoup is very powerful, so don't be lazy: use all of its power.

soup = BeautifulSoup(html) 
postings = soup.find_all('p', {'class': 'row'}) 

for post in postings: 
    info_container = post.find('span', {'class':'pl'}).find('a') 
    print info_container.text 
    print info_container['href']

I always try to avoid hard-coding array indices in my code and use the find functions instead, which are the most intuitive.
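
One caveat with chained find calls is that find returns None when nothing matches, so a row without the expected 'span' would raise AttributeError. The following is a minimal sketch of the same find-based approach with that case handled, assuming Python 3 and the bs4 package; the sample markup and the helper name listing_links are hypothetical, and the class names 'row' and 'pl' are taken from the answer above rather than checked against the live site:

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 (bs4) is installed

# Hypothetical markup, invented for illustration; not the real Craigslist page.
SAMPLE_HTML = """
<p class="row"><span class="pl"><a href="/rea/2.html">Loft near park</a></span></p>
<p class="row"><span class="price">$1200</span></p>
"""

def listing_links(html):
    """Return (title, href) pairs, skipping rows that lack the expected tags."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for post in soup.find_all("p", {"class": "row"}):
        container = post.find("span", {"class": "pl"})
        # find returns None when there is no match, so check before chaining
        link = container.find("a") if container is not None else None
        if link is None:
            continue
        links.append((link.text, link["href"]))
    return links

print(listing_links(SAMPLE_HTML))
```

The second row in the sample has no 'pl' span, so it is skipped instead of stopping the loop with an exception.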

Source

2014-06-10 16:31:48 Curro
