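KeyError in BeautifulSoup page scraping
I'm writing a small application that involves scraping a few fixed websites. In this case I'm scraping TechCrunch, and I'm getting a KeyError that I really shouldn't be getting.
Here is the relevant part of the misbehaving code: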
response = urllib.request.urlopen(self.url)
soup = BeautifulSoup(response.read(), "html.parser")
chunks = soup.find_all('li', class_='river-block')
html = 'TechCrunch:'
html += '<ul>'
for c in chunks:
    print(c.attrs.keys())
    print(c.attrs.values())
    html += '<li>'
    html += c.attrs['data-sharetitle']
    html += '<a href="' + c.attrs['data-permalink'] + '">Read more</a>'
    html += '</li>'
html += '</ul>'
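The idea is that the link and the title are stored in the data-permalink and data-sharetitle attributes, respectively. The output of the two print statements is what I expect: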
dict_keys(['class', 'data-sharetitle', 'id', 'data-shortlink', 'data-permalink'])
dict_values([['river-block', 'crunch-network'], 'Investing In Artificial\xa0Intelligence', '1251865', 'http://tcrn.ch/1mEbmcG', 'http://techcrunch.com/2015/12/25/investing-in-artificial-intelligence/'])
However, the line html += c.attrs['data-sharetitle'] gives me KeyError: 'data-sharetitle'. Why?
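Thanks! At a cursory glance it looked like every 'li' with the 'river-block' class was the same; I now notice that there are other 'li' elements further down. – dotslash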
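The comment suggests that some of the matched 'li' elements lack the data-* attributes. That would also explain why the printed output looks right: it comes from an earlier element, printed before the loop reaches the one that fails. Below is a minimal sketch of one way to skip such elements, assuming that diagnosis is correct. The names render_items and scrape are hypothetical helpers made up for this illustration and are not part of the original code. Tag.attrs is an ordinary dict, so .get() returns None instead of raising KeyError.

```python
def render_items(attr_dicts):
    """Build the HTML list, skipping entries that lack either attribute.

    attr_dicts is any iterable of attribute mappings, e.g. the .attrs
    dict of each BeautifulSoup Tag.
    """
    parts = ['TechCrunch:', '<ul>']
    for attrs in attr_dicts:
        # dict.get returns None for a missing key instead of raising KeyError.
        title = attrs.get('data-sharetitle')
        link = attrs.get('data-permalink')
        if title is None or link is None:
            # A matched <li> without the expected data attributes; skip it.
            continue
        parts.append('<li>' + title + '<a href="' + link + '">Read more</a></li>')
    parts.append('</ul>')
    return ''.join(parts)


def scrape(markup):
    """Parse the page markup and render the article list (hypothetical helper)."""
    # Imported here so render_items can be used without bs4 installed.
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(markup, 'html.parser')
    chunks = soup.find_all('li', class_='river-block')
    return render_items(c.attrs for c in chunks)
```

With the code from the question, this would be called as scrape(response.read()). Whether silently skipping the extra elements is the right behavior depends on what they turn out to be; printing their attrs before the continue is an easy way to check.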