BeautifulSoup4 does not return all links on the page

I am writing a web crawler in Python 3.5, using requests and BeautifulSoup4. I am trying to get the links to all topics on the first page of a forum and add them to a list.

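I have 2 problems:

1) I don't know how to get the links with BeautifulSoup. I can't get at the link itself, only the div that contains it.
2) It seems that BeautifulSoup returns only a few of the topics, not all of them.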

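import requests
from bs4 import BeautifulSoup


def getTopics():
    topics = []
    url = 'http://forum.jogos.uol.com.br/pc_f_40'
    # Download the first page of the forum and parse it
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, 'html.parser')

    # Loop over the elements whose class attribute is exactly "topicos"
    for link in soup.select('[class="topicos"]'):
        a = link.find_all('a href')
        print(a)


getTopics()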


2015-10-28 Legos

First of all, your loop actually does go over all 38 topics shown on the web page.

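The actual problem is in how you extract the link for each topic: link.find_all('a href') finds nothing, because find_all() treats its argument as a tag name and there are no elements named "a href" on the page. Replace it with link.select('a[href]'), which is a CSS selector that finds all the a elements that have an href attribute.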
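To make the difference concrete, here is a minimal sketch that runs both calls against a small inline HTML snippet. The markup is made up for illustration (the real forum page may be structured differently); only the class name topicos comes from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the forum page (an assumption,
# not the real page source).
SAMPLE_HTML = """
<div class="topicos">
  <a href="/topic-1">First topic</a>
  <a name="anchor-only">No href here</a>
</div>
<div class="topicos">
  <a href="/topic-2">Second topic</a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

name_matches = []
selector_matches = []
for div in soup.select('[class="topicos"]'):
    # find_all() takes a tag name, so this searches for elements
    # literally named "a href" and matches nothing.
    name_matches.extend(div.find_all("a href"))
    # select() takes a CSS selector: <a> elements that carry an href.
    selector_matches.extend(a["href"] for a in div.select("a[href]"))

print(name_matches)
print(selector_matches)
```

The first list comes back empty while the second holds the two href values, which matches the behaviour described in the question: the loop visits every div, but the lookup inside it never matches.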

In fact, you can even solve it with a single list comprehension:

topics = [a["href"] for a in soup.select('.topicos a[href]')]
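As a sketch of how that one-liner might be wrapped into a reusable function, the example below parses an HTML string and returns the topic links. The function name extract_topic_links, the sample markup, and the urljoin step that turns relative links into absolute ones are all assumptions added for illustration; they are not part of the original code.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_topic_links(html, base_url):
    """Return absolute URLs of all <a href> inside elements with class 'topicos'."""
    soup = BeautifulSoup(html, "html.parser")
    # Same selector as the list comprehension above; urljoin resolves
    # relative hrefs against the page URL (an added assumption).
    return [urljoin(base_url, a["href"]) for a in soup.select(".topicos a[href]")]


# Made-up markup for a quick offline check; the real page may differ.
sample = """
<div class="topicos"><a href="/topic-1">One</a></div>
<div class="menu"><a href="/login">Login</a></div>
<div class="topicos"><a href="/topic-2">Two</a></div>
"""

links = extract_topic_links(sample, "http://forum.jogos.uol.com.br/pc_f_40")
print(links)
```

Against the live forum you would pass requests.get(url).text as the html argument. Taking the HTML as a parameter keeps the parsing step testable without a network connection.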


2015-10-28 02:03:13 alecxe
