Python BeautifulSoup: Is there a way to count the number of crawled results?

Is there a way to count the number of crawled results in BeautifulSoup?

Here is the code.

import requests
from bs4 import BeautifulSoup

def crawl_first_url(max_page):
    page = 1

    while page <= max_page:
        url = 'http://www.hdwallpapers.in/page/' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')

        for div in soup.select('.thumb a'):
            href = 'http://www.hdwallpapers.in' + div.get('href')
            crawl_second_url(href)
        page += 1

def crawl_second_url(second_href):
    # need to count the number of results here.
    # I tried len(second_href), but it doesn't work well.
    pass

crawl_first_url(1)

I want the second function to count the number of crawled results. For example, if 19 URLs have been crawled, I want to get that number.

Source

2015-12-22 Lindow

What does 'crawl_second_url' do? Does it only count the results? – dstudeba

@dstudeba Yes, it should only count the number of results, but I don't know how to do that... – Lindow

Since you only need to count the number of results, I don't see a reason to have a separate function; just add a counter.

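import requests
from bs4 import BeautifulSoup

max_page = 1  # assumed value here; in the question this is the function parameter
page = 1
numResults = 0

while page <= max_page:
    url = 'http://www.hdwallpapers.in/page/' + str(page)
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, 'html.parser')

    for div in soup.select('.thumb a'):
        href = 'http://www.hdwallpapers.in' + div.get('href')
        numResults += 1  # one result per matched link
    page += 1

# numResults is an int, so convert it before concatenating with strings
print("There are " + str(numResults) + " results.")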

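This only counts the subpages. If you also want to count the top-level pages, just add another increment line after the soup assignment. You may also want to add a try: except: block to avoid crashes.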
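As a rough sketch of the try/except idea, the counting could be separated from the fetching so that a page that fails to load is skipped instead of crashing the crawl. The names fetch_links, count_results and get_links, the BASE_URL constant, and the 10-second timeout are all illustrative assumptions, not part of the original code.

```python
from typing import Callable, List

BASE_URL = 'http://www.hdwallpapers.in'


def fetch_links(page: int) -> List[str]:
    """Return the absolute thumbnail links found on one listing page."""
    # Imported here so the counting logic below can be tried out
    # even when requests and bs4 are not installed.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(BASE_URL + '/page/' + str(page), timeout=10)
    response.raise_for_status()  # turn HTTP error codes into exceptions
    soup = BeautifulSoup(response.text, 'html.parser')
    return [BASE_URL + a.get('href')
            for a in soup.select('.thumb a') if a.get('href')]


def count_results(max_page: int,
                  get_links: Callable[[int], List[str]] = fetch_links) -> int:
    """Count links across pages 1..max_page, skipping pages that fail."""
    num_results = 0
    for page in range(1, max_page + 1):
        try:
            links = get_links(page)
        except Exception as exc:  # deliberately broad: any failure skips the page
            print('Skipping page ' + str(page) + ': ' + str(exc))
            continue
        num_results += len(links)  # len() of the list of links, not of one URL
    return num_results
```

Calling count_results(1) would crawl the first listing page and return the number of links found on it. Note that len(second_href) in the question measured the length of a single URL string, which is why it did not give the expected count.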

Source

2015-12-22 16:54:33 dstudeba
