How can I crawl multiple domains with a single crawler?

-1

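How can I scrape data from multiple domains with a single crawler? I have already crawled individual websites with Beautiful Soup, but I can't figure out how to build a generic crawler that works across several sites.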

2017-03-04 Puja

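Well, the question is flawed: the sites you want to scrape must have something in common. Here the shared element is the links, which every page exposes as <a> tags.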

from urllib import request

from bs4 import BeautifulSoup

for counter in range(10):
    # Ask for the full URL, including the scheme (e.g. https://example.com).
    # On Python 2, use raw_input() instead of input().
    site = input("Type the URL of your website: ")
    # Fetch the raw HTML of the site stored in the `site` variable
    html = request.urlopen(site).read()
    # Parse the response with BeautifulSoup, in this case using html.parser
    soup = BeautifulSoup(html, "html.parser")
    # Loop over every link on the page that actually has an href attribute
    for link in soup.find_all("a", href=True):
        print(link["href"])
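If you would rather pass in a list of start URLs than type them one at a time, the same idea can be sketched roughly as below. This is only an illustration under a few assumptions: the names `LinkCollector`, `extract_links`, and `crawl` are made up for this example, the optional `fetch` parameter is a hypothetical hook for swapping in your own downloader, and the standard-library `html.parser.HTMLParser` stands in for Beautiful Soup so the sketch runs without third-party packages. It fetches each start page once and does not follow the links it finds.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib import request
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                # Skip <a> tags with a missing or empty href
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return the absolute URLs of all links found in an HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    # Resolve relative links against the page they were found on
    return [urljoin(base_url, href) for href in collector.hrefs]


def crawl(start_urls, fetch=None):
    """Fetch each start URL once and group its links by the page's domain."""
    if fetch is None:
        # Default downloader (an assumption): plain urllib, decoded as UTF-8
        def fetch(url):
            with request.urlopen(url, timeout=10) as resp:
                return resp.read().decode("utf-8", errors="replace")

    links_by_domain = defaultdict(list)
    for url in start_urls:
        domain = urlparse(url).netloc
        try:
            html = fetch(url)
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            print(f"Skipping {url}: {exc}")
            continue
        links_by_domain[domain].extend(extract_links(html, url))
    return dict(links_by_domain)
```

You would call it with something like `crawl(["https://example.com", "https://example.org"])` and get back a dictionary keyed by domain. Anything beyond links (titles, prices, article text) still needs per-site rules, because that markup differs from one domain to the next.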


2017-03-05 12:19:05