Retrying to open a URL with urllib in Python on timeout

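I am trying to parse data from a large number of web pages (> 10k) using Python, and the function I wrote for this frequently hits a timeout error, roughly once every 500 loops. I tried to work around it with a try/except block, but I would like to improve the function so that it retries opening the URL four or five times before returning the error. Is there an elegant way to do this?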

My code is below:

from urllib.error import HTTPError
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

def url_open(url):
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    try:
        s = urlopen(req, timeout=50).read()
    except HTTPError as e:
        if e.code == 404:
            print(str(e))
        else:
            print(str(e))
            # One more attempt, then re-raise the original error
            s = urlopen(req, timeout=50).read()
            raise
    return BeautifulSoup(s, "lxml")

Source

2017-01-15 user3725021

Possible duplicate of [How to retry urllib2.request when fails?](http://stackoverflow.com/questions/9446387/how-to-retry-urllib2-request-when-fails) – phss

I have used a pattern like this in the past for retrying:

import time
from urllib.error import HTTPError
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

def url_open(url):
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    retrycount = 0
    s = None
    while s is None:
        try:
            s = urlopen(req, timeout=50).read()
        except HTTPError as e:
            print(str(e))
            if canRetry(e.code):
                retrycount += 1
                if retrycount > 5:
                    raise  # give up after five retries
                time.sleep(1)  # pause briefly; 1 s is an arbitrary choice
            else:
                raise

    return BeautifulSoup(s, "lxml")

You just need to define canRetry somewhere else.
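For what it's worth, here is a minimal sketch of what that helper and the surrounding loop might look like. The pattern above only catches `HTTPError`, whereas a timeout from `urlopen` usually surfaces either as a `URLError` (while connecting) or as `socket.timeout` (while reading), so this version catches those as well. The names `can_retry`, `fetch_with_retries`, `RETRYABLE_STATUS`, and the `opener` parameter are all hypothetical, and the set of retryable status codes and the delay are assumptions you would want to tune for the sites you scrape.

```python
import socket
import time
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

# Assumed set of "probably transient" HTTP statuses; adjust as needed.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}

def can_retry(code):
    """Return True if the HTTP status code looks worth retrying."""
    return code in RETRYABLE_STATUS

def fetch_with_retries(url, max_attempts=5, timeout=50, delay=1.0, opener=urlopen):
    """Return the response body as bytes, retrying on timeouts and transient errors.

    `opener` defaults to urlopen; it is a parameter only so that the
    function can be exercised without real network access.
    """
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    for attempt in range(1, max_attempts + 1):
        try:
            with opener(req, timeout=timeout) as response:
                return response.read()
        except HTTPError as e:
            # HTTPError subclasses URLError, so it has to be handled first.
            if not can_retry(e.code) or attempt == max_attempts:
                raise
        except (URLError, socket.timeout):
            # Connection failures and timeouts: retry until attempts run out.
            if attempt == max_attempts:
                raise
        # Wait a little longer after each failed attempt.
        time.sleep(delay * attempt)
```

The returned bytes can then be handed to `BeautifulSoup(s, "lxml")` as in the original function. A 404 is re-raised immediately, since retrying a missing page is unlikely to help.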

Source

2017-01-15 08:15:44 GantTheWanderer
