Python requests downloads HTML if the file is not found

I am downloading a list of remote files. My code looks like this:

import requests

try:
    r = requests.get(url, stream=True, verify=False)
    total_length = int(r.headers['Content-Length'])

    if total_length:
        with open(file_name, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024):
                if chunk:
                    f.write(chunk)
                    f.flush()

except (requests.RequestException, Exception):  # StandardError exists only in Python 2
    pass

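My problem is that requests downloads plain HTML for files that do not exist (for example a 404 page, or some other HTML page of a similar nature). Is there a way around this? Is there a header I could check, such as Content-Type?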

Solution:

I used the r.raise_for_status() call as suggested in the accepted answer, and also added an extra check on Content-Type, like this:

if r.headers['Content-Type'].split('/')[0] == "text":
    pass  # or raise here

(A list of MIME types is here: http://www.freeformatter.com/mime-types-list.html)
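The check above indexes the header directly, so it raises a KeyError when the server sends no Content-Type at all. A minimal sketch of a more defensive variant follows. The helper names main_type and looks_like_text are hypothetical, and headers is assumed to be a mapping such as r.headers.

```python
def main_type(content_type):
    """Return the lower-cased main type of a Content-Type value, or '' if absent."""
    if not content_type:
        return ""
    # Drop parameters such as "; charset=utf-8" before splitting on "/".
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.split("/", 1)[0]


def looks_like_text(headers):
    """True when the response claims a text/* body (for example an HTML error page)."""
    return main_type(headers.get("Content-Type")) == "text"
```

With a real response this could be used as `if looks_like_text(r.headers): raise ...` before any bytes are written. Note that a missing header is treated as "not text" here, which is a design choice rather than something the answer prescribes.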


2014-02-18 Ion

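Use r.raise_for_status() to raise an exception for responses with 4xx and 5xx status codes, or test r.status_code explicitly.

r.raise_for_status() raises an HTTPError exception, which is a subclass of RequestException, which you already catch: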

try:
    r = requests.get(url, stream=True, verify=False)
    r.raise_for_status()  # raises if not a 2xx or 3xx response
    total_length = int(r.headers['Content-Length'])

    if total_length:
        # etc.
except (requests.RequestException, Exception):  # StandardError exists only in Python 2
    pass

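Checking r.status_code lets you narrow down which response codes you consider correct. Note that 3xx redirects are handled automatically, and you will not see other 3xx responses because requests does not send conditional requests in this case, so there is hardly any need to test explicitly here. If you did, however, it would look like this: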

r = requests.get(url, stream=True, verify=False) 
r.raise_for_status() # raises if not a 2xx or 3xx response 
total_length = int(r.headers['Content-Length']) 

if 200 <= r.status_code < 300 and total_length: 
    # etc.
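To fill in the "# etc." part, here is a minimal sketch that combines both checks with the streaming loop from the question. The function name save_stream is hypothetical, and response is assumed to be any object offering raise_for_status(), status_code and iter_content(), such as the result of requests.get(url, stream=True). It does not read Content-Length, since r.headers['Content-Length'] raises a KeyError when the header is absent, which can happen with chunked transfer encoding.

```python
def save_stream(response, file_name, chunk_size=1024):
    """Write a streamed response body to file_name; return the bytes written.

    Raises whatever response.raise_for_status() raises for 4xx/5xx codes,
    so no file is created for a 404 page.
    """
    response.raise_for_status()
    if not 200 <= response.status_code < 300:
        return 0  # anything outside 2xx is not treated as a download

    written = 0
    with open(file_name, "wb") as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                written += len(chunk)
    return written
```

A caller would wrap this in the same try/except as above, for example `save_stream(requests.get(url, stream=True), file_name)`, and could log the exception instead of silently passing so that missing files remain visible.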


2014-02-18 13:07:57

Thanks! I also added an extra check on the content type (whether it is text/* or not). – Ion

if r.status_code == 404: 
    handle404() 
else: 
    download()


2014-02-18 13:08:10
