Scrapy retry middleware fails with a non-standard HTTP status code

I'm using Scrapy's default RetryMiddleware to re-download failed URLs. I want it to handle pages that come back with a 429 status code ("Too Many Requests") this way.
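The question does not show the project settings. A minimal settings.py sketch, assuming 429 was added to Scrapy's `RETRY_HTTP_CODES` setting (the other codes and the `RETRY_TIMES` value are illustrative, not taken from the question), might look like this:

```python
# settings.py (sketch): ask the built-in RetryMiddleware to retry 429 responses.
# The exact list of codes and the RETRY_TIMES value are illustrative assumptions.
RETRY_ENABLED = True
RETRY_TIMES = 2  # extra attempts per request, in addition to the first download
RETRY_HTTP_CODES = [500, 502, 503, 504, 408, 429]  # 429 = "Too Many Requests"
```

With a list like this, any 429 response is routed through the retry logic, which is exactly where the crash described below happens.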

But I get this error:

Traceback (most recent call last):
  File "/home/vagrant/parse/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/vagrant/parse/local/lib/python2.7/site-packages/scrapy/core/downloader/middleware.py", line 46, in process_response
    response = method(request=request, response=response, spider=spider)
  File "/home/vagrant/parse/local/lib/python2.7/site-packages/scrapy/downloadermiddlewares/retry.py", line 58, in process_response
    reason = response_status_message(response.status)
  File "/home/vagrant/parse/local/lib/python2.7/site-packages/scrapy/utils/response.py", line 58, in response_status_message
    reason = http.RESPONSES.get(int(status)).decode('utf8', errors='replace')
AttributeError: 'NoneType' object has no attribute 'decode'

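While debugging, I found that before retrying a page, Scrapy's RetryMiddleware tries to record why the previous attempt failed. To do that, the response_status_message method builds a string from the status code and its status text, for example: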

>>> response_status_message(404)
'404 Not Found'

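To look up the status text it uses Twisted's http.RESPONSES.get(int(status)). But get() is called without a default argument, so for a custom status code that is not in Twisted's table it returns None instead of a string.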

As a result, Scrapy ends up calling decode('utf8', errors='replace') on None.
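The failure can be reproduced without Scrapy or Twisted. This standalone sketch uses a small hypothetical stand-in for Twisted's `RESPONSES` table (deliberately missing 429) and a hypothetical `buggy_status_message` that mirrors the lookup line from the traceback:

```python
# Hypothetical stand-in for twisted.web.http.RESPONSES: status code -> reason
# phrase as bytes. It deliberately has no entry for 429.
RESPONSES = {
    200: b"OK",
    404: b"Not Found",
    500: b"Internal Server Error",
}


def buggy_status_message(status):
    """Mimics the pre-fix lookup from the traceback: dict.get() with no default."""
    reason = RESPONSES.get(int(status)).decode("utf8", errors="replace")
    return "%s %s" % (status, reason)


print(buggy_status_message(404))  # 404 Not Found

try:
    buggy_status_message(429)
except AttributeError as exc:
    # .get() returned None for the unknown code, and None has no .decode()
    print("AttributeError:", exc)
```

Known codes format fine; only codes absent from the table trigger the `AttributeError`, which matches the 429 case in the question.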

Is there any way to avoid this?


2016-04-26 s_mart

This is actually a bug in Scrapy itself. It has already been fixed in this commit, which was included in Scrapy 1.1 RC1 (see the changelogs).
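The core idea of such a fix is to give `get()` a default so that an unknown code yields a generic phrase instead of `None`. A standalone sketch of that idea; the stand-in table, the name `safe_status_message`, and the exact fallback wording are assumptions for illustration, not copied from the commit:

```python
# Hypothetical stand-in for twisted.web.http.RESPONSES; 429 is missing on purpose.
RESPONSES = {
    200: b"OK",
    404: b"Not Found",
    500: b"Internal Server Error",
}


def safe_status_message(status):
    """Build '<code> <reason>' without crashing on codes missing from the table."""
    code = int(status)
    # Supplying a default to .get() means unknown codes never produce None.
    reason = RESPONSES.get(code, b"Unknown Status")
    return "%d %s" % (code, reason.decode("utf8", errors="replace"))


print(safe_status_message(404))  # 404 Not Found
print(safe_status_message(429))  # 429 Unknown Status
```

The retry reason is only used for logging and stats, so a generic phrase for non-standard codes loses nothing important while keeping the retry path from crashing.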


2016-04-26 04:01:44

Right. Here is the issue: https://github.com/scrapy/scrapy/pull/1857
