2016-04-26

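Scrapy RetryMiddleware fails on a non-standard HTTP status code

I am using Scrapy's default RetryMiddleware to retry downloading failed URLs. I want it to handle pages whose response comes back with status code 429 ("Too Many Requests").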

But I get this error:

Traceback (most recent call last):
  File "/home/vagrant/parse/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/vagrant/parse/local/lib/python2.7/site-packages/scrapy/core/downloader/middleware.py", line 46, in process_response
    response = method(request=request, response=response, spider=spider)
  File "/home/vagrant/parse/local/lib/python2.7/site-packages/scrapy/downloadermiddlewares/retry.py", line 58, in process_response
    reason = response_status_message(response.status)
  File "/home/vagrant/parse/local/lib/python2.7/site-packages/scrapy/utils/response.py", line 58, in response_status_message
    reason = http.RESPONSES.get(int(status)).decode('utf8', errors='replace')
AttributeError: 'NoneType' object has no attribute 'decode'

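I tried to debug the problem and found that, before retrying the download, Scrapy's RetryMiddleware tries to determine the reason for the previous failure. To do that, the response_status_message function builds a string from the status code and the status text, for example: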

>>> response_status_message(404)
'404 Not Found'

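To get the status text it looks the code up in Twisted's table with http.RESPONSES.get(int(status)). But for a custom HTTP status code that is not in the table, get() is called without a default argument, so it returns None instead of a string.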

As a result, Scrapy ends up calling decode('utf8', errors='replace') on None.
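
The failure mode described above can be reproduced without Scrapy or Twisted. The sketch below is only an illustration: RESPONSES is a small stand-in dict (an assumption, not Twisted's real table), and status_message_unsafe / status_message_safe are hypothetical names. The "Unknown Status" fallback text is likewise an assumption, not necessarily what the library uses.

```python
# Stand-in for twisted.web.http.RESPONSES (assumption: a tiny subset with bytes values).
RESPONSES = {200: b"OK", 404: b"Not Found"}


def status_message_unsafe(status):
    """Mirrors the failing line from the traceback: get() has no default."""
    reason = RESPONSES.get(int(status)).decode("utf8", errors="replace")
    return "%s %s" % (status, reason)


def status_message_safe(status):
    """Hypothetical variant: fall back to placeholder text for unknown codes."""
    reason = RESPONSES.get(int(status), b"Unknown Status").decode("utf8", errors="replace")
    return "%s %s" % (status, reason)


if __name__ == "__main__":
    print(status_message_safe(404))
    print(status_message_safe(429))
    try:
        status_message_unsafe(429)
    except AttributeError as exc:
        # Same class of error as in the traceback: None has no decode().
        print("unsafe lookup failed:", exc)
```

Passing a default to get() is enough to keep the lookup from returning None for codes missing from the table.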

Is there any way to avoid this?

Answer


This is actually a bug in the Scrapy library itself. It has already been fixed in this commit, and the fix is included in RC1.1 (see the changelogs).
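
For anyone who cannot upgrade yet, one possible workaround (not part of the fix referenced in this answer) is to make sure the status table has an entry for the custom code before the crawl starts. The sketch below assumes that the table is a plain mutable dict, as the traceback's http.RESPONSES.get(...) call suggests. register_status is a hypothetical helper name, and the Twisted lines are left as comments because they need Twisted installed.

```python
def register_status(responses, code, text):
    """Add a reason phrase for `code` unless the table already has one."""
    responses.setdefault(int(code), text)
    return responses


# Usage sketch (assumption: run once at startup, e.g. from the project's settings module):
# from twisted.web import http
# register_status(http.RESPONSES, 429, b"Too Many Requests")

if __name__ == "__main__":
    table = {404: b"Not Found"}  # stand-in table for demonstration only
    register_status(table, 429, b"Too Many Requests")
    print(table[429])
```

Using setdefault means an entry that already exists is never overwritten, so the helper is harmless on versions where the code is already known.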

+1

That's right. Here is the issue: https://github.com/scrapy/scrapy/pull/1857