Scrapy retry middleware fails on non-standard HTTP status code

I'm using Scrapy's default RetryMiddleware to retry downloading failed URLs. I want it to handle pages that come back with a 429 status code ("Too Many Requests").

But I get this error:
Traceback (most recent call last):
File "/home/vagrant/parse/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 588, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/home/vagrant/parse/local/lib/python2.7/site-packages/scrapy/core/downloader/middleware.py", line 46, in process_response
response = method(request=request, response=response, spider=spider)
File "/home/vagrant/parse/local/lib/python2.7/site-packages/scrapy/downloadermiddlewares/retry.py", line 58, in process_response
reason = response_status_message(response.status)
File "/home/vagrant/parse/local/lib/python2.7/site-packages/scrapy/utils/response.py", line 58, in response_status_message
reason = http.RESPONSES.get(int(status)).decode('utf8', errors='replace')
AttributeError: 'NoneType' object has no attribute 'decode'
While debugging, I found that before retrying a download, Scrapy's RetryMiddleware builds a reason string describing why the previous attempt failed. The response_status_message
method constructs that string from the status code and status text, e.g.
>>> response_status_message(404)
'404 Not Found'
To get the status text it calls Twisted's http.RESPONSES.get(int(status))
. But for a custom HTTP status code, get()
is called without a default argument, so it returns None instead of a string.
Scrapy then tries to call decode('utf8', errors='replace')
on that None.
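The failure mode can be reproduced without Scrapy or Twisted at all; this minimal sketch uses a plain dict standing in for twisted.web.http.RESPONSES (an assumption for illustration) and mirrors the lookup-then-decode pattern from the traceback:

```python
# Stand-in for twisted.web.http.RESPONSES: maps only standard status
# codes to their reason phrases (as bytes).
RESPONSES = {200: b'OK', 404: b'Not Found'}

def response_status_message(status):
    # Mirrors the failing Scrapy code: .get() with no default returns
    # None for a code missing from the table, and None has no .decode().
    return '%d %s' % (int(status),
                      RESPONSES.get(int(status)).decode('utf8', errors='replace'))

print(response_status_message(404))  # '404 Not Found'

try:
    response_status_message(429)     # 429 is not in the table
except AttributeError as exc:
    print(exc)                       # 'NoneType' object has no attribute 'decode'
```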
Is there any way to avoid this?
That's right. Here is the issue: https://github.com/scrapy/scrapy/pull/1857
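Until that fix is available in your Scrapy version, the lookup can be made defensive by supplying a default to get() so that an unknown code yields a placeholder reason instead of None. A minimal sketch, again using a plain dict in place of twisted.web.http.RESPONSES (the fallback text "Unknown Status" is an assumption, not necessarily what the upstream fix uses):

```python
# Stand-in for twisted.web.http.RESPONSES.
RESPONSES = {200: b'OK', 404: b'Not Found'}

def response_status_message(status):
    # Fall back to a placeholder reason phrase for codes that are not
    # in the table, so .decode() is never called on None.
    reason = RESPONSES.get(int(status), b'Unknown Status')
    return '%d %s' % (int(status), reason.decode('utf8', errors='replace'))

print(response_status_message(404))  # '404 Not Found'
print(response_status_message(429))  # '429 Unknown Status'
```

With this shape, adding 429 to RETRY_HTTP_CODES in settings.py lets the retry middleware build its reason string without crashing.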