
I'm using Scrapy. I want to rotate proxies on a per-request basis, getting each proxy from an API I have that returns a single proxy. My plan is to make a request to the API, get a proxy back, and then use it to set the proxy on the scraping request, building on this approach for dynamically setting the Scrapy request callback:

http://stackoverflow.com/questions/39430454/making-request-to-api-from-within-scrapy-function 

I have the following:

class ContactSpider(Spider):
    name = "contact"

    def parse(self, response):
        ....
        PR = Request(
            'my_api',
            headers=self.headers,
            meta={'newrequest': Request(url_to_scrape, headers=self.headers)},
            callback=self.parse_PR
        )
        yield PR

    def parse_PR(self, response):
        newrequest = response.meta['newrequest']
        proxy_data = response.body
        newrequest.meta['proxy'] = 'http://' + proxy_data
        newrequest.replace(url='http://ipinfo.io/ip')  #TESTING
        newrequest.replace(callback=self.form_output)  #TESTING

        yield newrequest

    def form_output(self, response):
        open_in_browser(response)

But I get:

Traceback (most recent call last): 
    File "C:\twisted\internet\defer.py", line 1126, in _inlineCallbacks 
    result = result.throwExceptionIntoGenerator(g) 
    File "C:\twisted\python\failure.py", line 389, in throwExceptionIntoGenerator 
    return g.throw(self.type, self.value, self.tb) 
    File "C:\scrapy\core\downloader\middleware.py", line 43, in process_request 
    defer.returnValue((yield download_func(request=request,spider=spider))) 
    File "C:\scrapy\utils\defer.py", line 45, in mustbe_deferred 
    result = f(*args, **kw) 
    File "C:\scrapy\core\downloader\handlers\__init__.py", line 65, in download_request 
    return handler.download_request(request, spider) 
    File "C:\scrapy\core\downloader\handlers\http11.py", line 60, in download_request 
    return agent.download_request(request) 
    File "C:\scrapy\core\downloader\handlers\http11.py", line 255, in download_request 
    agent = self._get_agent(request, timeout) 
    File "C:\scrapy\core\downloader\handlers\http11.py", line 235, in _get_agent 
    _, _, proxyHost, proxyPort, proxyParams = _parse(proxy) 
    File "C:\scrapy\core\downloader\webclient.py", line 37, in _parse 
    return _parsed_url_args(parsed) 
    File "C:\scrapy\core\downloader\webclient.py", line 20, in _parsed_url_args 
    host = b(parsed.hostname) 
    File "C:\scrapy\core\downloader\webclient.py", line 17, in <lambda> 
    b = lambda s: to_bytes(s, encoding='ascii') 
    File "C:\scrapy\utils\python.py", line 117, in to_bytes 
    'object, got %s' % type(text).__name__) 
TypeError: to_bytes must receive a unicode, str or bytes object, got NoneType 

What am I doing wrong?

Can you post more of the stack trace? – iScrE4m

I've added it above. – user61629

Answer


The stack trace suggests that Scrapy encountered a Request object whose url is None, where a string is expected.

These two lines:

newrequest.replace(url='http://ipinfo.io/ip')  #TESTING
newrequest.replace(callback=self.form_output)  #TESTING

won't work, because the Request.replace method returns a new instance rather than modifying the original request in place.
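
For example, a minimal sketch (the URLs are placeholders) showing that the original request is left untouched:

    from scrapy import Request

    r1 = Request('http://example.com')
    r2 = r1.replace(url='http://ipinfo.io/ip')

    print(r1.url)  # still 'http://example.com' - unchanged
    print(r2.url)  # 'http://ipinfo.io/ip' - replace() returned a new Request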

You need something like this:

newrequest = newrequest.replace(url='http://ipinfo.io/ip')  #TESTING
newrequest = newrequest.replace(callback=self.form_output)  #TESTING

Or simply:

newrequest = newrequest.replace(
    url='http://ipinfo.io/ip', 
    callback=self.form_output 
)
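
Putting it together, parse_PR would look roughly like this (a sketch, assuming the API returns a bare host:port string; the decode call is needed on Python 3, where response.body is bytes):

    def parse_PR(self, response):
        newrequest = response.meta['newrequest']
        proxy_data = response.body.decode('ascii').strip()  # bytes on Python 3
        newrequest.meta['proxy'] = 'http://' + proxy_data
        newrequest = newrequest.replace(
            url='http://ipinfo.io/ip',     #TESTING
            callback=self.form_output      #TESTING
        )
        yield newrequest

Request.replace copies meta by default, so the proxy set on the original request carries over to the replaced one.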