2017-09-16

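I'm trying to pass a value out of a function, using Scrapy to get values from multiple sites.

I've read the documentation but just don't understand it. Reference example: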

def parse_page1(self, response): 
    item = MyItem() 
    item['main_url'] = response.url 
    request = scrapy.Request("http://www.example.com/some_page.html", 
          callback=self.parse_page2) 
    request.meta['item'] = item 
    yield request 

def parse_page2(self, response): 
    item = response.meta['item'] 
    item['other_url'] = response.url 
    yield item 

Here is pseudocode for what I'm trying to achieve:

import scrapy 

class GotoSpider(scrapy.Spider): 
    name = 'goto' 
    allowed_domains = ['first.com', 'second.com']
    start_urls = ['http://first.com/'] 

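    def parse(self, response):
        name = response.xpath(...)
        # Intended: fetch the price from second.com.
        # As written, this only creates a Request object, not a price.
        price = scrapy.Request('http://second.com/', callback=self.parse_check)
        yield (name, price)

    def parse_check(self, response):
        price = response.xpath(...)
        return price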

Do you want a single item containing information from both sites, or one item per site? – eLRuLL


No, I don't want one object containing all the variables; I want separate variables. If that isn't possible and I have to, then one object. – daniel

Answers


This is how you can pass any value, link, etc. to another method:

import scrapy

class GotoSpider(scrapy.Spider):
    name = 'goto'
    allowed_domains = ['first.com', 'second.com']
    start_urls = ['http://first.com/']

    def parse(self, response):
        # .get() returns the first match as a string (or None) instead of a selector list
        name = response.xpath(...).get()
        link = response.xpath(...).get()  # link to second.com where you may find the price
        request = scrapy.Request(url=link, callback=self.parse_check)
        # Attach the value to the request so it travels along to the callback
        request.meta['name'] = name
        yield request

    def parse_check(self, response):
        name = response.meta['name']
        price = response.xpath(...).get()
        # A plain dict works as an item; declaring name and price in items.py is optional
        yield {"name": name, "price": price}

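Thank you very much. Finally a good, simple answer! I was going through other people's Stack Overflow questions and just couldn't manage to understand them. But now it's crystal clear. Thanks! – daniel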


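By the way, your solution passes a value into the function; how would I go the other way around? Instead of sending the name, receiving the price. – daniel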
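
On going the other way around: a yielded request is only scheduled, and its callback runs later, so whatever the callback returns never reaches the method that created the request. The usual pattern is to carry data forward and build the item in the last callback. Below is a toy model in plain Python, not Scrapy itself; `FakeRequest`, `FakeResponse`, `PAGES`, and `crawl` are hypothetical stand-ins that only illustrate the control flow.

```python
from collections import deque


class FakeRequest:
    """Hypothetical stand-in for a request: a URL, a callback, and a meta dict."""

    def __init__(self, url, callback, meta=None):
        self.url = url
        self.callback = callback
        self.meta = dict(meta or {})


class FakeResponse:
    """Hypothetical stand-in for a response; it carries the request's meta along."""

    def __init__(self, request, body):
        self.url = request.url
        self.meta = request.meta
        self.body = body


# Canned page contents used instead of real downloads (made-up data).
PAGES = {
    "http://first.com/": "Widget",
    "http://second.com/widget": "9.99",
}


def parse(response):
    name = response.body
    # The request only describes work to do later; no price exists at this point.
    yield FakeRequest("http://second.com/widget", callback=parse_check,
                      meta={"name": name})


def parse_check(response):
    # Both values are available here, so this is where the item is built.
    yield {"name": response.meta["name"], "price": response.body}


def crawl(start_url, start_callback):
    """Toy scheduler: run callbacks, queue the requests they yield, collect items."""
    queue = deque([FakeRequest(start_url, start_callback)])
    items = []
    while queue:
        request = queue.popleft()
        response = FakeResponse(request, PAGES[request.url])
        for result in request.callback(response):
            if isinstance(result, FakeRequest):
                queue.append(result)
            else:
                items.append(result)
    return items


items = crawl("http://first.com/", parse)
print(items)
```

By the time `parse_check` runs, `parse` has already finished, so there is nothing left in `parse` to receive the price. That is why the name is sent forward rather than the price being sent back.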