2016-01-23

Scrapy + Selenium: Request() callback never fires

Scrapy and Selenium play nicely together as long as everything lives in the same parse() function. But since I need to add more code inside parse(), I want to split part of it out into parse_data() and reach it with a Request() callback — yet the callback is never called.

import time

from scrapy import Spider, Request, signals
from scrapy.http import TextResponse
from scrapy.selector import Selector
from scrapy.xlib.pydispatch import dispatcher
from selenium import webdriver

# MyItem comes from the project's items.py (definition not shown in the question)


class MySpider(Spider):
    name = "myspider"
    start_urls = ["http://example.com/Data.aspx"]

    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        self.driver = webdriver.Firefox()
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def spider_closed(self, spider):
        self.driver.close()

    def parse(self, response):
        item = MyItem()
        self.driver.get(response.url)
        sel = Selector(response)

        buttons = len(self.driver.find_elements_by_xpath("//input[@class='buttonRowDetails']"))

        for x in range(buttons):
            time.sleep(5)
            button = self.driver.find_elements_by_xpath("//input[@class='buttonRowDetails']")[x]
            button.click()
            time.sleep(5)

            response = TextResponse(url=self.driver.current_url,
                                    body=self.driver.page_source,
                                    encoding='utf-8')
            print '\n\n\nHELLO FROM PARSE'

            yield Request(response.url, meta={'item': item}, callback=self.parse_data)

    def parse_data(self, response):
        item = response.meta['item']
        print '\n\nHELLO FROM PARSE_DATA'

Answer


My guess is that your request is being filtered out because it goes to the same URL the spider already visited — the duplicate-request filtering middleware is enabled by default.
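The filtering behaviour can be pictured with a minimal stand-in (a sketch only — Scrapy's real RFPDupeFilter fingerprints the request method, URL and body, not just the bare URL string):

```python
import hashlib


class TinyDupeFilter(object):
    """Toy version of URL-fingerprint duplicate filtering."""

    def __init__(self):
        self.seen = set()

    def request_seen(self, url):
        # Hash the URL into a fingerprint; identical URLs collide.
        fp = hashlib.sha1(url.encode("utf-8")).hexdigest()
        if fp in self.seen:
            return True  # duplicate -> the request would be dropped
        self.seen.add(fp)
        return False


f = TinyDupeFilter()
print(f.request_seen("http://example.com/Data.aspx"))  # False: first visit
print(f.request_seen("http://example.com/Data.aspx"))  # True: same URL, dropped
```

Every Request you yield from parse() points at the URL Selenium is already sitting on, so after the first one they all look like duplicates to the filter and never reach parse_data().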

Use the dont_filter argument to switch the filtering off for this request:

yield Request(response.url, 
       meta={'item': item}, 
       callback=self.parse_data, 
       dont_filter=True)
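If you want to verify that the duplicate filter really is the culprit before changing the spider, Scrapy's standard DUPEFILTER_DEBUG setting makes it log every request it drops (this is stock Scrapy configuration, not something specific to this spider):

```python
# settings.py
DUPEFILTER_DEBUG = True  # log every request dropped as a duplicate, not just the first
```

With that enabled, each filtered Request shows up in the crawl log, so you can see exactly which URLs never made it to parse_data().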