
The spider_closed() function is not executing. If I only put print statements in it, they get printed, but if I make any function call and return a value, it doesn't work. How can I execute a function after all crawling is finished in Scrapy?

import scrapy 
import re 
from pydispatch import dispatcher 
from scrapy import signals 

from SouthShore.items import Product 
from SouthShore.internalData import internalApi 
from scrapy.http import Request 

class bestbuycaspider(scrapy.Spider): 
    name = "bestbuy_dca" 

    allowed_domains = ["bestbuy.ca"] 

    start_urls = ["http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+beds", 
       "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+night+stand", 
       "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+headboard", 
       "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+desk", 
       "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+bookcase", 
       "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+dresser", 
       "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+tv+stand", 
       "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+armoire", 
       "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+kids", 
       "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+changing+table", 
       "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+baby"] 

    def __init__(self, jsondetails="", serverdetails="", *args, **kwargs):
        super(bestbuycaspider, self).__init__(*args, **kwargs)
        dispatcher.connect(self.spider_closed, signal=signals.spider_closed)
        self.jsondetails = jsondetails
        self.serverdetails = serverdetails
        self.data = []

    def parse(self, response):
        # my stuff here
        pass



    def spider_closed(self, spider):
        print "returning values"
        self.results = internalApi(self.jsondetails, self.serverdetails)
        self.results['extractedData'] = self.data
        print self.results
        yield self.results

1) I want to call some functions and return the scraped values.


So you want to keep crawling inside 'spider_closed'? Yielding items or requests? – eLRuLL


No, I want to return the crawled items after the spider closes, and call another function in a different .py file, which does some processing and returns some values. I need to append my crawled values and the called function's output together and return them. –


Scrapy items aren't stored in memory; they are output when you call 'yield item'. If you want to process each item as it is output, you'll have to use pipelines, but using them only once the spider has finished is very bad practice (because you'd have to store all the items yourself). – eLRuLL
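
For illustration, a minimal sketch of the per-item pipeline that comment describes (the class name and the logging call are illustrative, not from the thread):

# pipelines.py - process_item() runs once per yielded item,
# while the crawl is still in progress
class ProcessEachItemPipeline(object):
    def process_item(self, item, spider):
        spider.logger.info("got item: %r", item)  # hypothetical per-item work
        return item  # hand the item on to any later pipelines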

Answer


You can create an Item Pipeline with a close_spider() method:

class MyPipeline(object):
    def close_spider(self, spider):
        # runs once, after the spider has finished crawling
        do_something_here()

Just don't forget to activate it in settings.py, as described in the documentation linked above.
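
For reference, activation is a single settings.py entry mapping the pipeline's import path to an order number; the module path below assumes the pipeline lives in the question's SouthShore project:

# settings.py
ITEM_PIPELINES = {
    'SouthShore.pipelines.MyPipeline': 300,  # 0-1000, lower numbers run first
}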


Apologies, I'm new to Scrapy. Do I need to create the Pipeline class and the close_spider function in the pipelines.py file, or can I change the class name in my spider file itself? –


If I need to create the class and function in the pipelines.py file, my doubts are: 1) How do I import that pipeline class into my spider file, or is it picked up automatically? 2) How do I pass the crawled values to the close_spider function in pipelines.py? –
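
For context, a hedged sketch of the usual pattern those two questions point at: the pipeline is never imported into the spider (Scrapy instantiates it from the ITEM_PIPELINES setting), and the crawled values reach it through process_item(), which can accumulate them for use in close_spider(). internalApi comes from the question's own code; the rest is illustrative:

# pipelines.py - a sketch under the assumptions above
from SouthShore.internalData import internalApi

class CollectAndFinishPipeline(object):
    def __init__(self):
        self.data = []

    def process_item(self, item, spider):
        # Scrapy calls this once per yielded item; store a copy
        self.data.append(dict(item))
        return item

    def close_spider(self, spider):
        # runs once the crawl has finished, with every item collected
        results = internalApi(spider.jsondetails, spider.serverdetails)
        results['extractedData'] = self.data
        print results  # a plain call, not a yield, so it actually executes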
