Running Scrapy from Python

I want to run Scrapy from Python. I am looking at this code (source):

from twisted.internet import reactor 
from scrapy.crawler import Crawler 
from scrapy.settings import Settings 
from scrapy import log 
from testspiders.spiders.followall import FollowAllSpider 

spider = FollowAllSpider(domain='scrapinghub.com') 
crawler = Crawler(Settings()) 
crawler.configure() 
crawler.crawl(spider) 
crawler.start() 
log.start() 
reactor.run() # the script will block here

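My question is that I am confused about how to adapt this code to run my own spider. I have named my spider project "spider_a", and the domain to crawl is specified inside the spider itself.

What I am asking is: if I run my spider with the following command:

scrapy crawl spider_a

how do I adapt the example Python code above to do the same thing?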


2013-08-07 Jimmy

Just import your spider and pass it to crawler.crawl(), like this:

from testspiders.spiders.spider_a import MySpider 

spider = MySpider() 
crawler.crawl(spider)


2013-08-07 09:58:57 alecxe

Running it this way will ignore the user's settings. – Medeiros

In Scrapy 0.19.x (it may also work with older versions), you can do the following.

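from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from scrapy.utils.project import get_project_settings
from testspiders.spiders.followall import FollowAllSpider

spider = FollowAllSpider(domain='scrapinghub.com')
settings = get_project_settings()  # load the project's settings instead of empty defaults
crawler = Crawler(settings)
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)  # stop the reactor when the spider closes
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run()  # the script will block here until spider_closed is sent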
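
For readers on newer Scrapy releases (1.0 and later), where the Crawler API used above has changed, here is a minimal sketch of the same idea using CrawlerProcess, which manages the Twisted reactor itself. This is not the code from this answer. The helper name run_project_spider is hypothetical, and the sketch assumes the script is run from inside the Scrapy project (where scrapy.cfg can be found), so that get_project_settings() picks up the project settings and the spider name "spider_a" from the question can be resolved.

```python
def run_project_spider(spider_name, **spider_args):
    """Run one spider by name, roughly like `scrapy crawl <spider_name>`.

    Hypothetical helper for illustration; it is not part of Scrapy.
    """
    # Imported lazily so that merely defining this helper does not require Scrapy.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    # Load the project's settings so the user's settings are not ignored.
    process = CrawlerProcess(get_project_settings())
    # A string is looked up as a spider name within the project;
    # extra keyword arguments are passed on to the spider.
    process.crawl(spider_name, **spider_args)
    # Starts the reactor and blocks until the crawl has finished.
    process.start()


# Example call (assumes the spider name from the question):
# run_project_spider("spider_a")
```

Because the reactor cannot be restarted within one process, a helper like this is meant to be called once per script run.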

You can even invoke the command directly from a script:

from scrapy import cmdline
cmdline.execute("scrapy crawl followall".split())  # "followall" is the spider's name
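
The string passed to cmdline.execute is simply the argument list of the scrapy command, so spider arguments can be appended with the -a name=value option of scrapy crawl. The sketch below builds that list for the question's "spider_a". build_crawl_argv and crawl_from_script are hypothetical helpers, not part of Scrapy, and the domain argument is only an illustration borrowed from the example spider above. As far as I know, cmdline.execute ends by exiting the interpreter, so code placed after it will not run.

```python
def build_crawl_argv(spider_name, **spider_args):
    """Build the argument list for `scrapy crawl <spider_name> [-a name=value ...]`.

    Hypothetical helper for illustration; it is not part of Scrapy.
    """
    argv = ["scrapy", "crawl", spider_name]
    # Sort the arguments so the resulting list is deterministic.
    for name, value in sorted(spider_args.items()):
        argv += ["-a", "%s=%s" % (name, value)]
    return argv


def crawl_from_script(spider_name, **spider_args):
    """Run the crawl command in-process (hypothetical wrapper)."""
    # Imported lazily so that the helper above can be used without Scrapy.
    from scrapy import cmdline
    cmdline.execute(build_crawl_argv(spider_name, **spider_args))


print(build_crawl_argv("spider_a"))
# ['scrapy', 'crawl', 'spider_a']
print(build_crawl_argv("followall", domain="scrapinghub.com"))
# ['scrapy', 'crawl', 'followall', '-a', 'domain=scrapinghub.com']
```

Calling crawl_from_script("spider_a") from the project directory would then behave like typing scrapy crawl spider_a at the shell.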

Take a look at my answer here. I changed the official documentation, so now your crawler uses your settings and can produce output.


2013-09-27 22:49:35 Medeiros
