2017-04-18 40 views
1

I currently run my Scrapy spider with the following command-line arguments, which write the scraped items to a file:

scrapy crawl my_spider -o data.json 

However, I would prefer to "save" this command in a Python script. Following https://doc.scrapy.org/en/latest/topics/practices.html, I have the following script:

import scrapy 
from scrapy.crawler import CrawlerProcess 

from apkmirror_scraper.spiders.sitemap_spider import ApkmirrorSitemapSpider 

process = CrawlerProcess({ 
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)' 
}) 

process.crawl(ApkmirrorSitemapSpider) 
process.start() # the script will block here until the crawling is finished 

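However, it is not clear to me from the documentation what the equivalent of the -o data.json command-line argument should be inside the script. How can I make the script generate a JSON file?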

+1

Possible duplicate of [Scrapy from script output in JSON](http://stackoverflow.com/questions/23574636/scrapy-from-script-output-in-json) – Casper

+1

Does this [answer](http://stackoverflow.com/questions/23574636/scrapy-from-script-output-in-json) help? –

Answer

6

You need to add the FEED_FORMAT and FEED_URI settings to your CrawlerProcess:

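process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
    'FEED_FORMAT': 'json',   # output format, same as the .json extension passed to -o
    'FEED_URI': 'data.json'  # output location, same as the path passed to -o
})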
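
Putting the question's script and these settings together, a complete version might look like the sketch below. The helper names `feed_settings` and `run_spider` are hypothetical and only there for illustration. The `use_feeds` branch is an assumption about your environment: as far as I know, Scrapy 2.1 and later deprecate `FEED_FORMAT` and `FEED_URI` in favour of a single `FEEDS` setting, so use that branch only if you are on a newer release.

```python
def feed_settings(path="data.json", fmt="json", use_feeds=False):
    """Build a settings dict roughly equivalent to `scrapy crawl ... -o path`.

    Hypothetical helper, not part of Scrapy.
    """
    settings = {
        "USER_AGENT": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
    }
    if use_feeds:
        # Assumes Scrapy >= 2.1, where FEEDS maps each output URI to its options.
        settings["FEEDS"] = {path: {"format": fmt}}
    else:
        # Legacy settings, as used in the answer above.
        settings["FEED_FORMAT"] = fmt
        settings["FEED_URI"] = path
    return settings


def run_spider(spider_cls, **feed_kwargs):
    """Run one spider and block until the crawl has finished."""
    # Imported lazily so feed_settings() can be used without Scrapy installed.
    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(feed_settings(**feed_kwargs))
    process.crawl(spider_cls)
    process.start()  # blocks here until crawling is finished


# Usage, with the spider from the question:
#   from apkmirror_scraper.spiders.sitemap_spider import ApkmirrorSitemapSpider
#   run_spider(ApkmirrorSitemapSpider, path="data.json")
```

Keeping the settings in a small function makes it easy to change the output path or format in one place, but passing the dict directly to CrawlerProcess as in the answer works just as well.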