Scrapy: writing output to a file from a script

I currently run Scrapy from the command line with the following arguments:

scrapy crawl my_spider -o data.json

However, I would prefer to "save" this command in a Python script. Following https://doc.scrapy.org/en/latest/topics/practices.html, I have the following script:

import scrapy 
from scrapy.crawler import CrawlerProcess 

from apkmirror_scraper.spiders.sitemap_spider import ApkmirrorSitemapSpider 

process = CrawlerProcess({ 
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)' 
}) 

process.crawl(ApkmirrorSitemapSpider) 
process.start() # the script will block here until the crawling is finished

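However, it is not clear to me from the documentation what the equivalent of the -o data.json command-line argument should be in the script. How can I get the script to generate a JSON file?

Source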

2017-04-18 Kurt Peek

Possible duplicate of "Scrapy from script output in JSON" (http://stackoverflow.com/questions/23574636/scrapy-from-script-output-in-json) – Casper

See this answer (http://stackoverflow.com/questions/23574636/scrapy-from-script-output-in-json) –

You need to add FEED_FORMAT and FEED_URI to the settings you pass to CrawlerProcess:

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
    'FEED_FORMAT': 'json',
    'FEED_URI': 'data.json'
})
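
As a side note, Scrapy 2.1 and later deprecate the FEED_FORMAT / FEED_URI pair in favor of a single FEEDS setting, which maps each output URI to its export options. The following is only a minimal sketch of the same script under that assumption (Scrapy >= 2.1 installed). The helper names build_settings and run_crawl are hypothetical and exist only for illustration; the spider import is taken from the question.

```python
def build_settings(output_path="data.json", feed_format="json"):
    """Hypothetical helper: settings roughly equivalent to `-o data.json`."""
    return {
        "USER_AGENT": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
        # FEEDS maps each output URI to its export options (assumes Scrapy >= 2.1).
        "FEEDS": {output_path: {"format": feed_format}},
    }


def run_crawl():
    """Hypothetical entry point: start the crawl with the settings above."""
    # Imported lazily so build_settings() can be inspected without Scrapy installed.
    from scrapy.crawler import CrawlerProcess
    from apkmirror_scraper.spiders.sitemap_spider import ApkmirrorSitemapSpider

    process = CrawlerProcess(build_settings())
    process.crawl(ApkmirrorSitemapSpider)
    process.start()  # the script will block here until the crawling is finished
```

Calling run_crawl() at the end of the script starts the crawl and should write the scraped items to data.json. On older Scrapy versions, the FEED_FORMAT / FEED_URI form shown above is the one to use.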

Source

2017-04-18 09:48:36 vold
