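From what I understand after exploring the source code and the documentation, the -t option refers to the FEED_FORMAT setting, which cannot have multiple values. Also, the built-in FeedExporter extension (source) works with a single exporter only.
Actually, consider filing a feature request on the Scrapy issue tracker.
As a workaround, define an item pipeline and export with multiple exporters. For example, here is how to export to both CSV and JSON formats: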
from collections import defaultdict

from scrapy import signals
from scrapy.exporters import JsonItemExporter, CsvItemExporter


class MyExportPipeline(object):
    def __init__(self):
        # Open file handles, keyed by spider, so they can be closed later.
        self.files = defaultdict(list)

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline

    def spider_opened(self, spider):
        # The exporters write bytes, so the files are opened in binary mode.
        csv_file = open('%s_products.csv' % spider.name, 'w+b')
        json_file = open('%s_products.json' % spider.name, 'w+b')
        self.files[spider].append(csv_file)
        self.files[spider].append(json_file)

        self.exporters = [
            JsonItemExporter(json_file),
            CsvItemExporter(csv_file)
        ]
        for exporter in self.exporters:
            exporter.start_exporting()

    def spider_closed(self, spider):
        for exporter in self.exporters:
            exporter.finish_exporting()

        files = self.files.pop(spider)
        for file in files:
            file.close()

    def process_item(self, item, spider):
        # Pass every item to all exporters, then hand it on unchanged.
        for exporter in self.exporters:
            exporter.export_item(item)
        return item
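For the pipeline to run, it also has to be enabled in the project settings. A minimal sketch, assuming the class lives in a module called myproject.pipelines (a hypothetical path, replace it with your own) and that 300 is a suitable order value for your project:

```python
# settings.py
# 'myproject.pipelines' is a hypothetical module path; adjust it to your project.
# The integer sets the order relative to other pipelines (lower runs first).
ITEM_PIPELINES = {
    'myproject.pipelines.MyExportPipeline': 300,
}
```

With that in place, running the spider without the -o/-t options should write both <spider name>_products.csv and <spider name>_products.json to the working directory.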
As alecxe suggested, I posted this on Scrapy's GitHub: https://github.com/scrapy/scrapy/issues/1336 – kiril