2017-08-07 30 views
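Initialize a class (Scrapy item) with empty strings

I am inserting a Scrapy item class, defined in items.py, into MongoDB. I need every field of the class to be inserted, so that fields without a value are stored in the database as empty. Name and Price in the Listing class will always be inserted empty, but I want to keep pipelines.py clean so that I can easily switch to other items. Currently, if I do not set each field of the class to an empty string, that field is not added when the item is inserted into the database.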

Do I need to initialize each member with an empty dict? Like Title = scrapy.Field({})

items.py

import scrapy

class Listing(scrapy.Item):
    Title = scrapy.Field()
    Address = scrapy.Field()
    Price = scrapy.Field()
    Name = scrapy.Field()

pipelines.py

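def process_item(self, item, spider):
    # Price and Name are never scraped, so set them to empty strings
    # explicitly; otherwise they are missing from the inserted document.
    item['Price'] = ''
    item['Name'] = ''
    self.collection.insert_one(dict(item))
    # Return the item so that any later pipeline still receives it
    return item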

Answer

You can use Scrapy's ItemLoader:

from scrapy.loader import ItemLoader
from scrapy.item import Item, Field

class Listing(Item):
    title = Field()
    address = Field()
    price = Field()
    name = Field()

class MyLoader(ItemLoader): 
    default_item_class = Listing 

Then:

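# `response` is the Response object available inside a spider callback
loader = MyLoader(response=response)
loader.add_xpath('title', '//some/xpath/that/finds/nothing')
item = loader.load_item()
# Result reported in this answer: {'title': ['']}
# Recent Scrapy versions may instead leave 'title' unset when the XPath
# matches nothing, so check the output on your version.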
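
If the goal is mainly to keep pipelines.py generic, another option is to fill in the missing fields just before the insert instead of in the item definition. The following is only a minimal sketch under some assumptions: `to_document` is a hypothetical helper name, and `FakeListing` is a stand-in class so that the example runs without Scrapy installed. It relies on the item being dict-like and exposing its declared fields through a `fields` mapping, which scrapy.Item does.

```python
def to_document(item, default=""):
    """Build a plain dict that contains every declared field of `item`.

    Assumes `item` is dict-like and exposes its declared field names via a
    `fields` mapping (as scrapy.Item does). Unset fields get `default`.
    """
    # Start with every declared field set to the default value...
    document = {name: default for name in item.fields}
    # ...then overwrite with the values that were actually scraped.
    document.update(dict(item))
    return document


class FakeListing(dict):
    # Stand-in for the Listing item from the question (an assumption made
    # only so this sketch is runnable without Scrapy).
    fields = {"Title": {}, "Address": {}, "Price": {}, "Name": {}}


listing = FakeListing(Title="Flat")
print(to_document(listing))
# {'Title': 'Flat', 'Address': '', 'Price': '', 'Name': ''}
```

With a helper like this, process_item could call self.collection.insert_one(to_document(item)) and would not need to name Price or Name, so the same pipeline should work unchanged for other item classes.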