2017-01-16
0

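Scrapy - Getting duplicated items when appending items using a for loop

I'm scraping data from a JSON response. I use a for loop to extract the data into items, but the last record overwrites every record the loop stored before it.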

Here is my code:

def parse_centers_and_ambulances(self, response):
    json_response = json.loads(response.body_as_unicode())
    facility = MedFacilityItem()
    facility["name"] = "Med Facility #1"
    centers = []
    med_centers = MedCenterItem()
    for center in json_response:
        if center["name"].startswith("Center"):
            med_centers["response_url"] = center["product_id"]
            med_centers["name"] = center["name"]
            med_centers["address"] = (center["name_short"] + "." +
                                      center["adr_name"] + " " +
                                      center["adr_dom"])
            med_centers["lat"] = center["latitude"]
            med_centers["lon"] = center["longitude"]
            med_centers["phoneInfo"] = [{"number": center["tel1"],
                                         "description": center["tel1_descr"]},
                                        {"number": center["tel2"],
                                         "description": center["tel2_descr"]}]
            centers.append(med_centers)

    facility["facility_type"] = centers
    return facility

What am I missing?

Answers

1

Since Scrapy items basically behave like dicts, I'll use dicts in the examples below. Consider:

In [1]: dict_list = []
   ...: d = {}
   ...: for i in range(3):
   ...:     d['i'] = i
   ...:     dict_list.append(d)
   ...: print(dict_list)
   ...: print([id(e) for e in dict_list])
   ...:
[{'i': 2}, {'i': 2}, {'i': 2}]
[4557722520, 4557722520, 4557722520]

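Dicts are mutable objects, and in this case you are appending the same dict instance to the list multiple times. The resulting list does not contain distinct items, only several references to the same dict object. The following example shows the same behavior: it appends the same dict to the list three times and only then sets a value: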

In [2]: dict_list = []
   ...: d = {}
   ...: for i in range(3):
   ...:     dict_list.append(d)
   ...: d['some'] = 'value'
   ...: print(dict_list)
   ...:
[{'some': 'value'}, {'some': 'value'}, {'some': 'value'}]
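
One way to confirm that the list holds references to a single object rather than copies is to compare its elements with `is`. The following is a minimal Python 3 sketch using only built-in types; the names `shared` and `aliases` are made up for illustration:

```python
# Put the same dict object into a list three times, then mutate it once.
shared = {}
aliases = [shared for _ in range(3)]
shared["some"] = "value"

# Every element is the very same object, so the single mutation
# is visible through all three list positions.
all_same_object = all(element is shared for element in aliases)
distinct_objects = len({id(element) for element in aliases})

print(all_same_object)   # True
print(distinct_objects)  # 1
```
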

What you need to do is create distinct dicts by initializing them inside the for loop, like this:

In [3]: dict_list = []
   ...: for i in range(3):
   ...:     d = {}
   ...:     d['i'] = i
   ...:     dict_list.append(d)
   ...: print(dict_list)
   ...: print([id(e) for e in dict_list])
   ...:
[{'i': 0}, {'i': 1}, {'i': 2}]
[4557901904, 4557724760, 4557843264]
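
If some fields really do have to be prepared once outside the loop, another option is to append a copy on each iteration instead of the original object. This is only a sketch with plain dicts, written for Python 3; `template`, `source` and `tags` are hypothetical names that do not appear in the question. Note that `dict.copy()` is shallow, so the nested list here would still be shared; `copy.deepcopy` is used for that reason.

```python
import copy

# Hypothetical shared defaults, prepared once outside the loop.
template = {"source": "example", "tags": []}

records = []
for i in range(3):
    record = copy.deepcopy(template)  # independent object per iteration
    record["i"] = i
    record["tags"].append(i)          # does not touch the template's list
    records.append(record)

print(records)
print(template)
```

The template stays unchanged, and each element of `records` is a separate object with its own nested list.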
1

You could try defining the item inside the loop rather than outside it.

def parse_centers_and_ambulances(self, response):
    json_response = json.loads(response.body_as_unicode())
    facility = MedFacilityItem()
    facility["name"] = "Med Facility #1"
    centers = []
    # med_centers = MedCenterItem()  # <-- this
    for center in json_response:
        if center["name"].startswith("Center"):
            med_centers = MedCenterItem()  # <-- should be here
            med_centers["response_url"] = center["product_id"]
            med_centers["name"] = center["name"]
            med_centers["address"] = (center["name_short"] + "." +
                                      center["adr_name"] + " " +
                                      center["adr_dom"])
            med_centers["lat"] = center["latitude"]
            med_centers["lon"] = center["longitude"]
            med_centers["phoneInfo"] = [{"number": center["tel1"],
                                         "description": center["tel1_descr"]},
                                        {"number": center["tel2"],
                                         "description": center["tel2_descr"]}]
            centers.append(med_centers)

    facility["facility_type"] = centers
    return facility
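
To try the idea without a running spider, the loop can be reduced to plain dicts. This is a rough Python 3 sketch, not the asker's actual spider: `build_centers` is a hypothetical helper, the sample payload is invented, and only a few of the field names from the question (`name`, `product_id`, `latitude`, `longitude`) are carried over.

```python
import json


def build_centers(records):
    """Return one new dict per record whose name starts with "Center"."""
    centers = []
    for center in records:
        if center["name"].startswith("Center"):
            med_center = {}  # fresh object on every matching iteration
            med_center["response_url"] = center["product_id"]
            med_center["name"] = center["name"]
            med_center["lat"] = center["latitude"]
            med_center["lon"] = center["longitude"]
            centers.append(med_center)
    return centers


# Hypothetical sample payload, made up for illustration only.
sample = json.loads("""
[
  {"name": "Center A", "product_id": 1, "latitude": 10.0, "longitude": 20.0},
  {"name": "Ambulance X", "product_id": 2, "latitude": 11.0, "longitude": 21.0},
  {"name": "Center B", "product_id": 3, "latitude": 12.0, "longitude": 22.0}
]
""")

centers = build_centers(sample)
print([c["name"] for c in centers])  # ['Center A', 'Center B']
```

Because `med_center` is created inside the loop, each appended entry keeps its own values instead of being overwritten by the last record.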