2015-05-25
1

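Python OrderedDict issue

If I have a CSV file with one record per row (the columns are Location, MovieDate, Formatted_Address, Lat, Lng), how would I use OrderedDict to group by Location and append all the MovieDate values that share the same Location value?

Data before: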

Location,MovieDate,Formatted_Address,Lat,Lng 
    "Edgebrook Park, Chicago ",Jun-7 A League of Their Own,"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672 
    "Edgebrook Park, Chicago ","Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672 

For every row that has the same Location (as in the example above), I want to produce output like this, so that no Location is duplicated:

"Edgebrook Park, Chicago ","Jun-7 A League of Their Own Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672 

What is wrong with my code that uses OrderedDict to do this?

from collections import OrderedDict 

od = OrderedDict() 
import csv 
with open("MovieDictFormatted.csv") as f,open("MoviesCombined.csv" ,"w") as out: 
    r = csv.reader(f) 
    wr = csv.writer(out) 
    header = next(r) 
    for row in r: 
     loc,rest = row[0], row[1] 
     od.setdefault(loc, []).append(rest) 
    wr.writerow(header) 
    for loc,vals in od.items(): 
     wr.writerow([loc]+vals) 

What I end up with is this:

['Edgebrook Park, Chicago ', 'Jun-7 A League of Their Own'] 
['Gage Park, Chicago ', "Jun-9 It's a Mad, Mad, Mad, Mad World"] 
['Jefferson Memorial Park, Chicago ', 'Jun-12 Monsters University ', 'Jul-11 Frozen ', 'Aug-8 The Blues Brothers '] 
['Commercial Club Playground, Chicago ', 'Jun-12 Despicable Me 2'] 

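The problem is that the other columns don't show up in this case. How should I go about this? I would also rather have the MovieDate values be one long string, like this: 'Jun-12 Monsters University Jul-11 Frozen Aug-8 The Blues Brothers ' instead of: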

'Jun-12 Monsters University ', 'Jul-11 Frozen ', 'Aug-8 The Blues Brothers ' 

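Thanks guys, appreciate it. I'm a Python noob.

Changing `row[0], row[1]` to `row[0], row[1:]` unfortunately doesn't give me what I want. I only want the values in the second column (MovieDate) to be combined, not all the other columns copied as well, e.g.: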

['Jefferson Memorial Park, Chicago ', ['Jun-12 Monsters University ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353'], ['Jul-11 Frozen ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353'], ['Aug-8 The Blues Brothers ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353']] 
+0

What exactly is going wrong with the whole `rest` thing? Do you get wrong output? Do you get an error message? We need more details. – user2357112

+0

Hey @user2357112, I updated it. Sorry for the incomplete question. – SpicyClubSauce

+0

Is `rest` supposed to be the whole rest of the row? Because `row[1]` is just the thing in the second column. – user2357112

Answers

1

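You only need a couple of changes: you need to join the middle columns, and to avoid duplicating the Lat and Lng values we also need to use them as part of the key: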

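    import csv
    from collections import OrderedDict

    od = OrderedDict()
    with open("data.csv") as f, open("new.csv", "w") as out:
        r = csv.reader(f)
        wr = csv.writer(out)
        header = next(r)
        for row in r:
            # Key on (Location, Lat, Lng); join the middle columns into one string.
            od.setdefault((row[0], row[-2], row[-1]), []).append(" ".join(row[1:-2]))
        wr.writerow(header)
        for loc, vals in od.items():
            wr.writerow([loc[0]] + vals + list(loc[1:]))

Output: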

Location,MovieDate,Formatted_Address,Lat,Lng 
"Edgebrook Park, Chicago ","Jun-7 A League of Their Own Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA","Jun-9 It's a Mad, Mad, Mad, Mad World Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672 

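A League of Their Own is first because it comes before the Mad, Mad, Mad, Mad World row. `row[1:-2]` gets everything except the Location, Lat and Lng; we store the Lat and Lng in our key tuple to avoid writing them repeatedly at the end of each row.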
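The slicing and the tuple key described above can be seen on a single row. This is only an illustrative sketch: the row is the first data row from the question, typed in as a list, and the variable names `key`, `middle` and `joined` are not part of the answer's code.

```python
# Sample row taken from the question's data (already split into columns).
row = [
    "Edgebrook Park, Chicago ",
    "Jun-7 A League of Their Own",
    "Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",
    "41.9998876",
    "-87.7627672",
]

# Grouping key: Location plus the last two columns (Lat, Lng).
key = (row[0], row[-2], row[-1])

# row[1:-2] skips the first column and the last two columns,
# leaving MovieDate and Formatted_Address.
middle = row[1:-2]
joined = " ".join(middle)

print(key)
print(middle)
print(joined)
```

Because the address is part of `middle`, it ends up inside every joined string, which is why the output above repeats it once per movie.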

Using names and unpacking might be a little easier to follow:

    import csv
    from collections import OrderedDict

    od = OrderedDict()
    with open("data.csv") as f, open("new.csv", "w") as out:
        r = csv.reader(f)
        wr = csv.writer(out)
        header = next(r)
        for row in r:
            # Unpack the five columns by name.
            loc, mov, form, lat, lng = row
            od.setdefault((loc, lat, lng), []).append("{} {}".format(mov, form))
        wr.writerow(header)
        for loc, vals in od.items():
            wr.writerow([loc[0]] + vals + list(loc[1:]))

Using csv.DictWriter to keep the five columns:

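    import csv
    from collections import OrderedDict

    od = OrderedDict()

    with open("data.csv") as f, open("new.csv", "w") as out:
        # fieldnames is given explicitly, so the header line is read as an
        # ordinary row and is written back out as the first output row.
        r = csv.DictReader(f, fieldnames=['Location', 'MovieDate', 'Formatted_Address', 'Lat', 'Lng'])
        wr = csv.DictWriter(out, fieldnames=r.fieldnames)
        for row in r:
            # Keep one dict per Location; MovieDate starts as an empty list.
            od.setdefault(row["Location"], dict(Location=row["Location"], Lat=row["Lat"], Lng=row["Lng"],
                                                MovieDate=[], Formatted_Address=row["Formatted_Address"]))
            od[row["Location"]]["MovieDate"].append(row["MovieDate"])
        for loc, vals in od.items():
            # Collapse the collected MovieDate values into a single string.
            od[loc]["MovieDate"] = ", ".join(od[loc]["MovieDate"])
            wr.writerow(vals)

Output: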

"Edgebrook Park, Chicago ","Jun-7 A League of Their Own, Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672 

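So the five columns stay intact: we joined the "MovieDate" values into a single string, and Formatted_Address is always the same for a given Location, so we don't need to update it.

It turns out that all we needed to do was concatenate the MovieDate values and remove the duplicate entries for Location, Lat, Lng and Formatted_Address.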
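For anyone who wants to try this without touching files, here is a minimal self-contained sketch of the same idea, written for Python 3. The function name `combine_movie_dates`, the `sep` parameter and the in-memory `SAMPLE` text are assumptions made for illustration; only the two data rows come from the question. It joins with a space by default, since that is what the question asked for.

```python
import csv
import io
from collections import OrderedDict

# Hypothetical in-memory input standing in for the CSV file;
# the two data rows are copied from the question.
SAMPLE = '''Location,MovieDate,Formatted_Address,Lat,Lng
"Edgebrook Park, Chicago ",Jun-7 A League of Their Own,"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
"Edgebrook Park, Chicago ","Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
'''


def combine_movie_dates(src, dst, sep=" "):
    """Write one row per Location to dst, with MovieDate values joined by sep."""
    reader = csv.DictReader(src)  # field names come from the header row
    groups = OrderedDict()        # Location -> first row seen for it
    for row in reader:
        loc = row["Location"]
        if loc not in groups:
            # First time this Location appears: keep the whole row,
            # but turn MovieDate into a list we can append to.
            row["MovieDate"] = [row["MovieDate"]]
            groups[loc] = row
        else:
            groups[loc]["MovieDate"].append(row["MovieDate"])

    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in groups.values():
        row["MovieDate"] = sep.join(row["MovieDate"])
        writer.writerow(row)


out = io.StringIO()
combine_movie_dates(io.StringIO(SAMPLE), out)
result = out.getvalue()
print(result)
```

With real files, `src` and `dst` would be the two open file objects; passing `sep=", "` should reproduce the comma-separated form used in the answer above.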

0

Let's try changing

od.setdefault(loc, []).append(rest) 

to

od[loc] = ' '.join([od.get(loc, ''), ' '.join(rest)]) 

and then have this be:

wr.writerow([loc]+vals) 
+0

With this, I still get the other columns copied as well: ['Jefferson Memorial Park, Chicago ', ['Jun-12 Monsters University ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353'], ['Jul-11 Frozen ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353'], ['Aug-8 The Blues Brothers ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353']] – SpicyClubSauce

+0

I've updated my answer with my idea. Let me know how this works out. Thanks! – Misandrist

+0

Hey @Misandrist, unfortunately no good. Getting this back: `TypeError: sequence item 0: expected string, list found` – SpicyClubSauce
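One plausible cause of that error, assuming the value being joined was a list of lists like the nested output quoted in the comment above: `str.join` only accepts strings as items. The sketch below uses shortened, hypothetical data in that shape; the exact wording of the message differs between Python versions.

```python
# One grouped value, shaped like the nested output shown in the comment above
# (inner lists shortened for readability).
rest = [
    ["Jun-12 Monsters University ", "41.76083920000001", "-87.6294353"],
    ["Jul-11 Frozen ", "41.76083920000001", "-87.6294353"],
]

try:
    " ".join(rest)  # every item is a list, not a string
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)

# Joining only the first element of each inner list works.
movie_dates = " ".join(item[0] for item in rest)
print(movie_dates)
```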

-1

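Assuming the location is the first item of the row:

    import csv

    groups = {}
    # f is the input CSV file, already opened for reading.
    for row in csv.reader(f):
        # Collect the remaining columns of every row under its location.
        groups.setdefault(row[0], []).append(row[1:])

Then, for each location, you have its rows:

    # out is the output file, already opened for writing.
    wr = csv.writer(out)
    for key, value in groups.items():
        # Flatten the collected rows into one output row after the location.
        wr.writerow([key] + [cell for rest in value for cell in rest])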