2013-07-30

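Making a dictionary of matching values

I'm having some trouble building a dictionary based on multiple matching items in a list.

Here is a sample list: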

items = [["1.pdf", "123", "train", "plaza"], 
     ["2.pdf","123", "plane", "town"], 
     ["3.pdf", "456", "train", "plaza"], 
     ["4.pdf", "123", "plane", "city"], 
     ["5.pdf", "123", "train", "plaza"], 
     ["6.pdf","123", "plane", "town"]] 

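What I'm trying to do is match on the last three items of each list and build a dict from the matches.

So, based on the list above, the desired output would be: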

{1 : [["1.pdf", "123", "train", "plaza"], 
     ["5.pdf", "123", "train", "plaza"]], 
2 : [["2.pdf","123", "plane", "town"], 
     ["6.pdf","123", "plane", "town"]], 
3 : [["3.pdf", "456", "train", "plaza"]], 
4 : [["4.pdf", "123", "plane", "city"]]} 
+1

Do you have an initial attempt you can show us? – sihrc

+0

Why not make it a list? After all, you're using sequential numbers as keys. –

+0

What data are the dictionary keys built from? – Howard

Answers

7

May I suggest a different output data format?

from collections import defaultdict 
d = defaultdict(list) 

for item in items: 
    d[tuple(item[1:])].append(item[0]) 

This results in a dictionary like:

{ 
    ('123', 'train', 'plaza'): ['1.pdf', '5.pdf'], 
    ('123', 'plane', 'town'): ['2.pdf', '6.pdf'], 
    ('123', 'plane', 'city'): ['4.pdf'], 
    ('456', 'train', 'plaza'): ['3.pdf'] 
} 
+0

I like this approach and it would work, but how would I do it if the key came from specific indexes? For example item[0], item[3], item[4] – thedemon

+0

Sorry if that wasn't stated in the original question. – thedemon
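
For the follow-up about keying on specific positions, one possible sketch uses `operator.itemgetter` to pull the chosen indexes out of each row. The `key_indexes` name and the choice of positions 1 and 3 are made up for illustration (the sample rows only have four elements, so `item[4]` from the comment would not exist here).

```python
from collections import defaultdict
from operator import itemgetter

items = [["1.pdf", "123", "train", "plaza"],
         ["2.pdf", "123", "plane", "town"],
         ["3.pdf", "456", "train", "plaza"],
         ["4.pdf", "123", "plane", "city"],
         ["5.pdf", "123", "train", "plaza"],
         ["6.pdf", "123", "plane", "town"]]

# Hypothetical choice of positions to match on; any indexes valid for the rows work.
key_indexes = (1, 3)
# With two or more indexes, itemgetter returns a tuple, which is hashable.
get_key = itemgetter(*key_indexes)

groups = defaultdict(list)
for item in items:
    groups[get_key(item)].append(item[0])

print(dict(groups))
```

Note that `itemgetter` called with a single index returns the bare element rather than a one-element tuple, so the keys change shape in that case.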

1

Ignore my bad naming scheme.

items = [["1.pdf", "123", "train", "plaza"], 
     ["2.pdf","123", "plane", "town"], 
     ["3.pdf", "456", "train", "plaza"], 
     ["4.pdf", "123", "plane", "city"], 
     ["5.pdf", "123", "train", "plaza"], 
     ["6.pdf","123", "plane", "town"]] 

final = dict() 
for item in items: 
    final[tuple(item[1:])] = final.get(tuple(item[1:]),[]) + [item] 

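new = dict() 
final_values = list(final.values())  # dict views are not indexable in Python 3 
for i in range(len(final)): 
    new[i+1] = final_values[i] 

# 'group' avoids shadowing the original 'items' list 
for key, group in new.items(): 
    print(key, ":\n", group) 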

Yields (in no particular order):

{1 : [["1.pdf", "123", "train", "plaza"], 
     ["5.pdf", "123", "train", "plaza"]], 
2 : [["2.pdf","123", "plane", "town"], 
     ["6.pdf","123", "plane", "town"]], 
3 : [["3.pdf", "456", "train", "plaza"]], 
4 : [["4.pdf", "123", "plane", "city"]]} 
+1

A tuple can be used as the key instead of 'str(item[1:])', and 'enumerate' is better than 'range(len(final))'. –

+0

Note that with a normal dict the order of the output can be arbitrary, so it won't necessarily match the OP's expected output. –

1

You can use collections.defaultdict:

>>> from collections import defaultdict 
>>> from pprint import pprint 
>>> dic = defaultdict(list) 
>>> for item in items: 
...     dic[tuple(item[1:])].append(item) 
...  
>>> ans = {i: item for i, item in enumerate(dic.values(), 1)} 
>>> pprint(ans) 
{1: [['1.pdf', '123', 'train', 'plaza'], ['5.pdf', '123', 'train', 'plaza']], 
2: [['2.pdf', '123', 'plane', 'town'], ['6.pdf', '123', 'plane', 'town']], 
3: [['4.pdf', '123', 'plane', 'city']], 
4: [['3.pdf', '456', 'train', 'plaza']]} 

If order matters, then use collections.OrderedDict:

>>> from collections import OrderedDict 
>>> from pprint import pprint 
>>> dic = OrderedDict() 
>>> for item in items: 
...     dic.setdefault(tuple(item[1:]), []).append(item) 
...  
>>> ans = {i: item for i, item in enumerate(dic.values(), 1)} 
>>> pprint(ans) 
{1: [['1.pdf', '123', 'train', 'plaza'], ['5.pdf', '123', 'train', 'plaza']], 
2: [['2.pdf', '123', 'plane', 'town'], ['6.pdf', '123', 'plane', 'town']], 
3: [['3.pdf', '456', 'train', 'plaza']], 
4: [['4.pdf', '123', 'plane', 'city']]} 
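
As a side note, on Python 3.7 and later a plain `dict` also preserves insertion order, so the same first-seen numbering can be sketched without `OrderedDict`. This is a minimal variation on the code above, assuming a modern interpreter; the `groups` name is just illustrative.

```python
items = [["1.pdf", "123", "train", "plaza"],
         ["2.pdf", "123", "plane", "town"],
         ["3.pdf", "456", "train", "plaza"],
         ["4.pdf", "123", "plane", "city"],
         ["5.pdf", "123", "train", "plaza"],
         ["6.pdf", "123", "plane", "town"]]

groups = {}
for item in items:
    # setdefault creates the empty list the first time a key is seen
    groups.setdefault(tuple(item[1:]), []).append(item)

# Number the groups from 1, in the order their keys first appeared
ans = {i: group for i, group in enumerate(groups.values(), 1)}

for number, group in ans.items():
    print(number, group)
```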
1

What you're looking for is a groupby operation. If you're using pandas:

In [1]: import pandas as pd 

In [2]: items 
Out[2]: 
[['1.pdf', '123', 'train', 'plaza'], 
['2.pdf', '123', 'plane', 'town'], 
['3.pdf', '456', 'train', 'plaza'], 
['4.pdf', '123', 'plane', 'city'], 
['5.pdf', '123', 'train', 'plaza'], 
['6.pdf', '123', 'plane', 'town']] 

In [3]: df = pd.DataFrame.from_records(items) 

In [4]: df 
Out[4]: 
       0    1      2      3 
0  1.pdf  123  train  plaza 
1  2.pdf  123  plane   town 
2  3.pdf  456  train  plaza 
3  4.pdf  123  plane   city 
4  5.pdf  123  train  plaza 
5  6.pdf  123  plane   town 


In [5]: for n, g in df.groupby([1, 2, 3]): 
   ...:     print("name", n) 
   ...:     print(g) 
   ...:  
name ('123', 'plane', 'city') 
       0    1      2     3 
3  4.pdf  123  plane  city 
name ('123', 'plane', 'town') 
       0    1      2     3 
1  2.pdf  123  plane  town 
5  6.pdf  123  plane  town 
name ('123', 'train', 'plaza') 
       0    1      2      3 
0  1.pdf  123  train  plaza 
4  5.pdf  123  train  plaza 
name ('456', 'train', 'plaza') 
       0    1      2      3 
2  3.pdf  456  train  plaza
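
If pandas isn't available, a similar grouping can be sketched with the standard library's `itertools.groupby`. This swaps in a different technique from the pandas answer: `itertools.groupby` only merges adjacent rows, so the list has to be sorted by the same key first. The `key` helper and `grouped` name are illustrative.

```python
from itertools import groupby

items = [["1.pdf", "123", "train", "plaza"],
         ["2.pdf", "123", "plane", "town"],
         ["3.pdf", "456", "train", "plaza"],
         ["4.pdf", "123", "plane", "city"],
         ["5.pdf", "123", "train", "plaza"],
         ["6.pdf", "123", "plane", "town"]]


def key(item):
    """Group on everything after the file name."""
    return tuple(item[1:])


# Sort first: groupby only groups consecutive rows that share a key.
grouped = {k: [item[0] for item in rows]
           for k, rows in groupby(sorted(items, key=key), key=key)}

for name, files in grouped.items():
    print("name", name, files)
```

Because of the sort, the groups come out in key order, as they do in the pandas output above, rather than in first-seen order.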