Iterating over multiple values in a dictionary?

I have a list of words:

word_list = ["it's","they're","there's","he's"]

and a dictionary recording how often the words in word_list appear in several documents:

dict = [('document1',{"it's": 0,"they're": 2,"there's": 5,"he's": 1}), 
('document2',{"it's": 4,"they're": 2,"there's": 3,"he's": 0}), 
('document3',{"it's": 7,"they're": 0,"there's": 4,"he's": 1})]

I want to build a data structure (a DataFrame, perhaps?) that looks like the following:

file  word  count 
document1 it's  0 
document1 they're  2 
document1 there's  5 
document1 he's  1 
document2 it's  4 
document2 they're  2 
document2 there's  3 
document2 he's  0 
document3 it's  7 
document3 they're  0 
document3 there's  4 
document3 he's  1

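I'm trying to find which of these words are used most often across the documents. I have more than 900 documents.

I was thinking of something like the following: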

res = {} 
for i in words_list: 
    count = 0 
    for j in dict.items(): 
     if i == j: 
       count = count + 1 
       res[i,j] = count

Where can I go from here?

Source

2015-11-04 blacksite

That's not a dictionary by any stretch. – user2357112

You should use the Python pandas library to create the kind of DataFrame shown in your post. –

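Where do I start? Are there any methods I should look at? – blacksite

Well, first things first: your dictionary is not actually a dictionary, and it should be built like this: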

d = {'document1':{"it's": 0,"they're": 2,"there's": 5,"he's": 1}, 
    'document2':{"it's": 4,"they're": 2,"there's": 3,"he's": 0}, 
    'document3':{"it's": 7,"they're": 0,"there's": 4,"he's": 1}}
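If the data already exists as a list of (name, dict) tuples, as in the question, one possible way to get the nested dictionary above is to pass that list to the built-in dict(), which accepts an iterable of key/value pairs. A minimal sketch, where the name pairs is only an illustrative placeholder:

```python
# The structure from the question: a list of (document name, word counts) tuples.
# "pairs" is a placeholder name used for illustration.
pairs = [
    ("document1", {"it's": 0, "they're": 2, "there's": 5, "he's": 1}),
    ("document2", {"it's": 4, "they're": 2, "there's": 3, "he's": 0}),
    ("document3", {"it's": 7, "they're": 0, "there's": 4, "he's": 1}),
]

# dict() turns each (key, value) tuple into one dictionary entry.
d = dict(pairs)

# Nested lookups now work by document name and then by word.
print(d["document2"]["it's"])
```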

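With that, we can build a DataFrame from a dictionary with pandas, but to get it into the shape you want we first have to build a list of lists from the dictionary. Then we create the DataFrame, label the columns, and sort: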

import pandas as pd 

d = {'document1':{"it's": 0,"they're": 2,"there's": 5,"he's": 1}, 
    'document2':{"it's": 4,"they're": 2,"there's": 3,"he's": 0}, 
    'document3':{"it's": 7,"they're": 0,"there's": 4,"he's": 1}} 

# Flatten the nested dict into [file, word, count] rows
d = pd.DataFrame([[k,k1,v1] for k,v in d.items() for k1,v1 in v.items()], columns = ['File','Words','Count']) 
# DataFrame.sort() was removed from later pandas releases; sort_values() replaces it
print(d.sort_values(['File','Count'], ascending=[True,True])) 

         File    Words  Count
1   document1     it's      0
0   document1     he's      1
3   document1  they're      2
2   document1  there's      5
4   document2     he's      0
7   document2  they're      2
6   document2  there's      3
5   document2     it's      4
11  document3  they're      0
8   document3     he's      1
10  document3  there's      4
9   document3     it's      7

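If you want the top n occurrences, you can use groupby() followed by either head() or tail() on the sorted frame. With the ascending sort used here, head(2) returns the two lowest counts per file and tail(2) the two highest: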

d = d.sort_values(['File','Count'], ascending=[True,True]).groupby('File').head(2) 

         File    Words  Count
1   document1     it's      0
0   document1     he's      1
4   document2     he's      0
7   document2  they're      2
11  document3  they're      0
8   document3     he's      1
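As a variation on the grouping step above, the following sketch sorts counts in descending order within each file, so that head(2) keeps the two most frequent words per document instead of the two least frequent. It assumes a pandas version that provides sort_values(); the names counts, rows, df and top2 are illustrative and not part of the original answer.

```python
import pandas as pd

# Nested dict of per-document word counts (same data as above).
counts = {
    "document1": {"it's": 0, "they're": 2, "there's": 5, "he's": 1},
    "document2": {"it's": 4, "they're": 2, "there's": 3, "he's": 0},
    "document3": {"it's": 7, "they're": 0, "there's": 4, "he's": 1},
}

# Flatten into [file, word, count] rows and build the DataFrame.
rows = [[doc, word, n] for doc, words in counts.items() for word, n in words.items()]
df = pd.DataFrame(rows, columns=["File", "Words", "Count"])

# Sort files ascending and counts descending, then keep the first
# two rows of each file group: the two most frequent words per file.
top2 = (
    df.sort_values(["File", "Count"], ascending=[True, False])
      .groupby("File")
      .head(2)
      .reset_index(drop=True)
)
print(top2)
```

Resetting the index is optional; it only replaces the shuffled row labels left behind by the sort.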

The list comprehension returns a list of lists that looks like this:

d = [['document1', "he's", 1], ['document1', "it's", 0], ['document1', "there's", 5], ['document1', "they're", 2], ['document2', "he's", 0], ['document2', "it's", 4], ['document2', "there's", 3], ['document2', "they're", 2], ['document3', "he's", 1], ['document3', "it's", 7], ['document3', "there's", 4], ['document3', "they're", 0]]

To build the dictionary correctly, you would just use something like

d['document1']['it\'s'] = 1

If for some reason you are stuck with a list of tuples of type (str, dict), you can use this list comprehension instead:

[[i[0],k1,v1] for i in d for k1,v1 in i[1].items()]

Source

2015-11-04 21:19:45 SirParselot

Nice answer. One question: 'd.sort(['File','Count'], ascending=[1,1])' also changes the index. Is there any particular reason you did that? –

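@JoeR I just changed it so that the files are ordered from low to high, and then the counts likewise. It isn't necessary, but I think it looks a bit nicer. – SirParselot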

What about something like this?

word_list = ["it's","they're","there's","he's"] 

frequencies = [('document1',{"it's": 0,"they're": 2,"there's": 5,"he's": 1}), 
('document2',{"it's": 4,"they're": 2,"there's": 3,"he's": 0}), 
('document3',{"it's": 7,"they're": 0,"there's": 4,"he's": 1})] 

result = [] 
for document in frequencies: 
    # document is a (file name, word counts) tuple
    for word in word_list: 
        result.append({"file":document[0], "word":word,"count":document[1][word]}) 

print(result)
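Since the stated goal is to find which words are used most across all documents, one possible follow-up is to sum the per-document counts. The sketch below uses collections.Counter from the standard library on the same frequencies structure; the name totals is illustrative and not part of the original answer.

```python
from collections import Counter

frequencies = [
    ("document1", {"it's": 0, "they're": 2, "there's": 5, "he's": 1}),
    ("document2", {"it's": 4, "they're": 2, "there's": 3, "he's": 0}),
    ("document3", {"it's": 7, "they're": 0, "there's": 4, "he's": 1}),
]

# Counter.update() with a mapping adds its values to the running totals.
totals = Counter()
for _name, counts in frequencies:
    totals.update(counts)

# The two most frequent words over all documents, highest first.
print(totals.most_common(2))
```

The same loop should scale to the 900+ documents mentioned in the question, since it keeps only one running total per word.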

Source

2015-11-04 20:53:12 Jephron

I get the following error: 'TypeError: string indices must be integers, not str'. I can't use the word itself as an index. – blacksite

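Are you running the code with the same data as mine? The only place it could fail is 'document[1][word]', and all the keys in 'document[1]' are strings in the data provided, so it shouldn't fail. Edit: on second thought, that error means you are trying to index a string with another string. Does your frequencies list contain any raw strings? – Jephron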

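I don't think so. Admittedly, this is far simpler than the actual data I'm using. It follows exactly the same syntactic structure, but "frequencies" is just much easier to talk about. – blacksite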
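The error discussed in the comments above can be reproduced whenever an entry is a plain string rather than a (name, dict) tuple. The sketch below is one possible way to locate such entries before looping. The malformed "document2" entry is a hypothetical example, not taken from the asker's real data, and bad_entries is an illustrative name.

```python
frequencies = [
    ("document1", {"it's": 0, "he's": 1}),
    "document2",  # hypothetical malformed entry: a bare string, not a tuple
]

# Collect every entry that is not a (name, dict) pair.
bad_entries = []
for entry in frequencies:
    is_pair = isinstance(entry, tuple) and len(entry) == 2
    if not (is_pair and isinstance(entry[1], dict)):
        bad_entries.append(entry)

print(bad_entries)

# Indexing the bare string reproduces the failure: "document2"[1] is the
# one-character string "o", and "o"["it's"] raises TypeError.
try:
    "document2"[1]["it's"]
except TypeError as exc:
    print("TypeError:", exc)
```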
