2017-02-15 51 views
3

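Python: count occurrences of the elements of list1 in list2

In the code below, I want to count the number of occurrences in test of each word in word_list. The code below does the job, but it may not be efficient. Is there a better way to do it?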

word_list = ["hello", "wonderful", "good", "flawless", "perfect"] 
test = ["abc", "hello", "vbf", "good", "dfdfdf", "good", "good"] 

result = [0] * len(word_list) 
for i in range(len(word_list)):
    for w in test:
        if w == word_list[i]:
            result[i] += 1

print(result) 

Answers

6

Use collections.Counter to count all the words in test in one pass, then look up the count of each word in word_list in that Counter.

>>> import collections
>>> word_list = ["hello", "wonderful", "good", "flawless", "perfect"]
>>> test = ["abc", "hello", "vbf", "good", "dfdfdf", "good", "good"] 
>>> counts = collections.Counter(test) 
>>> [counts[w] for w in word_list] 
[1, 0, 3, 0, 0] 

Or use a dictionary comprehension:

>>> {w: counts[w] for w in word_list} 
{'perfect': 0, 'flawless': 0, 'good': 3, 'wonderful': 0, 'hello': 1} 

Creating the Counter should be O(n) and each lookup is O(1), giving you O(n + m) for n words in test and m words in word_list.
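
As a rough illustration of the O(n + m) reasoning above, the same approach can be wrapped in a small helper. The name `count_words` is hypothetical and does not appear in the answer; this is only a sketch.

```python
from collections import Counter

def count_words(word_list, test):
    """Return, in word_list order, how often each word occurs in test."""
    counts = Counter(test)                 # one pass over test: O(n)
    return [counts[w] for w in word_list]  # m lookups; a missing word yields 0

word_list = ["hello", "wonderful", "good", "flawless", "perfect"]
test = ["abc", "hello", "vbf", "good", "dfdfdf", "good", "good"]
print(count_words(word_list, test))  # [1, 0, 3, 0, 0]
```

Looking up a missing key in a Counter returns 0 without raising, so no membership check is needed.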

+0

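Wouldn't it be more efficient to filter first? Also, according to this page: https://wiki.python.org/moin/TimeComplexity, a lookup in a list is O(n), which you could avoid by converting 'word_list' to a set. –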

+0

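@ZaccharieRamzi What is it with "lookup in a set" today? You are the second person to suggest this. Is my answer unclear? I am not doing lookups in a list, only in a dictionary, which is just as fast as a lookup in a set. Also, what filtering? –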

+0

Yes, you're right, I got confused by what I had in mind. If you do: 'words = set(word_list); new_test = [word for word in test if word in words]; counts = collections.Counter(new_test)' you may get faster results, depending on the case. –
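
For comparison, here is a sketch of the filter-first variant suggested in the comment above. Whether it is actually faster depends on the data (for instance, on how many words in test are absent from word_list), so the speed benefit is something to measure, not a given. The name `count_words_filtered` is made up for illustration.

```python
from collections import Counter

def count_words_filtered(word_list, test):
    """Count only the words of interest, skipping everything else in test."""
    wanted = set(word_list)                           # O(1) average membership test
    counts = Counter(w for w in test if w in wanted)  # irrelevant words never stored
    return [counts[w] for w in word_list]

word_list = ["hello", "wonderful", "good", "flawless", "perfect"]
test = ["abc", "hello", "vbf", "good", "dfdfdf", "good", "good"]
print(count_words_filtered(word_list, test))  # [1, 0, 3, 0, 0]
```

The result is the same as without filtering; the difference is that the Counter only ever holds keys that appear in word_list.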

3

You can do this in linear time using a dictionary.

word_list = ["hello", "wonderful", "good", "flawless", "perfect"] 
test = ["abc", "hello", "vbf", "good", "dfdfdf", "good", "good"] 

result = [] 
word_map = {} 
for w in test: 
    if w in word_map:
        word_map[w] += 1
    else:
        word_map[w] = 1

for w in word_list: 
    result.append(word_map.get(w, 0)) 

print(result) 
+2

Nice "no-library" solution, but even so, you could use 'get' with a default to make the code a bit more compact, e.g. 'result.append(word_map.get(w, 0))' –
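
The `get`-with-default idea can also be applied to the counting loop itself, which removes the `if`/`else` branch. A minimal sketch; `count_with_get` is a hypothetical name, not part of the answer.

```python
def count_with_get(word_list, test):
    """Count words with a plain dict, using dict.get to supply a default of 0."""
    word_map = {}
    for w in test:
        # get() returns 0 the first time a word is seen
        word_map[w] = word_map.get(w, 0) + 1
    return [word_map.get(w, 0) for w in word_list]

word_list = ["hello", "wonderful", "good", "flawless", "perfect"]
test = ["abc", "hello", "vbf", "good", "dfdfdf", "good", "good"]
print(count_with_get(word_list, test))  # [1, 0, 3, 0, 0]
```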

1

You can combine collections.Counter and operator.itemgetter:

from collections import Counter 
from operator import itemgetter 

cnts = Counter(test) 
word_cnts = dict(zip(word_list, itemgetter(*word_list)(cnts))) 

Which gives:

>>> word_cnts 
{'flawless': 0, 'good': 3, 'hello': 1, 'perfect': 0, 'wonderful': 0} 

Or, if you prefer a list:

>>> list(zip(word_list, itemgetter(*word_list)(cnts))) 
[('hello', 1), ('wonderful', 0), ('good', 3), ('flawless', 0), ('perfect', 0)] 
+0

An impressive display of functional programming, but I still prefer list or dict comprehensions. ;-) –

+0

@tobias_k The comprehensions were already "taken" by another answer. Otherwise I would have added them :-P – MSeifert

-1

You could try using a dictionary:

word_list = ["hello", "wonderful", "good", "flawless", "perfect"] 
test = ["abc", "hello", "vbf", "good", "dfdfdf", "good", "good"] 

result = {} 
for word in word_list:
    result[word] = 0
for w in test:
    if w in result:  # dict.has_key() was removed in Python 3; use "in"
        result[w] += 1
print(result) 

But you will end up with a different structure. If you don't want that, you can try this instead:

word_list = ["hello", "wonderful", "good", "flawless", "perfect"] 
test = ["abc", "hello", "vbf", "good", "dfdfdf", "good", "good"] 

result = {} 
for w in test:
    if w in result:  # dict.has_key() was removed in Python 3; use "in"
        result[w] += 1
    else:
        result[w] = 1
count = [0] * len(word_list)
for i in range(len(word_list)):
    if word_list[i] in result:
        count[i] = result[word_list[i]]
print(count) 