Q

collections.Counter如何实现快速排序？

2014-02-09 117 views 2 likes

2

最近我一直在玩Python的collections.Counter数据结构。该法服使用这个对象是计算一个文本文件中的单词出现的次数：collections.Counter如何实现快速排序？

from collections import Counter 

with open(r'filename') as f: 
    #create a list of all words in the file using a list comprehension 
    words = [word for line in f for word in line.split()] 

c = Counter(words)

凉爽的部分是你可以使用此结构弄清楚哪些单词是最常见的：

for word, count in c.most_common(): 
    print word, count

我不明白的部分是most_common()runs in O(n) time [编辑：这是不正确的。根据Martijn的回答，它实际上运行在O（n log k）]。显然这意味着它不能与背后的字典进行比较排序，因为最快的比较排序是O（nlogn）。

那么collections.Counter如何实现快速排序？

2014-02-09 BringMyCakeBack

A

回答

4

它确实不是运行在O（n）时间。当您在字典中询问全部值时，将使用常规排序，O（NlogN）算法。

当要求一个前K结果，使用heapq.nlargest()电话，在O更有效的方法（NlogK）时间：

def most_common(self, n=None): 
    '''List the n most common elements and their counts from the most 
    common to the least. If n is None, then list all element counts. 

    >>> Counter('abcdeabcdabcaba').most_common(3) 
    [('a', 5), ('b', 4), ('c', 3)] 

    ''' 
    # Emulate Bag.sortedByCount from Smalltalk 
    if n is None: 
     return sorted(self.iteritems(), key=_itemgetter(1), reverse=True) 
    return _heapq.nlargest(n, self.iteritems(), key=_itemgetter(1))

有关线性时间正在做计时答案会谈;构建Counter例如，基本上一循环输入可迭代：

for elem in iterable: 
    self[elem] = self_get(elem, 0) + 1

2014-02-09 01:23:35

+0

“当询问最高N个结果时” - 用K代替N与O（NlogK）保持一致。 – user2357112

+0

这非常有帮助，谢谢！为了确保，我知道如何理解对'most_common（k）'的调用，基于你的回答： 1）创建一个大小为K的堆。 2）算法遍历字典中的所有N个项目，并检查项目是否大于堆中的最小项目。 3）如果新项目较大，则将其添加到堆中，并删除最小项目。由于第三步可能发生在最坏的N次，并且最坏的log（k）的成本，这给我们总的NlogK。这听起来是对的吗？ – BringMyCakeBack

+0

这听起来大致正确。 –

1

排序不是以线性时间运行的部分。这需要O（nlog（k）），其中n是计数器中的项目数，k是您从most_common请求的项目数。计数的需要线性时间。它基本上是这样做的：

for item in iterable: self[item] = self.get(item, 0) + 1

2014-02-09 01:24:39 user2357112

相关问题

1. 快速排序python实现
2. 实现快速排序
3. 快速排序实现
4. 帮助实现快速排序
5. 快速排序：用Python实现
6. 3路快速排序（C实现）
7. ArrayIndexOutOfBoundsException异常的快速排序实现
8. 快速排序实现中的ArrayIndexOutofBound
9. 快速排序实现中的问题
10. C++中的快速排序实现