Defining a word as 2 or more letters in Python 2.6

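I have a Python script that I'm writing for a class assignment. It counts the 10 most frequent words in a text document and displays each word with its frequency. I got that part of the script working fine, but the assignment says a word is defined as 2 letters or more. For some reason I can't seem to define a word as 2 letters or more, and when I run the script, nothing happens.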

# Most Frequent Words: 
from string import punctuation 
from collections import defaultdict 

def sort_words(x, y): 
    return cmp(x[1], y[1]) or cmp(y[0], x[0]) 

number = 10 
words = {} 

words_gen = (word.strip(punctuation).lower() for line in open("charactermask.txt") 
              for word in line.split()) 
words = defaultdict(int) 
for word in words_gen: 
    words[word] +=1 

letters = len(word) 

while letters >= 2: 
    top_words = sorted(words.iteritems(), 
         key=lambda(word, count): (-count, word))[:number] 

for word, frequency in top_words: 
    print "%s: %d" % (word, frequency)

Source

2012-09-17 Ty Bailey

I would refactor the code ~~and use a collections.Counter object~~:

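import collections
import string

# Read the whole file, split it on whitespace, and normalize each token.
with open("charactermask.txt") as f:
    words = [x.strip(string.punctuation).lower() for x in f.read().split()]

# Count only the tokens that are at least 2 characters long.
counter = collections.defaultdict(int)
for word in words:
    if len(word) >= 2:
        counter[word] += 1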
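The snippet above stops once the counts are built. One possible way to get the top 10 out of a `defaultdict(int)` without `collections.Counter` is sketched below. The helper name `top_words` and the sample sentence are made up for illustration, and the sort key indexes into each pair instead of using the question's `lambda(word, count)` form so that it should run on Python 2.6 and later versions alike.

```python
from collections import defaultdict


def top_words(counts, number=10):
    """Return the `number` most frequent (word, count) pairs.

    Ties are broken alphabetically, like the sort key in the question.
    """
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:number]


# Hypothetical sample data standing in for the contents of the real file.
sample = defaultdict(int)
for word in "the cat and the hat and the bat".split():
    sample[word] += 1

for word, frequency in top_words(sample, 3):
    print("%s: %d" % (word, frequency))
```

Sorting on `(-count, word)` puts the highest counts first and orders equal counts alphabetically, which is roughly what `Counter.most_common` would otherwise provide from Python 2.7 on (without the alphabetical tie-break).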

Source

2012-09-17 00:26:42 wim

The collections.Counter object is not available in Python 2.6.

Oh, right. You can use `defaultdict(int)` instead, as you already were – wim

I can see why this would work, but I implemented it and now I'm back to getting nothing returned at all...

One problem is the loop

while letters >= 2: 
    top_words = sorted(words.iteritems(), 
         key=lambda(word, count): (-count, word))[:number]

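You are not looping over the words here. Nothing inside the loop changes `letters`, so once it is entered this loop will run forever. You need to change the script so that this part actually iterates over all the words. (Also, you will probably want to change the `while` to an `if`, since you only need the code to execute once per word.)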
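A minimal sketch of that suggestion, assuming the length check is moved into the counting loop itself: the function name `count_words`, the `min_length` parameter, and the sample lines are hypothetical, and a list of strings stands in for the open file so the example is self-contained.

```python
from collections import defaultdict
from string import punctuation


def count_words(lines, min_length=2):
    """Count words of at least `min_length` characters in an iterable of lines."""
    counts = defaultdict(int)
    for line in lines:
        for token in line.split():
            word = token.strip(punctuation).lower()
            # `if`, not `while`: the test runs exactly once for each word.
            if len(word) >= min_length:
                counts[word] += 1
    return counts


# Hypothetical input; with the real file, pass open("charactermask.txt").
sample_lines = ["A cat sat.", "I saw a cat!"]
counts = count_words(sample_lines)
print(sorted(counts.items()))
```

Because the check happens before the increment, one-letter tokens such as "a" and "I" never enter the dictionary, so the later sort needs no filtering of its own.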

Source

2012-09-17 00:14:29

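I've changed it to `if` now, and I'm at least getting the words returned, but it still includes the letter 'a' as a word. How can I make this iterate over all the words?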
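In the script as posted, `letters = len(word)` runs once, after the counting loop has finished, so it only measures the last word read. Every word, including 'a', has already been counted by then. One way around this, sketched here under the assumption that filtering in the generator is acceptable for the assignment, is to drop short words before they reach the dictionary. The name `iter_words` and the sample line are illustrative only.

```python
from string import punctuation


def iter_words(lines, min_length=2):
    """Yield cleaned, lowercased words that have at least `min_length` characters."""
    cleaned = (token.strip(punctuation).lower()
               for line in lines
               for token in line.split())
    # The length test is applied to every word after punctuation is stripped.
    return (word for word in cleaned if len(word) >= min_length)


# Hypothetical input line; an open file object can be passed the same way.
print(list(iter_words(["A dog, a bone -- and I."])))
```

Checking the length after stripping punctuation also discards tokens that were punctuation only, since they become empty strings.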
