2016-12-07 168 views

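Counting the frequency of words in a text file in Python

I'm trying to figure out how to write a program that takes a file chosen by the user (by typing in the file name) and counts the frequency of each word the user enters.

I have most of it working, but when I enter several words for the program to look for, only the first word shows the correct frequency; the rest show "0 occurrences".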

file_name = input("What file would you like to open? ") 
f = open(file_name, "r") 
the_full_text = f.read() 
words = the_full_text.split() 
search_word = input("What words do you want to find? ").split(",") 
len_list = len(search_word) 

word_number = 0 
print() 
print ('... analyzing ... hold on ...') 
print() 
print ('Frequency of word usage within', file_name+":") 
for i in range(len_list):

    frequency = 0
    for word in words:
        word = word.strip(",.")
        if search_word[word_number].lower() == word.lower():
            frequency += 1
    print (" ",format(search_word[word_number].strip(),'<20s'),"/", frequency, "occurrences")
    word_number = word_number + 1

An example of the output would be:

What file would you like to open? assignment_8.txt 
What words do you want to find? wey, rights, dem 

... analyzing ... hold on ... 

Frequency of word usage within assignment_8.txt: 
    wey    /96 occurrences 
    rights    /0 occurrences 
    dem    /0 occurrences 

What's wrong with my program? Please help :o


If you're splitting on ',', shouldn't your input be 'wey,rights,dem', with no spaces?

Answer


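You need to strip the whitespace from your search words. Splitting "wey, rights, dem" on "," leaves a leading space on every word after the first, and your code only strips it when printing, not when comparing.

Your current algorithm is also quite inefficient, because it rescans the entire text once for every search word. Here is a more efficient approach. First, we clean up the search words and put them in a list. Then we build a dictionary from that list to hold the count for each of those words as we find them in the text file, so the text is scanned only once.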
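
To see where the stray whitespace comes from, here is a minimal sketch that uses the sample input from the question. The variable names are only for illustration.

```python
raw = "wey, rights, dem"  # sample input from the question

# Splitting on "," keeps the space that follows each comma.
parts = raw.split(",")
print(parts)  # ['wey', ' rights', ' dem']

# " rights" never equals "rights", so that word's count stays at 0.
print(parts[1] == "rights")  # False

# Stripping each item removes the stray whitespace.
cleaned = [p.strip().lower() for p in parts]
print(cleaned)  # ['wey', 'rights', 'dem']
```

The first word has no leading space, which is why only "wey" was counted correctly.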

file_name = input("What file would you like to open? ")
# The with block closes the file automatically.
with open(file_name, "r") as f:
    words = f.read().split()

search_words = input("What words do you want to find? ").split(',')
# Strip whitespace and lowercase each search word before comparing.
search_words = [word.strip().lower() for word in search_words]
#print(search_words)
# One counter per search word, all starting at 0.
search_counts = dict.fromkeys(search_words, 0)

print ('\n... analyzing ... hold on ...')
# Single pass over the text: count only the words we are searching for.
for word in words:
    word = word.rstrip(",.").lower()
    if word in search_counts:
        search_counts[word] += 1

print ('\nFrequency of word usage within', file_name + ":")
for word in search_words:
    print(" {:<20s}/{} occurrences".format(word, search_counts[word]))
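
As an alternative sketch (not part of the code above), the counting step could be wrapped in a function built on `collections.Counter` from the standard library. The function name `count_words`, the made-up sample text, and the choice to strip everything in `string.punctuation` rather than just "," and "." are assumptions for illustration.

```python
from collections import Counter
import string


def count_words(text, search_words):
    """Return a dict mapping each cleaned search word to its count in text."""
    # Normalize every word in the text: strip surrounding punctuation, lowercase.
    counts = Counter(w.strip(string.punctuation).lower() for w in text.split())
    # A Counter returns 0 for missing keys, so absent words are reported as 0.
    cleaned = [s.strip().lower() for s in search_words]
    return {w: counts[w] for w in cleaned}


# Made-up sample text, not the contents of assignment_8.txt.
sample = "Wey dem rights. Dem talk, wey dem go?"
result = count_words(sample, "wey, rights, dem".split(","))
print(result)  # {'wey': 2, 'rights': 1, 'dem': 3}
```

This version counts every word in the text, not only the search words, so it uses more memory than the dictionary approach above; in exchange, the function is easy to test without prompting for input.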