2014-11-02 42 views

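Index error when comparing strings - Python

I'm having some trouble with a bit of Python code. I have a large text file called "big.txt". My code iterates over it, collecting every word into an array (or list), and then goes through that list again to strip any character that is not in the alphabet. I also have a function called worddistance, which checks how similar two words are and returns a score: it adds 1 to a counter each time it notices a difference, so the lower the score, the more similar the words. I have another function called autocorrect. I want to pass a misspelled word to this function and have it print a 'Did you mean...' sentence listing the words that score low on worddistance.

Strangely, I keep getting this error:

"IndexError: string index out of range"

I'm at a loss as to what is going on!

My code is below.

Thanks in advance for your replies,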
Samuel Norton

f = open("big.txt", "r")

words = list()

temp_words = list()
for line in f:
    for word in line.split():
        temp_words.append(word.lower())

allowed_characters = 'abcdefghijklmnopqrstuvwxyz'
for item in temp_words:
    temp_new_word = ''
    for char in item:
        if char in allowed_characters:
            temp_new_word += char
        else:
            continue
    words.append(temp_new_word)
list(set(words)).sort()

def worddistance(word1, word2):
    counter = 0
    if len(word1) > len(word2):
        counter += len(word1) - len(word2)
        new_word1 = word1[:len(word2) + 1]
        for char in range(0, len(word2) + 1) :
            if word2[char] != new_word1[char]:
                counter += 1
            else:
                continue
    elif len(word2) > len(word1):
        counter += len(word2) - len(word1)
        new_word2 = word2[:len(word1) + 1]
        for char in range(0, len(word1) + 1):
            if word1[char] != word2[char]:
                counter += 1
            else:
                continue
    return counter

def autocorrect(word):
    word.lower()
    if word in words:
        print("The spelling is correct.")
        return
    else:
        suggestions = list()
        for item in words:
            diff = worddistance(word, item)
            if diff == 1:
                suggestions.append(item)
        print("Did you mean: ", end = ' ')

    if len(suggestions) == 1:
        print(suggestions[0])
        return

    else:
        for i in range(0, len(suggestions)):
            if i == len(suggestons) - 1:
                print("or " + suggestions[i] + "?")
                return
            print(suggestions[i] + ", ", end="")
            return

On which line do you get this error? – user3378649 2014-11-02 20:40:32

Answers


In worddistance(), it looks like for char in range(0, len(word1) + 1): should be:

for char in range(len(word1)): 

And for char in range(0, len(word2) + 1) : should be:

for char in range(len(word2)): 
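
To see why the extra + 1 triggers the error, here is a minimal sketch. The helper index_or_none is made up for this illustration and is not part of the original code. Valid indices for a string s run from 0 to len(s) - 1, so s[len(s)] raises IndexError.

```python
def index_or_none(s, i):
    """Return s[i], or None if i is out of range (hypothetical helper, for illustration only)."""
    try:
        return s[i]
    except IndexError:
        return None

word = "cat"

# range(len(word)) yields 0, 1, 2: every index is valid.
print([index_or_none(word, i) for i in range(len(word))])

# range(0, len(word) + 1) also yields 3, which is one past the last character.
print([index_or_none(word, i) for i in range(0, len(word) + 1)])
```

In the question's code the same out-of-range index is reached on the last pass of each loop, which is where the exception comes from.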

By the way, list(set(words)).sort() sorts a temporary list, which is probably not what you want. It should be:

words = sorted(set(words)) 
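
A small sketch of the difference, using a made-up word list: list.sort() sorts in place and returns None, so calling it on a freshly built list discards the result, while sorted() returns a new list that can be bound to a name.

```python
words = ["pear", "apple", "pear", "fig"]  # sample data, assumed for illustration

# The temporary list is sorted and then thrown away; `words` is untouched.
result = list(set(words)).sort()
print(result)  # None
print(words)   # ['pear', 'apple', 'pear', 'fig']

# sorted() returns the de-duplicated, sorted list.
unique_sorted = sorted(set(words))
print(unique_sorted)  # ['apple', 'fig', 'pear']
```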

As mentioned in the other comments, you should use range(len(word1)).

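Beyond that:
- You should handle the case where word1 and word2 have the same length (len(word2) == len(word1)).
- You should also pay attention to naming. In the second condition of the worddistance function,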

if word1[char] != word2[char]: 

you should compare against new_word2:

if word1[char] != new_word2[char]: 

- In autocorrect, you should assign the lowercased result back to word: word = word.lower()
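
The last point comes down to strings being immutable in Python: str.lower() returns a new string and leaves the original alone. A minimal sketch (the variable name unchanged is only for illustration):

```python
word = "Hello"

word.lower()         # the lowercased copy is computed and discarded
unchanged = word     # still "Hello"

word = word.lower()  # rebind the name to the lowercased copy
print(unchanged, word)
```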

words = []
for item in temp_words:
    temp_new_word = ''
    for char in item:
        if char in allowed_characters:
            temp_new_word += char
        else:
            continue
    words.append(temp_new_word)
words = sorted(set(words))

def worddistance(word1, word2):
    counter = 0
    if len(word1) > len(word2):
        counter += len(word1) - len(word2)
        new_word1 = word1[:len(word2)]  # truncate the longer word to the shorter length
        for char in range(len(word2)):
            if word2[char] != new_word1[char]:
                counter += 1
    elif len(word2) > len(word1):
        counter += len(word2) - len(word1)
        new_word2 = word2[:len(word1)]
        for char in range(len(word1)):
            if word1[char] != new_word2[char]:  # compare against new_word2 here
                counter += 1
    else:  # len(word2) == len(word1); this case was missing
        for char in range(len(word1)):
            if word1[char] != word2[char]:
                counter += 1
    return counter

def autocorrect(word):
    word = word.lower()  # strings are immutable, so the result must be assigned back
    if word in words:
        print("The spelling is correct.")
    else:
        suggestions = list()
        for item in words:
            diff = worddistance(word, item)
            # print(diff)  # uncomment to inspect the scores while debugging
            if diff == 1:
                suggestions.append(item)
        print("Did you mean: ")

        if len(suggestions) == 1:
            print(suggestions[0])

        else:
            for i in range(len(suggestions)):
                if i == len(suggestions) - 1:
                    print("or " + suggestions[i] + "?")
                else:
                    print(suggestions[i] + ", ")
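
The printing branch of autocorrect can also be written without any index arithmetic by building the sentence first and printing it once. This is only a sketch: format_suggestions is a hypothetical helper, not part of the original post, and its punctuation differs slightly from the loop above (no comma before "or").

```python
def format_suggestions(suggestions):
    """Build the 'Did you mean' sentence (hypothetical helper, not from the original post)."""
    if not suggestions:
        return "No suggestions found."
    if len(suggestions) == 1:
        return "Did you mean: " + suggestions[0] + "?"
    # Join all but the last word with commas, then attach the last one with "or".
    return "Did you mean: " + ", ".join(suggestions[:-1]) + " or " + suggestions[-1] + "?"

print(format_suggestions(["cat"]))
print(format_suggestions(["cat", "cut", "cot"]))
```

Returning the string instead of printing inside the loop also makes the empty-list case explicit, which the loop version silently skips.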

Next time, try to use Python built-ins such as enumerate to avoid writing for i in range(len(list)) followed by list[i], use len instead of a manual counter, and so on.

For example, your distance function could be written like this, or even more simply:

def distance(word1, word2):
    counter = max(len(word1), len(word2)) - min(len(word1), len(word2))
    if len(word1) > len(word2):
        counter += len([x for x, z in zip(list(word2), list(word1[:len(word2) + 1])) if x != z])
    elif len(word2) > len(word1):
        counter += len([x for x, z in zip(list(word1), list(word2[:len(word1) + 1])) if x != z])
    else:
        counter += len([x for x, z in zip(list(word1), list(word2)) if x != z])
    return counter
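
The three branches above can be collapsed, because zip() already stops at the end of the shorter word. The following is a sketch of that idea, assuming the same scoring rule as the question (one point per positional mismatch plus the difference in length):

```python
def distance(word1, word2):
    """Count positional mismatches plus the difference in length."""
    # zip() pairs characters up to the shorter word, so no slicing is needed
    # and no index can run past the end of either string.
    mismatches = sum(1 for a, b in zip(word1, word2) if a != b)
    return mismatches + abs(len(word1) - len(word2))

print(distance("cat", "cut"))   # 1
print(distance("cat", "cats"))  # 1
print(distance("cat", "cat"))   # 0
```

Note that this is a position-by-position comparison, not a true edit distance: distance("at", "cat") is 3 even though only one letter is missing, since every character is shifted by one position.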