2016-10-26 46 views

I'm getting a ZeroDivisionError, but I can't find where it comes from. My goal is to compare the words in a text file that contains these words:

secondly 
pardon 
woods 
secondly 

I wrote a script that compares the words in pairs, like this:

secondly, pardon 
secondly, woods 
secondly, secondly 
pardon, woods 
pardon, secondly 
woods, secondly 
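The pairing above is every unordered pair of positions i < j in the list, which is exactly what the standard-library itertools.combinations produces. A minimal sketch, assuming the four words are already in a Python list (the list literal is illustrative, not read from the original file):

```python
from itertools import combinations

words = ["secondly", "pardon", "woods", "secondly"]

# Every pair (words[i], words[j]) with i < j, in the same order as the list above.
pairs = list(combinations(words, 2))
for w1, w2 in pairs:
    print(w1 + ", " + w2)
```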

My code does the following:

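1) If the two words are the same, the pair gets a score of 1; otherwise the score is computed by the gensim vector model.
2) A counter is reset whenever the outer for loop moves on to the next word. For example, for secondly, pardon > secondly, woods > secondly, secondly, the count at that point is 3.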
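The scoring rule in step 1 can be isolated into a small function. This is a sketch, assuming a plain dictionary of made-up similarity values stands in for the gensim model; the names score_pair, FAKE_SIMILARITY and fake_similarity are hypothetical. As in the original code, a lookup that fails raises KeyError, which is mapped to a score of 0:

```python
def score_pair(word, word2, similarity):
    """Return 1 for identical words, else similarity(word, word2); 0 if the lookup raises KeyError."""
    if word == word2:
        return 1
    try:
        return similarity(word, word2)
    except KeyError:
        return 0

# Hypothetical stand-in for model.similarity: made-up values, not from a trained model.
FAKE_SIMILARITY = {
    frozenset(["secondly", "pardon"]): 0.18,
    frozenset(["pardon", "woods"]): 0.41,
}

def fake_similarity(a, b):
    return FAKE_SIMILARITY[frozenset([a, b])]  # raises KeyError for unknown pairs

print(score_pair("secondly", "secondly", fake_similarity))  # 1
print(score_pair("pardon", "secondly", fake_similarity))    # 0.18
print(score_pair("woods", "secondly", fake_similarity))     # 0 (pair not in the table)
```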

Code:

from __future__ import division 
import gensim 


textfile = 'businessCleanTxtUniqueWords' 
model = gensim.models.Word2Vec.load("businessSG") 
count = 0 # keep track of counter 
score = 0 
avgScore = 0 
SentenceScore = 0 
externalCount = 0 
totalAverageScore = 0 

with open(textfile, 'r+') as f1: 

    words_list = f1.readlines() 

    for each_word in words_list: 
     word = each_word.strip() 

     for each_word2 in words_list[words_list.index(each_word) + 1:]: 
      count = count + 1 

      try: 
       word2 = each_word2.strip() 
       print(word, word2) 
       # if words are the same 
       if (word == word2): 
        score = 1 
       else: 
        score = model.similarity(word,word2) # when words are not the same 
      # if word is not in vector model 
      except KeyError: 
       score = 0 
      # to keep track of the score 
      SentenceScore=SentenceScore + score 

      print("the score is: " + str(score)) 
      print("the count is: " + str(count)) 
     # average score 
     avgScore = round(SentenceScore/count,5) 

     print("the avg score: " + str(SentenceScore) + '/' + str(count) + '=' + str(avgScore)) 
     # reset counter and sentence score 
     count = 0 
     SentenceScore = 0 

Error message:

Traceback (most recent call last): 
    File "C:/Users/User/Desktop/Complete2/Complete/TrainedTedModel/LatestJR.py", line 41, in <module> 
    avgScore = round(SentenceScore/count,5) 
ZeroDivisionError: division by zero 
('secondly', 'pardon') 
the score is: 0.180233083443 
the count is: 1 
('secondly', 'woods') 
the score is: 0.181432347816 
the count is: 2 
('secondly', 'secondly') 
the score is: 1 
the count is: 3 
the avg score: 1.36166543126/3=0.45389 
('pardon', 'woods') 
the score is: 0.405021005657 
the count is: 1 
('pardon', 'secondly') 
the score is: 0.180233083443 
the count is: 2 
the avg score: 0.5852540891/2=0.29263 
('woods', 'secondly') 
the score is: 0.181432347816 
the count is: 1 
the avg score: 0.181432347816/1=0.18143 

I have already included "from __future__ import division", but it doesn't seem to fix it.
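"from __future__ import division" only changes what / means in Python 2 (true division instead of floor division for integers); it does not make dividing by zero legal. A minimal check, written so it runs on Python 3, where / is already true division and the import is a harmless no-op:

```python
from __future__ import division  # true division on Python 2; already the default on Python 3

print(7 / 2)  # 3.5 (true division); Python 2 without the import would print 3

raised = False
try:
    total, count = 0.0, 0
    total / count  # still invalid: the import changes rounding, not division by zero
except ZeroDivisionError:
    raised = True
print("ZeroDivisionError raised:", raised)
```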

My files can be found at the links below:

Gensim model:

https://entuedu-my.sharepoint.com/personal/jseng001_e_ntu_edu_sg/_layouts/15/guestaccess.aspx?guestaccesstoken=BlORQpsmI6RMIja55I%2bKO9oF456w5tBLR43XZdVCQIA%3d&docid=00459c024d33d48638508dd331cf73144&rev=1&expiration=2016-11-25T23%3a56%3a48.000Z

Text file:

https://entuedu-my.sharepoint.com/personal/jseng001_e_ntu_edu_sg/_layouts/15/guestaccess.aspx?guestaccesstoken=7%2b8Nkm9BySPFR0zqD%2fdgUcYOaXREG3%2fycALnMFcv59A%3d&docid=08158c442c3f74970bc8090f253b499f8&rev=1&expiration=2016-11-25T23%3a56%3a01.000Z

Thanks.


'count' is probably zero because your 'words_list[words_list.index(each_word) + 1:]' can be empty. – kichik


Hi kichik, I don't quite understand. Could you explain a bit more? Thanks – windboy


Have you looked at your error carefully? It should give the line number and the offending code. –

Answers


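This happens because once the first for loop reaches the last word, the second for loop does not execute, so count equals zero (it was reset to zero at the end of the previous iteration). Just change the first for loop to skip the last word, since it isn't needed: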

for each_word in words_list[:-1]: 
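To see why the last word is the culprit: the inner loop iterates over the slice of words after the current one, and for the last position that slice is empty. A small sketch using the four sample words; positions come from range() here instead of list.index, a substitution made only for this illustration:

```python
words_list = ["secondly", "pardon", "woods", "secondly"]

for i in range(len(words_list)):
    rest = words_list[i + 1:]  # the words the inner loop would pair with words_list[i]
    print(i, words_list[i], len(rest))
# The last line printed is "3 secondly 0": no pairs, so count stays 0.

# Stopping one word early (as words_list[:-1] does) leaves only non-empty slices.
sizes = [len(words_list[i + 1:]) for i in range(len(words_list) - 1)]
print(sizes)  # [3, 2, 1]
```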

Thank you very much. You're right, that was the problem. Thanks again. – windboy


The error message tells you the offending line directly:

Traceback (most recent call last): 
    File "C:/Users/User/Desktop/Complete2/Complete/TrainedTedModel/LatestJR.py", line 41, in <module> 
    avgScore = round(SentenceScore/count,5) 
ZeroDivisionError: division by zero 

So I'll assume SentenceScore/count is the problematic division, which means count must be 0. I suggest adding a line like this right before it:

print("SentenceScore is",SentenceScore, "and count is",count) 

so you can see this for yourself. Now, since the inner loop:

for each_word2 in words_list[words_list.index(each_word) + 1:]:
    count = count + 1

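is the only thing that increases count, and count is reset to zero at the end of every iteration of the outer loop, a count of zero means the inner loop did not run at all at some point. That in turn means words_list[words_list.index(each_word) + 1:] was an empty sequence, which happens when each_word is the last word in words_list.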
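The per-word averaging can also be written so that it never divides by zero. A sketch, assuming the words are already stripped into a list and using a hypothetical toy_score function in place of the gensim lookup (average_scores is likewise a name made up for this example). It takes positions from enumerate rather than list.index, because list.index returns the first occurrence, so a repeated word such as "secondly" could otherwise be matched to its earlier position:

```python
def average_scores(words, score):
    """Return (word, average score against all later words) for each word that has later words."""
    results = []
    for i, word in enumerate(words):
        later = words[i + 1:]
        if not later:  # last word: nothing to compare, so skip instead of dividing by zero
            continue
        total = sum(score(word, other) for other in later)
        results.append((word, round(total / len(later), 5)))
    return results

def toy_score(a, b):
    """Hypothetical scorer: 1 for identical words, 0.5 otherwise (stands in for model.similarity)."""
    return 1 if a == b else 0.5

for word, avg in average_scores(["secondly", "pardon", "woods", "secondly"], toy_score):
    print(word, avg)
```

Skipping the empty case with a guard keeps the result list aligned with the words that actually have partners, instead of inventing an average for the last word.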


Thank you very much. – windboy