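Removing words from a file

I am trying to remove words from a text file. The words to remove are listed in a separate file (the stop words), separated by newlines ("\n").
Right now I convert both files to lists so that I can compare the elements of each list. I got this function to work, but it does not remove all of the words I specified in the stop words file. Any help is greatly appreciated.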
def elimstops(file_str): #takes as input a string for the stopwords file location
    stop_f = open(file_str, 'r')
    stopw = stop_f.read()
    stopw = stopw.split('\n')
    text_file = open('sample.txt') #Opens the file whose stop words will be eliminated
    prime = text_file.read()
    prime = prime.split(' ') #Splits the string into a list separated by a space
    tot_str = "" #total string
    i = 0
    while i < (len(stopw)):
        if stopw[i] in prime:
            prime.remove(stopw[i]) #removes the stopword from the text
        else:
            pass
        i += 1
    # Creates a new string from the compilation of list elements
    # with the stop words removed
    for v in prime:
        tot_str = tot_str + str(v) + " "
    return tot_str
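
One likely reason some stop words survive in the function above: `list.remove` deletes only the first matching element, so a stop word that appears several times in `prime` is removed just once. In addition, `split(' ')` does not split on newlines, so a word at a line break stays glued to its neighbour. Here is a minimal sketch that filters every occurrence, assuming whitespace-separated tokens and exact, case-sensitive matches. The name `remove_stopwords` and the sample string are made up for illustration and are not from the original post.

```python
def remove_stopwords(text, stopwords):
    """Return text with every token that appears in stopwords dropped."""
    stop_set = set(stopwords)  # set membership tests are fast
    # split() with no argument splits on any run of whitespace, newlines included
    kept = [word for word in text.split() if word not in stop_set]
    return " ".join(kept)


sample = "the cat and the dog and the bird"
print(remove_stopwords(sample, ["the", "and"]))  # cat dog bird
```

Building a new list instead of mutating `prime` in place also avoids the trailing space that the string concatenation loop leaves at the end of the result.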
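I don't think that is necessary: he is iterating over 'stopw' and removing elements from 'prime'.
@SamMussmann Thanks, I just noticed that. I have edited my answer with the theory that punctuation is causing the OP's problem.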
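
The punctuation theory from the comment can be sketched like this: a token such as `the,` never compares equal to the stop word `the`, so it is left in the text. The sketch assumes it is acceptable to ignore leading and trailing punctuation and letter case when comparing. `remove_stopwords_normalized` is a hypothetical name, not code from the answer being discussed.

```python
import string


def remove_stopwords_normalized(text, stopwords):
    """Drop tokens whose punctuation-stripped, lowercased form is a stop word."""
    # Ignore blank entries, e.g. from a trailing newline in the stop words file
    stop_set = {w.strip().lower() for w in stopwords if w.strip()}
    kept = []
    for token in text.split():
        # strip(string.punctuation) removes leading/trailing punctuation only
        core = token.strip(string.punctuation).lower()
        if core not in stop_set:
            kept.append(token)  # keep the token exactly as it was written
    return " ".join(kept)


print(remove_stopwords_normalized("The cat, and the dog.", ["the", "and"]))
# cat, dog.
```

Kept tokens retain their original punctuation and case; only the comparison is normalized.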