Adding words to the NLTK stoplist

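I have some code that removes stop words from my data set. Since the stop list doesn't seem to remove most of the words I would like it to, I'm looking to add words to this stop list so that it will remove them in this case. The code I'm using to remove stop words is: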

word_list2 = [w.strip() for w in word_list if w.strip() not in nltk.corpus.stopwords.words('english')]

I'm unsure of the correct syntax for adding words, and I can't seem to find it anywhere else. Any help is appreciated. Thanks.

Source

2011-04-01 Alex

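The English stop words are a file at nltk/corpus/stopwords/english.txt (I guess it would be there... I don't have nltk on this machine. The best thing would be to search for 'english.txt' in the nltk repo.)

You can add your new stop words to this file.

Also try looking at Bloom filters if your stop word list grows to a few hundred words.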
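As a rough illustration of the Bloom filter idea mentioned above, here is a minimal sketch in plain Python. The class name `BloomFilter`, its `num_bits` and `num_hashes` parameters, and the short word list are all assumptions made for this example; none of them come from NLTK. A Bloom filter can report false positives (a normal word flagged as a stop word) but never false negatives.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter (hypothetical helper, not an NLTK class)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per "bit" keeps the sketch simple; a real one would pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, word):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos] = 1

    def __contains__(self, word):
        # True means "probably present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(word))


# Stand-in stop words; in real use these would come from your stop word list.
stop_filter = BloomFilter()
for w in ["the", "a", "is", "and"]:
    stop_filter.add(w)

words = ["the", "cat", "is", "here"]
kept = [w for w in words if w not in stop_filter]
```

For a list of only a few hundred words, a plain Python set is probably simpler and fast enough, so this structure is mainly worth considering when memory is tight.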

Source

2011-04-01 11:11:29 Rafi

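Is there any good English stop word list out there to edit it with? The nltk one seems pretty poor – fabrizioM 2011-04-01 11:15:38

@fabrizioM http://fs1.position2.com/bm/txt/stopwords.txt this is the list I used at my last company.. – Rafi 2011-04-01 11:23:14

@Rafi This is a better one than NLTK's! Thanks! – 2015-09-18 23:36:16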

I always do stopset = set(nltk.corpus.stopwords.words('english')) at the top of any module that needs it. Then it's easy to add more words to the set, and membership checks are faster too.
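A minimal sketch of that approach applied to the filter from the question might look like this. The helper names `build_stopset` and `remove_stopwords` are hypothetical, and `base_words` is a short stand-in list so the example runs without the NLTK corpus installed; in real use it would be nltk.corpus.stopwords.words('english').

```python
def build_stopset(base_words, extra_words=()):
    """Build the stop set once, then add any custom words to it."""
    stopset = set(base_words)
    stopset.update(extra_words)
    return stopset


def remove_stopwords(word_list, stopset):
    """Same filter as in the question, but checking against the prebuilt set."""
    return [w.strip() for w in word_list if w.strip() not in stopset]


# Assumption: stand-in for nltk.corpus.stopwords.words('english').
base_words = ["the", "a", "is"]

# Custom additions chosen purely for illustration.
stopset = build_stopset(base_words, ["said", "also"])

word_list = ["the ", "cat", "also", " said", "hello"]
word_list2 = remove_stopwords(word_list, stopset)
```

Building the set once, rather than calling words('english') inside the comprehension, avoids reloading the list for every word being checked.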

Source

2011-04-01 16:01:14 Jacob

I was also looking for a solution to this. After some trial and error, I managed to add words to the stoplist. Hope this helps.

from nltk.corpus import stopwords

def removeStopWords(text):
    # select English stopwords
    cachedStopWords = set(stopwords.words("english"))
    # add custom words
    cachedStopWords.update(('and','I','A','And','So','arnt','This','When','It','many','Many','so','cant','Yes','yes','No','no','These','these'))
    # remove stop words (split on whitespace, keep words not in the set)
    new_str = ' '.join([word for word in text.split() if word not in cachedStopWords])
    return new_str

Source

2015-01-08 13:40:00

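The way I did it on my Ubuntu machine was to search (Ctrl + F) for "stopwords". It gave me a folder. I went inside it, and it had different files. I opened "english", which had barely 128 words, added my words to it, saved, and was done.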
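If you would rather script that edit than do it by hand, a sketch along these lines could work. It assumes the stop word file holds one word per line; the function name `add_words_to_stopword_file` is hypothetical, and the demo runs against a temporary stand-in file rather than the real NLTK data directory.

```python
import os
import tempfile


def add_words_to_stopword_file(path, new_words):
    """Append words that are not already in the file; return the words added."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    existing = set(text.split())

    to_add = []
    for w in new_words:
        if w not in existing and w not in to_add:
            to_add.append(w)

    with open(path, "a", encoding="utf-8") as f:
        # Make sure the first new word starts on its own line.
        if text and not text.endswith("\n"):
            f.write("\n")
        for w in to_add:
            f.write(w + "\n")
    return to_add


# Demo on a temporary stand-in for the "english" file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "english")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("the\na\nis")  # deliberately no trailing newline
    added = add_words_to_stopword_file(demo_path, ["is", "said", "also", "said"])
    with open(demo_path, encoding="utf-8") as f:
        final_words = f.read().split()
```

Keeping a backup of the original file before pointing this at the installed corpus seems prudent, since the change affects every program that reads that list.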

Source

2015-03-21 08:40:49 Sankalp

You can simply use the append method to add words to it:

stopwords = nltk.corpus.stopwords.words('english') 
stopwords.append('newWord')

Or extend it with a list of words, as Charlie suggested in the comments.

stopwords = nltk.corpus.stopwords.words('english') 
newStopWords = ['stopWord1','stopWord2'] 
stopwords.extend(newStopWords)

Source

2017-09-12 16:42:03

'CustomListofWordstoExclude = ['cat', 'dog'] stopwords.extend(CustomListofWordstoExclude)' I used your code, but then used 'extend()' to add my own list to it – Charlie 2018-01-10 23:26:53

Good point! Just added your suggestion to the answer! – 2018-01-12 16:07:28

On Windows, go to C:\Users\username\AppData\Roaming\nltk_data\corpora, open the stopwords folder there, and update the file as required.

Source

2017-12-12 06:27:32 Kiran
