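Getting 'pos' when testing a negative review

OK, so I trained a NaiveBayes movie review classifier... but when I run it against a negative review (from a website, which I copied and pasted into a txt file) I get 'pos'... What am I doing wrong? Here is the code: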
import random

import nltk
from nltk.corpus import movie_reviews

# Pair each review's word list with its category ('pos' or 'neg').
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)

# Take 2000 words from the corpus-wide frequency distribution as features.
all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
word_features = list(all_words)[:2000]

def document_features(document):
    # One boolean feature per feature word: is it present in the document?
    document_words = set(document)
    features = {}
    for word in word_features:
        features['contains({})'.format(word)] = (word in document_words)
    return features

featuresets = [(document_features(d), c) for (d, c) in documents]
# Hold out the first 100 shuffled reviews for testing; train on the rest.
train_set, test_set = featuresets[100:], featuresets[:100]
classifier = nltk.NaiveBayesClassifier.train(train_set)
print(nltk.classify.accuracy(classifier, test_set))
classifier.show_most_informative_features(5)
>>> 0.67
>>> Most Informative Features
        contains(thematic) = True    pos : neg = 8.9 : 1.0
          contains(annual) = True    pos : neg = 8.9 : 1.0
         contains(miscast) = True    neg : pos = 8.7 : 1.0
        contains(supports) = True    pos : neg = 6.9 : 1.0
      contains(unbearable) = True    neg : pos = 6.7 : 1.0
# The 'rU' mode was removed in Python 3.11; default text mode already normalizes newlines.
with open('negative_review.txt') as f:
    fraw = f.read()
review_tokens = nltk.word_tokenize(fraw)
docfts = document_features(review_tokens)
classifier.classify(docfts)
>>> 'pos'
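One thing that may contribute to the wrong label, offered as a guess rather than a diagnosis: word_features is built from lowercased corpus words, while nltk.word_tokenize keeps the capitalization of the pasted review, so a token like "Unbearable" never matches the feature contains(unbearable). A minimal sketch of the effect, using a toy three-word feature list and a made-up token list (both are assumptions for illustration, not the real 2000 features or the real review); here document_features takes the feature list as an extra argument so the snippet runs on its own:

```python
# Toy feature list standing in for the real word_features (assumption).
toy_word_features = ["unbearable", "miscast", "thematic"]

def document_features(tokens, word_features):
    # Same idea as the function above: one boolean per feature word.
    token_set = set(tokens)
    return {"contains({})".format(w): (w in token_set) for w in word_features}

# Made-up tokens, as a tokenizer might return them for a pasted review.
raw_tokens = ["Unbearable", "and", "Miscast", "."]

raw_features = document_features(raw_tokens, toy_word_features)
normalized_features = document_features(
    [t.lower() for t in raw_tokens], toy_word_features
)

print(raw_features["contains(unbearable)"])         # False: case mismatch
print(normalized_features["contains(unbearable)"])  # True after lowercasing
```

If this applies to your review, lowercasing review_tokens before calling document_features would let capitalized words count as features.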
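UPDATE: After re-running the program a few times, it now correctly classifies my negative review as negative... Can someone help me understand why? Or is this simply magic?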
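A likely explanation, though not a confirmed one: random.shuffle(documents) is unseeded, so every run trains on a different set of reviews and tests on a different 100. With a test set that small, the accuracy can move by a few points between runs, and a borderline review can flip label. A sketch of a seeded split on made-up documents, using only the standard library; split_documents and the seed value 42 are hypothetical names and values chosen for illustration:

```python
import random

def split_documents(documents, n_test, seed=None):
    # A private Random instance keeps the shuffle reproducible for a given seed.
    rng = random.Random(seed)
    shuffled = list(documents)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    # Same slicing as in the question: first n_test items are the test set.
    return shuffled[n_test:], shuffled[:n_test]

# Made-up (document, label) pairs standing in for the movie review corpus.
docs = [("doc{}".format(i), "pos" if i % 2 else "neg") for i in range(20)]

train_a, test_a = split_documents(docs, 5, seed=42)
train_b, test_b = split_documents(docs, 5, seed=42)

print(test_a == test_b)  # True: same seed, same split
```

In the original script, calling random.seed with a fixed value before random.shuffle(documents) should have the same effect, which makes runs comparable while you change the features.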
The assignment requires using only the NaiveBayes classifier :/
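There is nothing wrong with your code; you just need to improve the features. Is there a certain accuracy threshold you have to hit? – megadarkfriend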
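One possible way to improve the features, as a sketch: in NLTK 3, nltk.FreqDist subclasses collections.Counter, so list(all_words)[:2000] takes the first 2000 distinct words in the order they were first seen, not the 2000 most frequent ones. That would fit with rare words such as "thematic" and "annual" showing up as top features. The example below uses a plain Counter and a made-up token list as stand-ins for FreqDist and the corpus; the two helper function names are hypothetical:

```python
from collections import Counter

# Made-up token stream standing in for movie_reviews.words().
tokens = ["plot", "the", "miscast", "the", "bad", "the", "bad", "annual"]
counts = Counter(w.lower() for w in tokens)

def first_seen_features(counts, n):
    # What list(all_words)[:n] does: first n keys in insertion order.
    return list(counts)[:n]

def most_common_features(counts, n):
    # The n words with the highest counts.
    return [word for word, _ in counts.most_common(n)]

first_seen = first_seen_features(counts, 2)
by_frequency = most_common_features(counts, 2)

print(first_seen)    # ['plot', 'the']
print(by_frequency)  # ['the', 'bad']
```

In the original script, the equivalent change would be word_features = [w for w, _ in all_words.most_common(2000)]. Whether that is enough for the assignment depends on the data, so treat it as something to try rather than a guaranteed fix.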
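Nah... actually, something weird happened after re-running it a few times: it classified my negative review as negative! This is so weird... I'll take a screenshot of this run and post it with my assignment! The accuracy also went up to 0.7 on its own! Is this witchcraft?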