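Text classification with the nltk NaiveBayes classifier

In the code below, I know my naive Bayes classifier works, because it classifies correctly when trained on trainset1. So why doesn't it work when trained on trainset2? I even tried two classifiers, one from TextBlob and one from nltk, and both give the same wrong answer.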
from textblob.classifiers import NaiveBayesClassifier
from textblob import TextBlob
from nltk.tokenize import word_tokenize
import nltk

trainset1 = [('I love this sandwich.', 'pos'),
             ('This is an amazing place!', 'pos'),
             ('I feel very good about these beers.', 'pos'),
             ('This is my best work.', 'pos'),
             ("What an awesome view", 'pos'),
             ('I do not like this restaurant', 'neg'),
             ('I am tired of this stuff.', 'neg'),
             ("I can't deal with this", 'neg'),
             ('He is my sworn enemy!', 'neg'),
             ('My boss is horrible.', 'neg')]

trainset2 = [('hide all brazil and everything plan limps to anniversary inflation plan initiallyis limping its first anniversary amid soaring prices', 'class1'),
             ('hello i was there and no one came', 'class2'),
             ('all negative terms like sad angry etc', 'class2')]
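
def nltk_naivebayes(trainset, test_sentence):
    # Vocabulary: every lowercased token in the training set
    all_words = set(word.lower() for passage in trainset for word in word_tokenize(passage[0]))
    # One True/False feature per vocabulary word. The passage is lowercased too,
    # otherwise capitalised tokens such as "I" or "This" never match the vocabulary.
    t = [({word: (word in word_tokenize(x[0].lower())) for word in all_words}, x[1]) for x in trainset]
    classifier = nltk.NaiveBayesClassifier.train(t)
    # Tokenize the test sentence once instead of once per vocabulary word
    test_tokens = word_tokenize(test_sentence.lower())
    test_sent_features = {word: (word in test_tokens) for word in all_words}
    return classifier.classify(test_sent_features)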

def textblob_naivebayes(trainset, test_sentence):
    cl = NaiveBayesClassifier(trainset)
    blob = TextBlob(test_sentence, classifier=cl)
    return blob.classify()

test_sentence1 = "he is my horrible enemy"
test_sentence2 = "inflation soaring limps to anniversary"

# print() calls work on both Python 2 and Python 3
print(nltk_naivebayes(trainset1, test_sentence1))
print(nltk_naivebayes(trainset2, test_sentence2))
print(textblob_naivebayes(trainset1, test_sentence1))
print(textblob_naivebayes(trainset2, test_sentence2))
Output:
neg
class2
neg
class2
This is wrong: test_sentence2 clearly belongs to class1.
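To see what the classifier is actually weighing, here is a stdlib-only sketch (not nltk itself) that recomputes naive Bayes log scores for trainset2 by hand. It assumes add-0.5 smoothing, which as far as I know is what `nltk.NaiveBayesClassifier.train` uses by default (`ELEProbDist`), and it uses `str.split()` in place of `word_tokenize`, which is harmless here because trainset2 has no punctuation. The helper names (`all_word_features`, `present_word_features`, `ele`, `nb_log_scores`) are hypothetical. `all_word_features` mirrors the feature dicts built in `nltk_naivebayes`; `present_word_features` is an alternative that records only the words that occur.

```python
import math

# Data copied from the question.
trainset2 = [('hide all brazil and everything plan limps to anniversary inflation plan initiallyis limping its first anniversary amid soaring prices', 'class1'),
             ('hello i was there and no one came', 'class2'),
             ('all negative terms like sad angry etc', 'class2')]
test_sentence2 = "inflation soaring limps to anniversary"

# Assumption: str.split() stands in for word_tokenize (no punctuation here).
all_words = set(w for text, _ in trainset2 for w in text.lower().split())


def all_word_features(text):
    """Same shape as nltk_naivebayes: one True/False entry per vocabulary word."""
    tokens = set(text.lower().split())
    return {w: (w in tokens) for w in all_words}


def present_word_features(text):
    """Hypothetical alternative: only the words that actually occur."""
    return {w: True for w in text.lower().split()}


def ele(count, total, bins):
    """Add-0.5 smoothed probability (assumed to match nltk's ELEProbDist)."""
    return (count + 0.5) / (total + 0.5 * bins)


def nb_log_scores(trainset, extract, test_text):
    """Return {label: log P(label) + sum of log P(feature value | label)}."""
    train = [(extract(text), label) for text, label in trainset]
    labels = sorted(set(label for _, label in train))
    test = extract(test_text)
    scores = {}
    for label in labels:
        rows = [feats for feats, lab in train if lab == label]
        score = math.log(ele(len(rows), len(train), len(labels)))
        for fname, fval in test.items():
            if not any(fname in feats for feats, _ in train):
                continue  # skip feature names never seen in training
            # Distinct training values for this feature; None means "absent".
            bins = len(set(feats.get(fname) for feats, _ in train))
            count = sum(1 for feats in rows if feats.get(fname) == fval)
            score += math.log(ele(count, len(rows), bins))
        scores[label] = score
    return scores


for extract in (all_word_features, present_word_features):
    scores = nb_log_scores(trainset2, extract, test_sentence2)
    print(extract.__name__, max(scores, key=scores.get))
```

Running it prints `all_word_features class2` and then `present_word_features class1`. In this sketch, the one-boolean-per-vocabulary-word features turn the 12 words of the class1 example that are missing from test_sentence2 into `False` values that the single class1 example never had, and those terms outweigh the five shared words; class2 also starts with the larger prior (two examples versus one). With presence-only features the comparison rests on the five shared words, and class1 comes out ahead.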