2017-09-25

Text classification using Naive Bayes in Python

I built a model that runs Naive Bayes to get the expected output.

from textblob.classifiers import NaiveBayesClassifier as NBC 
from textblob import TextBlob 
training_corpus = [ 
    ('Agree Completely Agree Strongly Agree Somewhat Disagree Somewhat Disagree Strongly Completely Disagree','TRUE'),
    ('Concerned 2 3 4 5 6 7 - Comfortable','TRUE'),
    ('1 - disagree strongly 2 - disagree somewhat 3 - neither agree nor disagree 4 - agree somewhat 5 - agree strongly','TRUE'),
    ('1 - doesn\'t apply at all 2 3 4 5 6 7 - applies completely','TRUE'),
    ('1 - extremely new and different 2 3 4 5 6 7 - not at all new & different','TRUE'),
    ('1 - extremely relevant 2 3 4 5 6 7 - not at all relevant','TRUE'),
    ('1 - I don\'t want brands to engage with me at all on social media 2 3 4 5 6 7 - I love to engage with brands on social media','TRUE'),
    ('1 - Most Important 2 3 4 5 - Least Important','TRUE'),  
    ('pepsi','FALSE'), 
    ('coca cola','FALSE'), 
    ('hyundai','FALSE'),   
    ('Audio quality','FALSE'), 
    ('Product features ','FALSE'), 
    ('Content ','FALSE') 
] 
test_corpus = [ 
    ('1 - Agree Completely 2 - Agree Strongly 3 - Agree Somewhat 4 - Disagree Somewhat 5 - Disagree Strongly 6 - Completely Disagree','TRUE'), 
    ('1 - Concerned 2 3 4 5 6 7 - Comfortable','TRUE'), 
    ('Content ','FALSE'), 
    ('Ease of navigation','FALSE') 
] 
model = NBC(training_corpus) 
print(model.classify('pepsi')) 
print(model.accuracy(test_corpus)*100) 

When I run this code it reports 100% accuracy, but it returns FALSE every time. I'm not sure what is wrong, but that is not the output I expected.

Answer

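Your model is fine; the behavior comes from your data and the classifier.
By that I mean it works well for the training data you provided. Let's test it: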

def test(s):
    # prob_classify returns a probability distribution over the labels
    prob_dist = model.prob_classify(s)
    print("classifying", s)
    print("probability of being FALSE:", round(prob_dist.prob("FALSE"), 2),
          "probability of being TRUE:", round(prob_dist.prob("TRUE"), 2))
    print('-' * 70)

test_cases = ['1', '1 - ', '2', '2 3 4 5', '1- 2 3 4 5', 'pepsi', 'coca', 'BMW']
for tc in test_cases:
    test(tc)

Here is the output, which looks quite reasonable:

classifying 1
probability of being FALSE: 1.0 probability of being TRUE: 0.0
----------------------------------------------------------------------
classifying 1 -
probability of being FALSE: 1.0 probability of being TRUE: 0.0
----------------------------------------------------------------------
classifying 2
probability of being FALSE: 1.0 probability of being TRUE: 0.0
----------------------------------------------------------------------
classifying 2 3 4 5
probability of being FALSE: 0.05 probability of being TRUE: 0.95
----------------------------------------------------------------------
classifying 1- 2 3 4 5
probability of being FALSE: 0.0 probability of being TRUE: 1.0
----------------------------------------------------------------------
classifying pepsi
probability of being FALSE: 1.0 probability of being TRUE: 0.0
----------------------------------------------------------------------
classifying coca
probability of being FALSE: 1.0 probability of being TRUE: 0.0
----------------------------------------------------------------------
classifying BMW
probability of being FALSE: 1.0 probability of being TRUE: 0.0
----------------------------------------------------------------------

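OK, so now you are wondering why the classifier behaves this way? Look at your code: where did you specify how features should be extracted? Nowhere, so it falls back to the default function for extracting the feature vector, as explained in the TextBlob documentation. (You can also take a look at the source code.)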
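To make the idea of a default "contains(word)" feature vector concrete, here is a minimal pure-Python sketch. It is an approximation written for illustration, not TextBlob's actual code: the helper names `tokenize`, `vocabulary` and `contains_features` are hypothetical, and the whitespace-based tokenizer is an assumption (TextBlob uses a proper tokenizer).

```python
import string

def tokenize(text):
    # Assumption: split on whitespace and strip surrounding punctuation.
    # Tokens that are pure punctuation (such as "-") disappear entirely.
    stripped = (tok.strip(string.punctuation) for tok in text.split())
    return {tok for tok in stripped if tok}

def vocabulary(train_set):
    # Every distinct word seen anywhere in the training texts.
    words = set()
    for text, _label in train_set:
        words.update(tokenize(text))
    return words

def contains_features(document, vocab):
    # One boolean feature per vocabulary word: is the word in the document?
    tokens = tokenize(document)
    return {"contains({})".format(w): (w in tokens) for w in sorted(vocab)}

# A small subset of the training corpus from the question.
train = [
    ("1 - Most Important 2 3 4 5 - Least Important", "TRUE"),
    ("pepsi", "FALSE"),
    ("coca cola", "FALSE"),
]
vocab = vocabulary(train)
features = contains_features("pepsi", vocab)
print(features["contains(pepsi)"], features["contains(1)"])
```

The point to notice is that a short input such as 'pepsi' produces a feature vector that is almost entirely False, so the absence of words carries as much weight as their presence.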

For example, you can inspect the model's features like this:

model.show_informative_features() 


>>> Most Informative Features 
      contains(4) = False   FALSE : TRUE =  5.6 : 1.0 
      contains(3) = False   FALSE : TRUE =  5.6 : 1.0 
      contains(5) = False   FALSE : TRUE =  5.6 : 1.0 
      contains(2) = False   FALSE : TRUE =  5.6 : 1.0 
      contains(1) = False   FALSE : TRUE =  3.3 : 1.0 
      contains(7) = False   FALSE : TRUE =  2.4 : 1.0 
      contains(6) = False   FALSE : TRUE =  2.4 : 1.0 
      contains(at) = False   FALSE : TRUE =  1.9 : 1.0 
      contains(all) = False   FALSE : TRUE =  1.9 : 1.0 
      contains(not) = False   FALSE : TRUE =  1.3 : 1.0 
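The ratios listed above can be combined by hand to see why short inputs lean towards FALSE. The sketch below is only rough arithmetic on the ten ratios printed above: it multiplies the FALSE : TRUE ratio of every listed word that is absent from the input. The function name `partial_odds_for_false` is hypothetical, and the calculation deliberately ignores class priors and the evidence from words that are present (those ratios are not in the listing), so it does not reproduce the classifier's real probabilities.

```python
from math import prod

# FALSE : TRUE likelihood ratios for contains(word) = False,
# copied from the show_informative_features() output above.
absent_ratio = {
    "4": 5.6, "3": 5.6, "5": 5.6, "2": 5.6, "1": 3.3,
    "7": 2.4, "6": 2.4, "at": 1.9, "all": 1.9, "not": 1.3,
}

def partial_odds_for_false(text):
    # Multiply the ratios of all listed words that do NOT occur in the text.
    # This is partial evidence only: priors and present words are left out.
    tokens = set(text.lower().split())
    return prod(r for w, r in absent_ratio.items() if w not in tokens)

for text in ["pepsi", "1", "2 3 4 5"]:
    print(repr(text), round(partial_odds_for_false(text), 1))
```

Every scale digit missing from the input multiplies the odds in favor of FALSE, which is why an input like '1' or 'BMW' is pushed so strongly towards FALSE. For '2 3 4 5' the absent-word evidence is far weaker, and the words that are present (not covered by this sketch) tip the real model towards TRUE.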

Thanks Iman... I'll work on it and let you know if there are any issues.

You're welcome :)
