Text classification using Naive Bayes in Python

I built a model that runs Naive Bayes on my data to get the expected output.

from textblob.classifiers import NaiveBayesClassifier as NBC
from textblob import TextBlob

# Each example is a (text, label) pair: 'TRUE' marks rating-scale text.
training_corpus = [
    ('Agree Completely Agree Strongly Agree Somewhat Disagree Somewhat Disagree Strongly Completely Disagree','TRUE'),
    ('Concerned 2 3 4 5 6 7 - Comfortable','TRUE'),
    ('1 - disagree strongly 2 - disagree somewhat 3 - neither agree nor disagree 4 - agree somewhat 5 - agree strongly','TRUE'),
    ('1 - doesn\'t apply at all 2 3 4 5 6 7 - applies completely','TRUE'),
    ('1 - extremely new and different 2 3 4 5 6 7 - not at all new & different','TRUE'),
    ('1 - extremely relevant 2 3 4 5 6 7 - not at all relevant','TRUE'),
    ('1 - I don\'t want brands to engage with me at all on social media 2 3 4 5 6 7 - I love to engage with brands on social media','TRUE'),
    ('1 - Most Important 2 3 4 5 - Least Important','TRUE'),
    ('pepsi','FALSE'),
    ('coca cola','FALSE'),
    ('hyundai','FALSE'),
    ('Audio quality','FALSE'),
    ('Product features ','FALSE'),
    ('Content ','FALSE')
]
# Held-out examples used only to measure accuracy.
test_corpus = [
    ('1 - Agree Completely 2 - Agree Strongly 3 - Agree Somewhat 4 - Disagree Somewhat 5 - Disagree Strongly 6 - Completely Disagree','TRUE'),
    ('1 - Concerned 2 3 4 5 6 7 - Comfortable','TRUE'),
    ('Content ','FALSE'),
    ('Ease of navigation','FALSE')
]
model = NBC(training_corpus)
print(model.classify('pepsi'))              # predicted label for a single string
print(model.accuracy(test_corpus) * 100)    # accuracy on the test set, in percent

When I run this code, it reports 100% accuracy, but it returns FALSE every time. I'm not sure what is wrong, but that is not the output I expected.

Source

2017-09-25 Satish Dwivedi

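Your model is fine; what you are seeing comes down to your data and the classifier.
What I mean is that, given the training data you provided, it works well. Let's test it: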

def test(s): 
    prob_dist = model.prob_classify(s) 
    print("classifiying", s) 
    print("possibility of being FALSE:", round(prob_dist.prob("FALSE"), 2), 
      "possibility of being TRUE:" ,round(prob_dist.prob("TRUE"), 2)) 
    print('-'*70) 

test_cases = ['1', '1 - ', '2', '2 3 4 5', '1- 2 3 4 5', 'pepsi', 'coca', 'BMW'] 
for tc in test_cases: 
    test(tc)

Here is the output, which looks quite reasonable:

classifiying 1 
possibility of being FALSE: 1.0 possibility of being TRUE: 0.0 
---------------------------------------------------------------------- 
classifiying 1 - 
possibility of being FALSE: 1.0 possibility of being TRUE: 0.0 
---------------------------------------------------------------------- 
classifiying 2 
possibility of being FALSE: 1.0 possibility of being TRUE: 0.0 
---------------------------------------------------------------------- 
classifiying 2 3 4 5 
possibility of being FALSE: 0.05 possibility of being TRUE: 0.95 
---------------------------------------------------------------------- 
classifiying 1- 2 3 4 5 
possibility of being FALSE: 0.0 possibility of being TRUE: 1.0 
---------------------------------------------------------------------- 
classifiying pepsi 
possibility of being FALSE: 1.0 possibility of being TRUE: 0.0 
---------------------------------------------------------------------- 
classifiying coca 
possibility of being FALSE: 1.0 possibility of being TRUE: 0.0 
---------------------------------------------------------------------- 
classifiying BMW 
possibility of being FALSE: 1.0 possibility of being TRUE: 0.0 
--------------------------------------------------------------------

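OK, so now you are wondering why the classifier behaves this way? Look at your code: where did you specify a feature extractor? Nowhere, so it falls back on the default function to extract the feature vector, as explained in the documentation. (You can also take a look at the source code.)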
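To make the idea of a default "contains(word)" feature vector concrete, here is a minimal sketch in plain Python. It is only an approximation for illustration: the helper names `build_vocabulary` and `contains_features` are hypothetical, and it tokenizes by splitting on whitespace, which is an assumption and not necessarily how TextBlob tokenizes.

```python
def build_vocabulary(train_set):
    """Collect every whitespace-separated token seen in the training texts."""
    vocab = set()
    for text, _label in train_set:
        vocab.update(text.split())
    return vocab


def contains_features(document, vocab):
    """Map each vocabulary word to True/False depending on whether the
    document contains it (hypothetical stand-in for a default extractor)."""
    tokens = set(document.split())
    return {"contains({})".format(word): (word in tokens) for word in vocab}


# Two examples taken from the training corpus in the question.
toy_train = [
    ("1 - Most Important 2 3 4 5 - Least Important", "TRUE"),
    ("pepsi", "FALSE"),
]
vocab = build_vocabulary(toy_train)
feats = contains_features("pepsi", vocab)
print(feats["contains(pepsi)"], feats["contains(4)"])
```

Note that a short input such as 'pepsi' still gets a value for every vocabulary word, so the absence of the digits counts as evidence too. If different features are needed, TextBlob's classifiers accept a `feature_extractor` argument for supplying your own function.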

For example, the features the model relies on can be inspected like this:

model.show_informative_features() 


>>> Most Informative Features 
      contains(4) = False   FALSE : TRUE =  5.6 : 1.0 
      contains(3) = False   FALSE : TRUE =  5.6 : 1.0 
      contains(5) = False   FALSE : TRUE =  5.6 : 1.0 
      contains(2) = False   FALSE : TRUE =  5.6 : 1.0 
      contains(1) = False   FALSE : TRUE =  3.3 : 1.0 
      contains(7) = False   FALSE : TRUE =  2.4 : 1.0 
      contains(6) = False   FALSE : TRUE =  2.4 : 1.0 
      contains(at) = False   FALSE : TRUE =  1.9 : 1.0 
      contains(all) = False   FALSE : TRUE =  1.9 : 1.0 
      contains(not) = False   FALSE : TRUE =  1.3 : 1.0
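The ratios in this listing can be reproduced by hand. The sketch below assumes the underlying estimator uses add-0.5 (expected likelihood) smoothing over two possible feature values, True and False. That assumption is consistent with the numbers shown, but it is not stated in the thread. The counts are read off the training corpus in the question: 8 TRUE examples and 6 FALSE examples, with the token "4" missing from 1 TRUE example, "1" from 2, and "7" from 3, and none of these tokens appearing in any FALSE example. The function names are hypothetical.

```python
def smoothed_prob(count, total, bins=2, gamma=0.5):
    """Add-gamma smoothed estimate of P(feature value | label)."""
    return (count + gamma) / (total + gamma * bins)


def false_to_true_ratio(absent_in_true, n_true=8, n_false=6):
    """Likelihood ratio P(absent | FALSE) / P(absent | TRUE) for a token
    that never occurs in FALSE examples."""
    p_absent_false = smoothed_prob(n_false, n_false)
    p_absent_true = smoothed_prob(absent_in_true, n_true)
    return p_absent_false / p_absent_true


# (token, number of TRUE training examples that do NOT contain it)
for token, absent in [("4", 1), ("1", 2), ("7", 3)]:
    print("contains({}) = False   FALSE : TRUE = {} : 1.0".format(
        token, round(false_to_true_ratio(absent), 1)))
```

Under these assumptions the script prints 5.6, 3.3 and 2.4, matching the listing: a missing digit makes FALSE several times more likely, which is why short strings without digits are classified as FALSE.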

Source

2017-09-25 22:00:02

Thanks Iman... I'll work on this and will let you know if I run into any issues.

You're welcome :)
