Scikit-learn - getting NaN when measuring accuracy
I am classifying sample data into positive and negative sentiment, using the code snippet below.
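Everything looks OK up to line 20 of my script, which prints the expected predictions.
However, when I try to use metrics to measure accuracy, I get a "nan" value. Could you please review my code and help me find the problem?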
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import metrics
import csv
# Read in the training data.
with open("/Users/max/train.csv", 'r') as file:
    reviews = list(csv.reader(file))
with open("/Users/max/test.csv", 'r') as file:
    test_reviews = list(csv.reader(file))
vectorizer = TfidfVectorizer(min_df=1)
train_features = vectorizer.fit_transform([review[0] for review in reviews])
test_features = vectorizer.transform([test_review[0] for test_review in test_reviews])
nb = MultinomialNB()
nb.fit(train_features, [int(review[1]) for review in reviews])
predictions = nb.predict(test_features)
print("prediction : {0}".format(predictions))
actual = [int(r[1]) for r in test_reviews]
fpr, tpr, threshold = metrics.roc_curve(actual, predictions, pos_label=1)
print("Multinomial naive bayes AUC: {0}".format(metrics.auc(fpr, tpr)))
The sample data is in this format:
i like google , 1
i dont really like microsoft , -1
Here is the console output:
prediction : [1 -1]
/Library/Python/2.7/site-packages/sklearn/metrics/ranking.py:496: UndefinedMetricWarning: No positive samples in y_true, true positive value should be meaningless
  UndefinedMetricWarning)
Multinomial naive bayes AUC: nan
You don't have any truly positive instances in your data.
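The warning says that `y_true` contains no samples equal to `pos_label`. In that case the true positive rate is undefined, which is how a `nan` area can come out. A minimal sketch of a sanity check to run on the labels before calling `roc_curve` follows; `count_labels` is a hypothetical helper and the label lists are made up for illustration, not taken from the files above.

```python
from collections import Counter


def count_labels(y_true, pos_label=1):
    """Return (n_positive, n_negative) for a list of labels.

    Hypothetical helper: anything equal to pos_label counts as positive,
    everything else as negative.
    """
    counts = Counter(y_true)
    n_pos = counts.get(pos_label, 0)
    n_neg = sum(counts.values()) - n_pos
    return n_pos, n_neg


# Made-up label lists for illustration.
print(count_labels([1, -1]))   # (1, 1): both classes present
print(count_labels([-1, -1]))  # (0, 2): no positives, ROC is undefined
```

Running this on `actual` right before the `metrics.roc_curve` call would show whether the test labels really contain a `1`.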
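Have you tried using 'roc_auc_score' instead? http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html#sklearn.metrics.roc_auc_score – Dair
@Dair, that seems to work. What is the difference between them? – Max
I'm not sure, but the documentation points to it as an alternative. – Dair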
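As a rough illustration of how the two routes relate: `roc_curve` returns the points of the curve, which `auc` then integrates, while `roc_auc_score` computes the area directly from labels and scores. The sketch below assumes a binary problem in which both classes are present; the labels and scores are made up, not taken from the question's data.

```python
from sklearn import metrics

# Made-up labels and scores for illustration. In the question's setup the
# scores could come from nb.predict_proba(test_features), taking the column
# that corresponds to class 1 (an assumption, not part of the original code).
y_true = [1, -1, 1, -1]
y_score = [0.9, 0.4, 0.6, 0.7]

# Route 1: build the ROC curve, then integrate it.
fpr, tpr, _ = metrics.roc_curve(y_true, y_score, pos_label=1)
auc_from_curve = metrics.auc(fpr, tpr)

# Route 2: compute the area directly.
auc_direct = metrics.roc_auc_score(y_true, y_score)

print("roc_curve + auc: {0}".format(auc_from_curve))
print("roc_auc_score:   {0}".format(auc_direct))
```

On this toy input both routes give the same area, so a differing result on real data is more likely to come from the labels or scores being passed in than from the choice of function.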