I am trying to run a Naive Bayes classifier on my data with PySpark 1.3, but Naive Bayes in PySpark 1.3 never responds.
Here is a sample of my data. Starting from a text file, I convert it into LabeledPoint objects:
67,[0,1,2,3,4,5,6,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,3 ..... 60,66],[0.45,0.441666666667,0.475,0.0,0.717763157895,0.0,0.497300944669,0.476608187135,0.0,0.0,0.45183714002,0.616666666667,0.966666666667,0.0790064102564,-0.364093614847,0.0679487179487,0.256043956044,0.7,0.449583333333,0.231904697754,0.341666666667,0.06 ....,0.0]
from pyspark.mllib.classification import NaiveBayes
from pyspark.mllib.util import MLUtils

data = MLUtils.loadLibSVMFile(sc, 'path to file')
training, test = data.randomSplit([0.7, 0.3], seed=0)
model = NaiveBayes.train(training, 1.0)
predictionAndLabel = test.map(lambda p: (model.predict(p.features), p.label))
accuracy = (
    1.0 * predictionAndLabel.filter(lambda pl: pl[0] == pl[1]).count() / test.count()
)
PySpark seems to hang forever while computing the model variable. Has anyone else run into this problem? Thanks.
Thanks for this explanation! Running this Naive Bayes code in the Spark Scala shell, I was able to see the error message saying that negative values are not accepted. Strangely, the PySpark 1.3 shell just hangs on this code. –
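For context: multinomial Naive Bayes in Spark MLlib requires nonnegative feature values, and the sample row above contains a negative value (-0.364093614847), which is consistent with the error seen in the Scala shell. A minimal, Spark-free sketch of one common workaround is to shift each feature column by its minimum so every value becomes nonnegative. The `shift_nonnegative` helper and the sample rows below are hypothetical illustrations, not from the original post; in Spark you would compute the per-column minima with a reduce over the RDD and apply the shift in a map before calling `NaiveBayes.train()`:

```python
# Shift each feature column so all values are nonnegative, a common
# preprocessing step before multinomial Naive Bayes.
# (Plain-Python sketch; in Spark, compute the column minima with a
# reduce over the RDD, then apply the shift in a map.)

def shift_nonnegative(rows):
    """rows: list of equal-length feature lists (may contain negatives)."""
    n_cols = len(rows[0])
    # Per-column minimum across all rows.
    col_min = [min(row[j] for row in rows) for j in range(n_cols)]
    # Shift a column only when its minimum is negative.
    shift = [-m if m < 0 else 0.0 for m in col_min]
    return [[row[j] + shift[j] for j in range(n_cols)] for row in rows]

rows = [
    [0.45, -0.364093614847, 0.7],
    [0.30, 0.0679487179487, 0.0],
]
shifted = shift_nonnegative(rows)
# Every value in `shifted` is now >= 0, so the negative-value
# rejection in NaiveBayes no longer applies.
```

Note that shifting changes the feature scale, so it is a pragmatic fix rather than a statistically neutral one; an alternative is to rescale features into [0, 1] before training.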