How to infer the predicted class label from the raw score computed by Spark MLlib

https://spark.apache.org/docs/latest/mllib-optimization.html

The Spark documentation gives the following example snippet for binary classification prediction:

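import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.linalg.Vectors

// weightsWithIntercept and test are defined earlier in the documentation example.
// The last element of weightsWithIntercept is the intercept; the rest are the weights.
val model = new LogisticRegressionModel(
  Vectors.dense(weightsWithIntercept.toArray.slice(0, weightsWithIntercept.size - 1)),
  weightsWithIntercept(weightsWithIntercept.size - 1))

// Clear the default threshold so that predict returns a score instead of a label.
model.clearThreshold()

// Compute raw scores on the test set.
val scoreAndLabels = test.map { point =>
  val score = model.predict(point.features)
  (score, point.label)
}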

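As you can see, model.predict(point.features) returns the raw score, which is the distance from the separating hyperplane.

My question is:

(1) How can I tell whether the predicted class label is 0 or 1, based on the raw score computed above?

Or

(2) How do I infer the predicted class label (0 or 1) from the raw score computed above in this binary classification case?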

Asked 2017-04-07 by Tom

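The default threshold is 0.5, so when using BinaryClassificationMetrics, a score below 0.5 corresponds to class label 0 and a higher score to class label 1. You can apply the same rule to infer the class from the score.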
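The thresholding rule can be sketched in plain Scala. This is an illustration rather than Spark's own code: the name labelFromScore and the sample pairs are made up, and it assumes the model is the LogisticRegressionModel from the question. With the threshold cleared, that model's predict returns the logistic function applied to the margin, a value between 0 and 1, which is why 0.5 is a natural cut-off.

```scala
object LabelFromScore {
  // Hypothetical helper: a score strictly above the threshold maps to class 1.0,
  // anything else to class 0.0.
  def labelFromScore(score: Double, threshold: Double = 0.5): Double =
    if (score > threshold) 1.0 else 0.0

  def main(args: Array[String]): Unit = {
    // Made-up (score, label) pairs standing in for the scoreAndLabels RDD.
    val scoreAndLabels = Seq((0.91, 1.0), (0.12, 0.0), (0.50, 1.0))

    // Replace each score by the inferred label; the true label is kept as-is.
    val predictionAndLabels = scoreAndLabels.map { case (score, label) =>
      (labelFromScore(score), label)
    }
    predictionAndLabels.foreach(println)
  }
}
```

The same map can be applied to the scoreAndLabels RDD. Alternatively, not calling model.clearThreshold(), or calling model.setThreshold(0.5) afterwards, makes model.predict return the 0/1 label directly.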

Answered 2017-04-07 18:36:02

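How can I get the best threshold, as determined by the algorithm that computes the ROC curve? – Tom

On the metrics object you can get the value of various metrics by threshold. For example: val f1Score = metrics.fMeasureByThreshold. You can then iterate over the result to find the best threshold. Details: https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html –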
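A minimal sketch of that iteration, assuming "best" means the threshold with the highest F1 score. metrics.fMeasureByThreshold yields (threshold, F-measure) pairs; the helper name bestThreshold and the sample values below are invented for illustration, and the pairs are held in a local Seq so the sketch runs without Spark.

```scala
object BestThreshold {
  // Hypothetical helper: returns the (threshold, value) pair with the highest
  // metric value. Throws on an empty input, so check for that case if needed.
  def bestThreshold(byThreshold: Seq[(Double, Double)]): (Double, Double) =
    byThreshold.maxBy { case (_, value) => value }

  def main(args: Array[String]): Unit = {
    // Made-up (threshold, F1) pairs. With Spark these would come from
    // metrics.fMeasureByThreshold.collect().toSeq
    val f1ByThreshold = Seq((0.9, 0.55), (0.7, 0.71), (0.5, 0.78), (0.3, 0.64))

    val (threshold, f1) = bestThreshold(f1ByThreshold)
    println(s"best threshold = $threshold, F1 = $f1")
  }
}
```

The selected value can then be passed to model.setThreshold so that predict returns labels using that cut-off. Other criteria, such as precision or recall by threshold, would pick a different threshold.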
