Trying to implement predictRaw() for a classification model in Apache Spark

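The Developer API example (https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/DeveloperApiExample.scala) gives a simple example implementation of the function predictRaw() in a classification model. This is a function of the abstract class ClassificationModel that must be implemented in the concrete class. According to the Developer API example, it can be computed as follows: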

override def predictRaw(features: Features.Type): Vector = { 
    val margin = BLAS.dot(features, coefficients) 
    Vectors.dense(-margin, margin) // Binary classification so we return a length-2 vector, where index i corresponds to class i (i = 0, 1). 
}

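My understanding of BLAS.dot(features, coefficients) is that it is simply the dot product of the feature vector (of length numFeatures) with the coefficients vector (also of length numFeatures), so each feature is multiplied by its coefficient and the products are summed to give val margin. However, Spark no longer provides access to the BLAS library, because it is private in MLlib. Instead, matrix multiplication is provided through Matrix, which has various methods for multiplying.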

My understanding of how to implement predictRaw() using the Matrix methods is as follows:

override def predictRaw(features: Vector): Vector = {
  // Assumes aliased imports, e.g. org.apache.spark.ml.linalg.{DenseMatrix => SparkDenseMatrix, DenseVector => SparkDenseVector}
  // coefficients is a Vector of length numFeatures: val coefficients = Vectors.zeros(numFeatures)
  val coefficientsArray = coefficients.toArray
  // A 1 x numFeatures row matrix, so that multiply() accepts a vector of length numFeatures
  val coefficientsMatrix: SparkDenseMatrix = new SparkDenseMatrix(1, numFeatures, coefficientsArray)
  val margin: Array[Double] = coefficientsMatrix.multiply(features).toArray // contains a single element
  val rawPredictions: Array[Double] = Array(-margin(0), margin(0))
  new SparkDenseVector(rawPredictions)
}

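This incurs the overhead of converting the data structures to arrays. Is there a better way? It seems strange that BLAS is now private. NB: the code is untested! At the moment val coefficients: Vector is just a vector of zeros, but once I have implemented the learning algorithm it will contain the results.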
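One possible way to avoid wrapping the coefficients in a matrix is to compute the dot product directly. The following is a minimal sketch in plain Scala, assuming both vectors are already available as arrays (for example via toArray, as with coefficients.toArray above). The object RawMargin and its two functions are hypothetical names used for illustration and are not part of Spark.

```scala
object RawMargin {
  // Dot product of two equal-length arrays (hypothetical helper, not a Spark API).
  def dot(features: Array[Double], coefficients: Array[Double]): Double = {
    require(features.length == coefficients.length, "dimension mismatch")
    var sum = 0.0
    var i = 0
    while (i < features.length) {
      sum += features(i) * coefficients(i)
      i += 1
    }
    sum
  }

  // Length-2 raw prediction for binary classification:
  // index i corresponds to class i (i = 0, 1).
  def rawPrediction(features: Array[Double], coefficients: Array[Double]): Array[Double] = {
    val margin = dot(features, coefficients)
    Array(-margin, margin)
  }
}
```

Inside predictRaw() the resulting array could then be wrapped with Vectors.dense(...), as the Developer API example does.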


2017-08-30 LucieCBurgess

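I think I have solved this. The Spark Developer API example is quite confusing, because its predictRaw() computes a margin (a confidence score) for a logistic-regression-style example. However, what predictRaw() is actually supposed to do when implementing ClassificationModel is to return, for each input sample, a vector of raw predictions with one element per label. Technically, the matrix-multiplication approach above works without using BLAS, but predictRaw() does not have to be computed this way.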

From the underlying source code: https://github.com/apache/spark/blob/v2.2.0/mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala

 * @return  vector where element i is the raw prediction for label i.
 *          This raw prediction may be any real number, where a larger value indicates greater
 *          confidence for that label.

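The function raw2prediction then computes the actual label from the raw prediction, but it does not need to be implemented, since the API already provides it.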
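To illustrate how a label can be derived from a raw prediction vector, here is a minimal sketch in plain Scala that picks the index of the largest element. It assumes the default behaviour amounts to an argmax over the raw predictions. The object RawToLabel and the function predictLabel are hypothetical names, not Spark APIs.

```scala
object RawToLabel {
  // Returns the index of the largest raw prediction as a Double label
  // (hypothetical helper illustrating an argmax, not a Spark API).
  def predictLabel(rawPrediction: Array[Double]): Double = {
    require(rawPrediction.nonEmpty, "raw prediction must not be empty")
    rawPrediction.indices.maxBy(i => rawPrediction(i)).toDouble
  }
}
```

With a length-2 vector of the form (-margin, margin), this yields label 1.0 when the margin is positive and label 0.0 when it is negative.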


2017-09-02 14:48:29 LucieCBurgess
