2017-07-18 72 views

Answers

1

I think this will solve your problem:

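import org.apache.spark.mllib.stat.Statistics

// Assumes col1 and col2 are RDD[Double] with the same number of elements.
// The two-RDD overload uses Pearson's method by default.
Statistics.corr(col1, col2)

0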

Here is a simple example; you can find the details of computing correlations here

import org.apache.spark.mllib.stat.Statistics 
import org.apache.spark.rdd.RDD 


val col1: RDD[Double] = spark.sparkContext.parallelize(Seq(3,4,3,2,3,5,7,6,5)) 
val col2: RDD[Double] = spark.sparkContext.parallelize(Seq(1,0,0,1,1,1,0,1,0)) 

// compute the correlation using Pearson's method 
val correlation: Double = Statistics.corr(col1, col2, "pearson") 
println(s"Correlation is: $correlation") 
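
For intuition, here is a minimal sketch of what Pearson's method computes, written in plain Scala with no Spark dependency. The helper name `pearson` is hypothetical (it is not part of MLlib), and it assumes two non-empty sequences of equal length, each with non-zero variance. On the same data it should agree with Statistics.corr up to floating-point rounding, roughly -0.305.

```scala
// Hypothetical helper, not a Spark API: Pearson's r for two in-memory sequences.
def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
  require(xs.nonEmpty && xs.size == ys.size, "inputs must be non-empty and of equal length")
  val n = xs.size.toDouble
  val meanX = xs.sum / n
  val meanY = ys.sum / n
  // Numerator: sum of products of deviations from the two means.
  val cov = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
  // Denominator terms: sums of squared deviations for each series.
  val varX = xs.map(x => (x - meanX) * (x - meanX)).sum
  val varY = ys.map(y => (y - meanY) * (y - meanY)).sum
  cov / math.sqrt(varX * varY)
}

// Same data as col1 and col2 above, held locally instead of in RDDs.
val xs = Seq[Double](3, 4, 3, 2, 3, 5, 7, 6, 5)
val ys = Seq[Double](1, 0, 0, 1, 1, 1, 0, 1, 0)
println(s"Correlation is: ${pearson(xs, ys)}")
```

This can be pasted into spark-shell or a plain Scala REPL. It is only useful for checking small samples; for distributed data, Statistics.corr is the right tool.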

Hope this helps!