Kmeans - group by
I want to assign KMeans labels with numClusters = 6 so that I can group by the label later.
How do I choose which columns KMeans runs on?
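import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
import sqlContext.implicits._
// keep id for the join back; cluster only on the setting columns
val clusterThis = scaledDF.select($"id", $"setting1", $"setting2", $"setting3")
// dataset description lists six operation modes
val operatingModes = 6
// Cluster the data into six classes (one per operating mode) using KMeans
val numClusters = operatingModes
val numIterations = 20
// MLlib KMeans trains on an RDD[Vector], so build one vector per row, leaving id out
// (assumes id is a Long column and the settings are Double columns)
val idAndFeatures = clusterThis.rdd.map(r => (r.getAs[Long]("id"), Vectors.dense(r.getAs[Double]("setting1"), r.getAs[Double]("setting2"), r.getAs[Double]("setting3"))))
val clusters = KMeans.train(idAndFeatures.values.cache(), numClusters, numIterations)
// predict one label per vector, keeping the id alongside it
val labelled = idAndFeatures.mapValues(v => clusters.predict(v)).toDF("id", "label")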
//... join back on id
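To make the predict-then-group step concrete without a cluster, here is a plain-Scala sketch (no Spark) of what labelling amounts to: each row gets the index of its nearest center by squared Euclidean distance, and rows are then grouped by that index. The rows and centers below are hypothetical; with MLlib the centers would come from the trained model (`clusters.clusterCenters`), and `clusters.predict` does the nearest-center lookup for you.

```scala
// Hypothetical rows: (id, feature vector), standing in for the selected setting columns
val rows: Seq[(Long, Array[Double])] = Seq(
  1L -> Array(0.0, 0.0, 0.0),
  2L -> Array(0.1, 0.0, 0.1),
  3L -> Array(5.0, 5.0, 5.0),
  4L -> Array(5.1, 4.9, 5.0)
)

// Hypothetical centers, as a trained KMeans model would hold them
val centers: Array[Array[Double]] = Array(
  Array(0.05, 0.0, 0.05),
  Array(5.05, 4.95, 5.0)
)

def squaredDistance(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

// Label = index of the nearest center
def predict(v: Array[Double]): Int =
  centers.indices.minBy(i => squaredDistance(v, centers(i)))

// Keep the id next to its label, then group ids by label
val labelled: Seq[(Long, Int)] = rows.map { case (id, v) => (id, predict(v)) }
val idsByLabel: Map[Int, Seq[Long]] =
  labelled.groupBy(_._2).map { case (label, pairs) => label -> pairs.map(_._1) }
```

Keeping the id next to each label is what makes the later join or group-by possible; the feature vector itself never needs to carry the id.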
Are you using 'ML' or 'MLlib'? –
I can use either, whichever is available; I think the code above uses the RDD-based MLlib API. – oluies
Ah, ML has a nice example; check https://spark.apache.org/docs/latest/ml-clustering.html – oluies
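Following the pointer to the DataFrame-based `ml` API, here is a minimal sketch, assuming Spark 2.x or later with spark-mllib on the classpath (pasteable into spark-shell). The column choice happens in `VectorAssembler.setInputCols`, and `setPredictionCol` names the label column. Because `transform` keeps every input column, `id` stays on each row and you can group by the label directly, with no join back. The toy `scaledDF` and `setK(2)` are hypothetical stand-ins; the question's data would use `setK(6)`.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

// In spark-shell this returns the existing session
val spark = SparkSession.builder().master("local[*]").appName("kmeans-group-by").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for scaledDF: an id plus three already-scaled setting columns
val scaledDF = Seq(
  (1L, 0.0, 0.0, 0.0),
  (2L, 0.1, 0.0, 0.1),
  (3L, 5.0, 5.0, 5.0),
  (4L, 5.1, 4.9, 5.0)
).toDF("id", "setting1", "setting2", "setting3")

// Only the columns listed here go into the feature vector; id is left out
val assembler = new VectorAssembler()
  .setInputCols(Array("setting1", "setting2", "setting3"))
  .setOutputCol("features")
val withFeatures = assembler.transform(scaledDF)

// k = 2 fits the toy data; the question's data would use setK(6)
val kmeans = new KMeans()
  .setK(2)
  .setMaxIter(20)
  .setSeed(1L)
  .setFeaturesCol("features")
  .setPredictionCol("label")
val model = kmeans.fit(withFeatures)

// transform appends "label" and keeps every input column, including id
val labelled = model.transform(withFeatures)
labelled.groupBy("label").count().show()
```

Using the DataFrame API this way avoids converting rows to `RDD[Vector]` by hand and keeps the id-to-label association without a separate join.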