Convert a DataFrame to Vectors.dense for k-means

Following up on the answer to this question: How to convert type Row into Vector to feed to the KMeans

I created a features table from my data (assembler is a VectorAssembler):

val kmeanInput = assembler.transform(table1).select("features")

When I run k-means on kmeanInput

val clusters = KMeans.train(kmeanInput, numCluster, numIteration)

I get this error:

:102: error: type mismatch;
 found   : org.apache.spark.sql.DataFrame
    (which expands to)  org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
 required: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]
       val clusters = KMeans.train(kmeanInput, numCluster, numIteration)

As @Jed mentioned in his answer, this happens because the rows are not in Vectors.dense format. To fix this I tried

val dat = kmeanInput.rdd.map(lambda row: Vectors.dense([x for x in row["features"]]))

and I get these errors:

:3: error: ')' expected but '(' found.
       val dat = kmeanInput.rdd.map(lambda row: Vectors.dense([x for x in row["features"]]))

:3: error: ';' expected but ')' found.
       val dat = kmeanInput.rdd.map(lambda row: Vectors.dense([x for x in row["features"]]))

2017-05-03 Sha2b

You are importing the wrong library: you should use KMeans from ml rather than from mllib. The former works on a DataFrame, the latter on an RDD.
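A minimal sketch of what that could look like, assuming Spark 2.x and that numCluster and numIteration are plain Ints as in the question. The object name, the helper fitKMeans, the column names x and y, and the toy rows standing in for table1 are all made up for illustration; only the "features" column name comes from the question.

```scala
import org.apache.spark.ml.clustering.{KMeans, KMeansModel}
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}

object MlKMeansSketch {
  // Fits the DataFrame-based KMeans (org.apache.spark.ml) directly on a
  // DataFrame that has a vector column named "features".
  // No conversion to RDD[Vector] is needed on this API.
  def fitKMeans(kmeanInput: DataFrame, numCluster: Int, numIteration: Int): KMeansModel =
    new KMeans()
      .setK(numCluster)
      .setMaxIter(numIteration)
      .setFeaturesCol("features")
      .setSeed(1L) // fixed seed only to make repeated runs comparable
      .fit(kmeanInput)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ml-kmeans-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for table1 from the question.
    val table1 = Seq((1.0, 1.1), (1.2, 0.9), (9.0, 9.2), (9.1, 8.8)).toDF("x", "y")

    val assembler = new VectorAssembler()
      .setInputCols(Array("x", "y"))
      .setOutputCol("features")
    val kmeanInput = assembler.transform(table1).select("features")

    val model = fitKMeans(kmeanInput, numCluster = 2, numIteration = 20)

    // Cluster centers, then each row with its assigned cluster in "prediction".
    model.clusterCenters.foreach(println)
    model.transform(kmeanInput).show()

    spark.stop()
  }
}
```

Note that on this API the iteration count is set with setMaxIter on the estimator rather than passed to a static train method.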

2017-05-03 22:36:08

Very tricky. Thanks, it works. – Sha2b
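As a side note on the two parse errors in the question: `lambda row: ...`, the list comprehension and `row["features"]` are Python syntax, which the Scala compiler cannot parse. If staying on the mllib API is preferred, a rough Scala equivalent might look like the sketch below. It assumes Spark 2.x, where VectorAssembler emits org.apache.spark.ml.linalg.Vector values that have to be converted with Vectors.fromML; the object and helper names are hypothetical.

```scala
import org.apache.spark.ml.linalg.{Vector => MLVector}
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.{Vector => MLlibVector, Vectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

object MllibKMeansSketch {
  // Reads the "features" column (ml vectors produced by VectorAssembler)
  // and converts each value to the mllib vector type that KMeans.train expects.
  def toMllibVectors(kmeanInput: DataFrame): RDD[MLlibVector] =
    kmeanInput.rdd.map(row => Vectors.fromML(row.getAs[MLVector]("features")))

  // Runs the RDD-based KMeans from org.apache.spark.mllib on the converted data.
  def trainKMeans(kmeanInput: DataFrame, numCluster: Int, numIteration: Int): KMeansModel = {
    val dat = toMllibVectors(kmeanInput).cache() // k-means makes several passes over the data
    KMeans.train(dat, numCluster, numIteration)
  }
}
```

With this, `MllibKMeansSketch.trainKMeans(kmeanInput, numCluster, numIteration)` would take the place of the failing `KMeans.train(kmeanInput, ...)` call, though switching to the ml KMeans as suggested in the answer avoids the conversion entirely.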
