Following up on the question "How to convert type Row into Vector to feed to the KMeans": converting a DataFrame to Vectors.dense for KMeans.

I created a features column from my data (assembler is a VectorAssembler):
val kmeanInput = assembler.transform(table1).select("features")
When I run KMeans with kmeanInput:
val clusters = KMeans.train(kmeanInput, numCluster, numIteration)
I get this error:
:102: error: type mismatch;
 found   : org.apache.spark.sql.DataFrame
    (which expands to)  org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
 required: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]
       val clusters = KMeans.train(kmeanInput, numCluster, numIteration)
As @Jed mentioned in his answer, this happens because the rows are not in Vectors.dense format. To fix this, I tried:
val dat = kmeanInput.rdd.map(lambda row: Vectors.dense([x for x in
row["features"]]))
And I get these errors:
:3: error: ')' expected but '(' found.
       val dat = kmeanInput.rdd.map(lambda row: Vectors.dense([x for x in row["features"]]))

:3: error: ';' expected but ')' found.
       val dat = kmeanInput.rdd.map(lambda row: Vectors.dense([x for x in row["features"]]))
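The snippet above mixes Python lambda and list-comprehension syntax into Scala code, which is why the compiler rejects it. A Scala sketch of the conversion might look like the following (an assumption-laden sketch, not tested here: it assumes `kmeanInput` has a single `"features"` column produced by a `VectorAssembler`; note that `VectorAssembler` emits `spark.ml` vectors while the RDD-based `KMeans.train` expects `spark.mllib` vectors, so the values are converted via `toArray`):

```scala
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Scala anonymous functions use `row => ...`, not Python's `lambda row: ...`.
// VectorAssembler produces org.apache.spark.ml.linalg.Vector, but the
// RDD-based KMeans.train requires org.apache.spark.mllib.linalg.Vector,
// so each ml vector is rebuilt as an mllib dense vector from its array.
val dat = kmeanInput.rdd.map { row =>
  Vectors.dense(row.getAs[org.apache.spark.ml.linalg.Vector]("features").toArray)
}

val clusters = KMeans.train(dat, numCluster, numIteration)
```

Alternatively, `org.apache.spark.mllib.linalg.Vectors.fromML` performs the same ml-to-mllib conversion without going through an intermediate array.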
Very tricky. Thanks, it worked. – Sha2b