I'm trying to build a recommender for implicit ratings using the new Spark ML library with DataFrames. My code:
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import Row
from pyspark.ml.recommendation import ALS
sc = SparkContext()
sqlContext = SQLContext(sc)
# create the dataframe (user x item)
df = sqlContext.createDataFrame(
    [(0, 0), (0, 1), (1, 1), (1, 2), (2, 1), (2, 2)],
    ["user", "item"])
als = ALS() \
    .setRank(10) \
    .setImplicitPrefs(True)
model = als.fit(df)
print "Rank %i " % model.rank
model.userFactors.orderBy("id").collect()
test = sqlContext.createDataFrame([(0, 2), (1, 0), (2, 0)], ["user", "item"])
predictions = sorted(model.transform(test).collect(), key=lambda r: r[0])
for p in predictions: print p
However, I'm running into this error:

pyspark.sql.utils.AnalysisException: cannot resolve 'rating' given input columns user, item;
So, I'm not sure how to define the dataframe.
Do you have any clue what the line of code `df = sqlContext.createDataFrame([(0, 0), (0, 1), (1, 1), (1, 2), (2, 1), (2, 2)], ["user", "item"])` is supposed to do? – eliasah
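The error suggests that `ALS.fit` looks for a rating column (its `ratingCol` parameter defaults to `"rating"`), which the two-column `(user, item)` DataFrame does not have. A minimal sketch of one way to reshape the data, assuming the common convention of using `1.0` as the implicit preference value for each observed interaction (pure Python shown here; the resulting rows would then be passed to `sqlContext.createDataFrame` with the schema `["user", "item", "rating"]`):

```python
# The (user, item) pairs from the question.
interactions = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 1), (2, 2)]

# Add an implicit preference value of 1.0 per observed interaction.
# (Assumption: 1.0 per pair; an interaction count would also work.)
rows = [(user, item, 1.0) for (user, item) in interactions]

# rows now has the three columns ALS expects:
# df = sqlContext.createDataFrame(rows, ["user", "item", "rating"])
```

With `setImplicitPrefs(True)`, the values in the rating column are treated as confidence-weighted implicit feedback rather than explicit ratings, but the column itself still has to exist.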