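Combining Spark Streaming + MLlib

I am trying to use a random forest model to predict a stream of examples, but it appears that I cannot use the model to classify them. Here is the PySpark code I am using: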
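from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.mllib.tree import RandomForest

sc = SparkContext(appName="App")
# Train once on static data (trainingData is an RDD of LabeledPoint)
model = RandomForest.trainClassifier(trainingData, numClasses=2, categoricalFeaturesInfo={}, impurity='gini', numTrees=150)
ssc = StreamingContext(sc, 1)  # 1-second batches
lines = ssc.socketTextStream(hostname, int(port))
parsedLines = lines.map(parse)
parsedLines.pprint()
# This is the line that raises the error: model.predict is called inside a transformation
predictions = parsedLines.map(lambda event: model.predict(event.features))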
When I run it on the cluster, it returns the following error:
Error : "It appears that you are attempting to reference SparkContext from a broadcast "
Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
Is there a way to use a model trained on static data to predict streaming examples?
Thanks, guys, I really appreciate it!
I wrote a similar question here: https://stackoverflow.com/questions/48846882/pyspark-ml-streaming
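
One approach that may work here, offered as a sketch rather than a confirmed fix: the error comes from calling model.predict inside map(), which runs on the workers. The pyspark.mllib tree models also accept a whole RDD in predict(), and DStream.transform() exposes the RDD behind each batch, so the prediction can be issued once per batch instead. The helper name predict_stream is hypothetical, and the code assumes each parsed event has a .features attribute as in the code above.

```python
def predict_stream(model, parsed_stream):
    """Score each micro-batch as a whole RDD instead of inside map().

    Assumes `model` has a predict() that accepts an RDD of feature vectors
    (as pyspark.mllib tree models do) and that every parsed event exposes
    a `.features` attribute, as in the question's code.
    """
    # transform() hands over the RDD behind each batch; the function passed
    # to it runs on the driver, so calling model.predict on the RDD is
    # allowed there.
    return parsed_stream.transform(
        lambda rdd: model.predict(rdd.map(lambda event: event.features))
    )


# Hypothetical usage with the names from the question:
# predictions = predict_stream(model, parsedLines)
# predictions.pprint()
# ssc.start()
# ssc.awaitTermination()
```

The resulting stream carries only the predicted labels. If the original events are needed next to the predictions, they would have to be paired up again inside the same transform() call.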