2017-05-26

I'm trying to test this code, which I found in the Spark documentation, to handle categorical features in Apache Spark with Java (1-of-K encoding):

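    import java.util.Arrays;
    import java.util.List;

    // StringIndexer lives in org.apache.spark.ml.feature (spark-mllib artifact)
    import org.apache.spark.ml.feature.StringIndexer;
    import org.apache.spark.ml.feature.StringIndexerModel;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.Metadata;
    import org.apache.spark.sql.types.StructField;
    import org.apache.spark.sql.types.StructType;

    SparkSession spark = SparkSession
        .builder().master("local[4]")
        .appName("1-of-K encoding Test")
        .getOrCreate();

    // Six rows: an integer id and a string category label
    List<Row> data = Arrays.asList(
        RowFactory.create(0, "a"),
        RowFactory.create(1, "b"),
        RowFactory.create(2, "c"),
        RowFactory.create(3, "a"),
        RowFactory.create(4, "a"),
        RowFactory.create(5, "c")
    );
    StructType schema = new StructType(new StructField[]{
        new StructField("id", DataTypes.IntegerType, false, Metadata.empty()),
        new StructField("category", DataTypes.StringType, false, Metadata.empty())
    });
    Dataset<Row> df = spark.createDataFrame(data, schema);

    // fit() learns the label -> index mapping from the "category" column
    StringIndexerModel indexer = new StringIndexer()
        .setInputCol("category")
        .setOutputCol("categoryIndex")
        .fit(df);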

But I get an error: the fit method cannot be called.


Any ideas?
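For reference, a minimal plain-Java sketch (no Spark involved; the class FrequencyIndexSketch and method fitIndex are hypothetical names) of the mapping that StringIndexer's fit() learns under its default ordering: labels are sorted by descending frequency, so the most frequent label gets index 0. The alphabetical tie-break is an assumption of this sketch, not a claim about how Spark orders ties.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FrequencyIndexSketch {
    // Assign index 0 to the most frequent label, 1 to the next, and so on.
    // Ties are broken alphabetically (an assumption of this sketch).
    static Map<String, Integer> fitIndex(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        List<String> ordered = new ArrayList<>(counts.keySet());
        ordered.sort((x, y) -> {
            int byCount = Integer.compare(counts.get(y), counts.get(x)); // descending count
            return byCount != 0 ? byCount : x.compareTo(y);               // alphabetical tie-break
        });
        Map<String, Integer> index = new LinkedHashMap<>();
        for (int i = 0; i < ordered.size(); i++) {
            index.put(ordered.get(i), i);
        }
        return index;
    }

    public static void main(String[] args) {
        // The "category" column from the question's data
        List<String> category = Arrays.asList("a", "b", "c", "a", "a", "c");
        System.out.println(fitIndex(category));
    }
}
```

For the question's data there are no ties (a appears 3 times, c twice, b once), so this gives a = 0, c = 1, b = 2, which should match the categoryIndex values (0.0, 1.0, 2.0) that the StringIndexer example in the Spark documentation reports for the same rows.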

Answer


Why build the DataFrame the long way? A more concise way, in Scala, is:

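    import org.apache.spark.ml.feature.StringIndexer
    import org.apache.spark.sql.SparkSession

    val sparkSession = SparkSession.builder().master("local[4]").appName("1-of-K encoding Test").getOrCreate()
    import sparkSession.implicits._  // enables .toDF on RDDs of tuples

    val df = sparkSession.sparkContext
      .parallelize(Seq((0, "a"), (1, "b"), (2, "c"), (3, "d"), (4, "e"), (5, "f")))
      .toDF("id", "category")

    // fit() learns the label -> index mapping; transform() appends the categoryIndex column
    val newDf = new StringIndexer()
      .setInputCol("category")
      .setOutputCol("categoryIndex")
      .fit(df)
      .transform(df)

    newDf.show()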

This gives the output:

    +---+--------+-------------+
    | id|category|categoryIndex|
    +---+--------+-------------+
    |  0|       a|          2.0|
    |  1|       b|          3.0|
    |  2|       c|          4.0|
    |  3|       d|          5.0|
    |  4|       e|          0.0|
    |  5|       f|          1.0|
    +---+--------+-------------+
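StringIndexer only turns each label into an index; the 1-of-K step itself is a separate stage (in Spark ML, OneHotEncoder in org.apache.spark.ml.feature). A minimal plain-Java sketch of that step, applied to the categoryIndex values in the output above and assuming K = 6 categories; the class OneHotSketch and method oneHot are hypothetical names, not Spark APIs.

```java
import java.util.Arrays;

public class OneHotSketch {
    // Map a category index to a 1-of-K vector of doubles.
    // With dropLast = true the vector has K - 1 slots and the last
    // category becomes the all-zeros vector.
    static double[] oneHot(int index, int numCategories, boolean dropLast) {
        if (index < 0 || index >= numCategories) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        int size = dropLast ? numCategories - 1 : numCategories;
        double[] vec = new double[size];
        if (index < size) {
            vec[index] = 1.0;
        }
        return vec;
    }

    public static void main(String[] args) {
        // categoryIndex values from the answer's output, in id order 0..5
        double[] categoryIndex = {2.0, 3.0, 4.0, 5.0, 0.0, 1.0};
        int k = 6; // six distinct labels, a..f
        for (int id = 0; id < categoryIndex.length; id++) {
            double[] vec = oneHot((int) categoryIndex[id], k, false);
            System.out.println(id + " -> " + Arrays.toString(vec));
        }
    }
}
```

The dropLast flag mirrors the parameter of the same name on Spark's OneHotEncoder, which is true by default, so with K categories Spark produces K - 1 slots unless you turn it off.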

Thanks, it works for me now. (I had found the example in the Apache MLlib documentation.)