Dataset of arrays in Spark (1.6.1)

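I've been trying to refactor a project I'm working on to use the Dataset API, and I keep running into encoder errors. From what I've read, I should be able to store arrays of primitive values in a Dataset. However, the following class gives me encoder errors: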

import org.apache.spark.rdd.RDD

case class InvertedIndex(partition: Int, docs: Array[Int], indices: Array[Long], weights: Array[Double])

val inv: RDD[InvertedIndex] = ??? // an existing RDD of InvertedIndex records
val invertedIndexDataset = sqlContext.createDataset(inv)
invertedIndexDataset.groupBy(x => x.partition).mapGroups {
    //...
}

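Could someone please help me understand what the problem is? Can Datasets currently not handle primitive arrays, or do I need to do something extra to make them work?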

Thanks

EDIT 1:

Here is the full error I get:

Error:(223, 84) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases. 
    val similarities = invertedIndexDataset.groupByKey(x => x.partition).mapGroups {


2016-06-27 Daniel Imberman

You might want to look at this SO post (http://stackoverflow.com/questions/36449368/using-an-optionsome-primitive-type-in-spark-dataset-api). What specific error are you getting?

Do you have import sqlContext.implicits._ in scope?

@RobertHorvick Yes, but I import it inside the function. Could that cause a problem? (The function takes a sparkContext as a parameter, so it's hard to have a class-level sqlContext.)
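To make this exchange concrete, here is a minimal sketch against the Spark 1.6 API (not tested against the asker's project) of a function that receives only a SparkContext, derives an SQLContext from it with SQLContext.getOrCreate, and imports the implicits locally. It also declares InvertedIndex at top level: a commonly reported cause of "Unable to find encoder" is a case class defined inside a method, for which Scala cannot create the TypeTag the product encoder needs. That cause is an assumption here and is not confirmed in this thread. The object InvertedIndexJob, the helper docCountsByPartition and the sample data are hypothetical.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Declared at top level (not inside a method) so the compiler can supply
// the TypeTag that Spark's product encoder requires.
case class InvertedIndex(partition: Int, docs: Array[Int], indices: Array[Long], weights: Array[Double])

object InvertedIndexJob {
  // Hypothetical helper: number of docs per partition, via the Dataset API.
  def docCountsByPartition(sc: SparkContext): Array[(Int, Int)] = {
    // Derive an SQLContext from the SparkContext passed in.
    val sqlContext = SQLContext.getOrCreate(sc)
    // A local import works because sqlContext is a stable val.
    import sqlContext.implicits._

    // Toy data standing in for the RDD[InvertedIndex] from the question.
    val inv = sc.parallelize(Seq(
      InvertedIndex(0, Array(1, 2), Array(10L, 20L), Array(0.5, 0.25)),
      InvertedIndex(0, Array(3), Array(30L), Array(1.0)),
      InvertedIndex(1, Array(4), Array(40L), Array(0.75))
    ))
    val invertedIndexDataset = sqlContext.createDataset(inv)

    // Spark 1.6: Dataset.groupBy(func) returns a GroupedDataset with mapGroups.
    invertedIndexDataset
      .groupBy(x => x.partition)
      .mapGroups { (partition, rows) => (partition, rows.map(_.docs.length).sum) }
      .collect()
  }
}
```

The point of the design is that nothing needs a class-level sqlContext. The function builds one from whatever SparkContext it is given and keeps the implicits import scoped to the function body.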

Works as expected in Spark 2.0:

import spark.implicits._ 

spark.createDataset(Array(1,2) :: Array(1) :: Array(2) :: Nil) 
res0:org.apache.spark.sql.Dataset[Array[Int]] = [value: array<int>]
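The same approach can be extended from bare arrays to the case class in the question. The sketch below assumes Spark 2.x (SparkSession, and groupByKey, which is the method that appears in the asker's error line). The object InvertedIndexSpark2, the helper docCountsByPartition and the sample data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Same shape as the question's case class, declared at top level.
case class InvertedIndex(partition: Int, docs: Array[Int], indices: Array[Long], weights: Array[Double])

object InvertedIndexSpark2 {
  // Hypothetical helper: number of docs per partition.
  def docCountsByPartition(spark: SparkSession): Array[(Int, Int)] = {
    import spark.implicits._

    val invertedIndexDataset = spark.createDataset(Seq(
      InvertedIndex(0, Array(1, 2), Array(10L, 20L), Array(0.5, 0.25)),
      InvertedIndex(0, Array(3), Array(30L), Array(1.0)),
      InvertedIndex(1, Array(4), Array(40L), Array(0.75))
    ))

    // Spark 2.x: groupByKey returns a KeyValueGroupedDataset with mapGroups.
    invertedIndexDataset
      .groupByKey(x => x.partition)
      .mapGroups { (partition, rows) => (partition, rows.map(_.docs.length).sum) }
      .collect()
  }
}
```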


2017-04-29 22:05:16 marios
