Converting a Scala list to a DataFrame or Dataset

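I am new to Scala. I am trying to convert a Scala list (which holds the results of some computations on a source DataFrame) to a DataFrame or Dataset. I have not found any direct way to do this. I tried the following to convert my list to a Dataset, but it does not seem to work. I describe three situations below.

Can anyone give me some pointers on how to do this conversion? Thanks.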

import org.apache.spark.sql.{DataFrame, Row, SQLContext, DataFrameReader} 
import java.sql.{Connection, DriverManager, ResultSet, Timestamp} 
import scala.collection._ 

case class TestPerson(name: String, age: Long, salary: Double) 
var tom = new TestPerson("Tom Hanks",37,35.5) 
var sam = new TestPerson("Sam Smith",40,40.5) 

val PersonList = mutable.MutableList[TestPerson]() 

//Adding data in list 
PersonList += tom 
PersonList += sam 

//Situation 1: Trying to create dataset from List of objects:- Result:Error 
//Throwing error 
var personDS = Seq(PersonList).toDS() 
/* 
ERROR: 
error: Unable to find encoder for type stored in a Dataset. Primitive types 
    (Int, String, etc) and Product types (case classes) are supported by  
importing sqlContext.implicits._ Support for serializing other types will 
be added in future releases. 
    var personDS = Seq(PersonList).toDS() 

*/ 
//Situation 2: Trying to add data 1-by-1 :- Result: not working as desired. 
//The last record overwrites any existing data in the DS 
var personDS = Seq(tom).toDS() 
personDS = Seq(sam).toDS() 

personDS += sam //not working. throwing error 


//Situation 3: Working. However, I have consolidated data in the list 
//which I want to convert to a DS. If I loop over the list, build comma 
//separated values, and pass them here, it will work, but it adds an 
//extra loop to the code, which I want to avoid. 
var personDS = Seq(tom,sam).toDS() 
scala> personDS.show() 
+---------+---+------+ 
|     name|age|salary| 
+---------+---+------+ 
|Tom Hanks| 37|  35.5| 
|Sam Smith| 40|  40.5| 
+---------+---+------+

2016-09-08 Leo

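What are your Spark and Scala versions?

The Spark version is 1.6.1. – Leo

Try it without Seq. PersonList is already a sequence, so Seq(PersonList) produces a Seq[MutableList[TestPerson]], and Spark has no encoder for that element type: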

import scala.collection.mutable // needed for mutable.MutableList 

case class TestPerson(name: String, age: Long, salary: Double) 
val tom = TestPerson("Tom Hanks",37,35.5) 
val sam = TestPerson("Sam Smith",40,40.5) 
val PersonList = mutable.MutableList[TestPerson]() 
PersonList += tom 
PersonList += sam 

val personDS = PersonList.toDS() 
println(personDS.getClass) 
personDS.show() 

val personDF = PersonList.toDF() 
println(personDF.getClass) 
personDF.show() 
personDF.select("name", "age").show()

Output:

class org.apache.spark.sql.Dataset 

+---------+---+------+ 
|     name|age|salary| 
+---------+---+------+ 
|Tom Hanks| 37|  35.5| 
|Sam Smith| 40|  40.5| 
+---------+---+------+ 

class org.apache.spark.sql.DataFrame 

+---------+---+------+ 
|     name|age|salary| 
+---------+---+------+ 
|Tom Hanks| 37|  35.5| 
|Sam Smith| 40|  40.5| 
+---------+---+------+ 

+---------+---+ 
|     name|age| 
+---------+---+ 
|Tom Hanks| 37| 
|Sam Smith| 40| 
+---------+---+

Also, make sure to move the declaration of the case class TestPerson outside the scope of your object.
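For a compiled application rather than the spark-shell, the advice above might look like the following minimal sketch. It assumes the Spark 1.6.x API mentioned in the thread (SQLContext and its implicits); the object name ListToDatasetExample, the local[*] master, and the use of ListBuffer with .toList are illustrative choices, not something taken from the original posts.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import scala.collection.mutable

// Declared at the top level of the file, outside the object and outside main(),
// so that Spark's implicits can derive an encoder / schema for it.
case class TestPerson(name: String, age: Long, salary: Double)

// Hypothetical application object, named for illustration only.
object ListToDatasetExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master, for trying the example on one machine.
    val conf = new SparkConf().setAppName("ListToDatasetExample").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // brings toDS() and toDF() into scope

    // Collect results in a mutable buffer, as in the question.
    val people = mutable.ListBuffer[TestPerson]()
    people += TestPerson("Tom Hanks", 37, 35.5)
    people += TestPerson("Sam Smith", 40, 40.5)

    // Convert the buffer itself; do not wrap it in another Seq(...).
    val personDS = people.toList.toDS()
    val personDF = people.toList.toDF()

    personDS.show()
    personDF.select("name", "age").show()

    sc.stop()
  }
}
```

Calling .toList first is a conservative choice: it hands the implicit conversions a plain immutable List regardless of which mutable collection was used to gather the data.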

2016-09-08 21:00:14

Thanks for the solution above; it works for the Dataset. My ultimate goal is to get the data into a DataFrame. I used the command "scala> val RowsDF = sc.parallelize(personDS).toDF()" but got the error ":51: error: type mismatch; found: org.apache.spark.sql.Dataset[TestPerson] required: Seq[?] val RowsDF = sc.parallelize(personDS).toDF()" – Leo

I got it: scala> val RowsDF = personDS.toDF() RowsDF: org.apache.spark.sql.DataFrame = [name: string, age: bigint, salary: double] – Leo
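The type mismatch in the comment above arises because sc.parallelize expects a local Seq, while personDS is already a distributed Dataset. A rough sketch of the two routes to a DataFrame, again assuming the Spark 1.6.x API; the object DataFrameConversions and its method names are hypothetical helpers introduced only for this illustration:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.{DataFrame, Dataset, SQLContext}

case class TestPerson(name: String, age: Long, salary: Double)

// Hypothetical helper object, for illustration only.
object DataFrameConversions {

  // Route 1: the data is already a Dataset, so convert it directly.
  def fromDataset(ds: Dataset[TestPerson]): DataFrame =
    ds.toDF()

  // Route 2: the data is still a local collection, so distribute it
  // with parallelize first and then convert the resulting RDD.
  def fromLocalSeq(sc: SparkContext, sqlContext: SQLContext,
                   people: Seq[TestPerson]): DataFrame = {
    import sqlContext.implicits._ // provides toDF() on RDDs of case classes
    sc.parallelize(people).toDF()
  }
}
```

In the spark-shell session from the thread, route 1 corresponds to personDS.toDF(), which is what ended up working.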
