Can't get Spark to run in an IntelliJ IDEA Scala worksheet

The following code runs without any problems if I put it in an object that extends the App trait and run it with IDEA's run command.

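However, when I try to run it from a worksheet, I run into one of these scenarios:

1- If the first line is present, I get:

Task not serializable: java.io.NotSerializableException: A$A34$A$A34

2- If the first line is commented out, I get:

Unable to generate an encoder for inner class A$A35$A$A35$A12 without access to the scope that this class was defined in.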

//First line! 
org.apache.spark.sql.catalyst.encoders.OuterScopes.addOuterScope(this) 

import org.apache.spark.sql.SparkSession 
import org.apache.spark.sql.types.{IntegerType, StructField, StructType} 

case class AClass(id: Int, f1: Int, f2: Int) 
val spark = SparkSession.builder() 
    .master("local[*]") 
    .appName("Test App") 
    .getOrCreate() 
import spark.implicits._ 

val schema = StructType(Array(
    StructField("id", IntegerType), 
    StructField("f1", IntegerType), 
    StructField("f2", IntegerType))) 

val df = spark.read.schema(schema) 
    .option("header", "true") 
    .csv("dataset.csv") 

// Displays the content of the DataFrame to stdout 
df.show() 
val ads = df.as[AClass] 

//This is the line that causes serialization error 
ads.foreach(x => println(x))

The project was created using IDEA's Scala plugin, and this is my build.sbt:

... 
    scalaVersion := "2.10.6" 
    scalacOptions += "-unchecked" 
    libraryDependencies ++= Seq(
     "org.apache.spark" % "spark-core_2.10" % "2.1.0", 
     "org.apache.spark" % "spark-sql_2.10" % "2.1.0", 
     "org.apache.spark" % "spark-mllib_2.10" % "2.1.0" 
     )

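I tried the solutions in this answer, but they do not work for IDEA Ultimate 2017.1, which is what I am using. Also, when I use worksheets, I would rather not add extra objects to the worksheet if I can avoid it.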

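If I call the collect() method on the Dataset object and get an array of AClass instances, there are no more errors. It is trying to work with the Dataset directly that causes the error.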
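For reference, here is a minimal sketch of that collect() workaround. It assumes the same worksheet setup as above, with the first line present, and a dataset.csv with a header row and integer columns id, f1 and f2. My guess is that it avoids the error because ads.foreach hands the lambda to Spark, which has to serialize it, and a lambda defined in the worksheet presumably drags the worksheet's wrapper instance (A$A34$A$A34) along with it. After collect() the foreach is an ordinary local Array.foreach, so nothing has to be serialized.

```scala
// Worksheet sketch, not a definitive fix.
// Assumption: dataset.csv exists with a header row and Int columns id, f1, f2.
org.apache.spark.sql.catalyst.encoders.OuterScopes.addOuterScope(this)

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

case class AClass(id: Int, f1: Int, f2: Int)

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("Test App")
  .getOrCreate()
import spark.implicits._

val schema = StructType(Array(
  StructField("id", IntegerType),
  StructField("f1", IntegerType),
  StructField("f2", IntegerType)))

val ads = spark.read.schema(schema)
  .option("header", "true")
  .csv("dataset.csv")
  .as[AClass]

// collect() materializes every row on the driver as Array[AClass]
val rows: Array[AClass] = ads.collect()

// Plain Scala Array.foreach: runs locally, no closure is sent to Spark
rows.foreach(println)
```

This only makes sense for data small enough to fit in driver memory, so it does not replace working with the Dataset directly.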

2017-04-05 jrook

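Use Eclipse compatibility mode (open Preferences -> type "scala" -> under Languages & Frameworks, choose Scala -> choose Worksheet -> select only the Eclipse compatibility mode). See https://gist.github.com/RAbraham/585939e5390d46a7d6f8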

2017-04-05 07:56:45

I want to be able to use the SparkSession object. Checking the Eclipse compatibility mode did not solve the problem either. – jrook
