2016-12-14 71 views

NullPointerException - Apache Spark Dataset left outer join

I am trying to learn Spark Datasets (Spark 2.0.1). The left outer join below throws a NullPointerException.

case class Employee(name: String, age: Int, departmentId: Int, salary: Double) 
case class Department(id: Int, depname: String) 
case class Record(name: String, age: Int, salary: Double, departmentId: Int, departmentName: String) 
val employeeDataSet = sc.parallelize(Seq(Employee("Jax", 22, 5, 100000.0),Employee("Max", 22, 1, 100000.0))).toDS() 
val departmentDataSet = sc.parallelize(Seq(Department(1, "Engineering"), Department(2, "Marketing"))).toDS() 

val averageSalaryDataset = employeeDataSet.joinWith(departmentDataSet, $"departmentId" === $"id", "left_outer") 
           .map(record => Record(record._1.name, record._1.age, record._1.salary, record._1.departmentId, record._2.depname)) 

averageSalaryDataset.show() 

16/12/14 16:48:26 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 12) java.lang.NullPointerException

This happens because, in a left outer join, record._2 is null for unmatched rows, so accessing record._2.depname throws.

How should this be handled? Thanks.

Answers


Solved this using ---

val averageSalaryDataset1 = employeeDataSet.joinWith(departmentDataSet, $"departmentId" === $"id", "left_outer") 
    .selectExpr("nvl(_1.name, ' ') as name", 
                "nvl(_1.age, 0) as age", 
                "nvl(_1.salary, 0.0D) as salary", 
                "nvl(_1.departmentId, 0) as departmentId", 
                "nvl(_2.depname, ' ') as departmentName") 
    .as[Record] 
averageSalaryDataset1.show() 

While this may work, it is a really poor solution :O! I don't understand why the join doesn't give back Options - case classes are easy to check. – Sparky
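As the comment suggests, Option semantics can be recovered manually. A minimal sketch (not from the original answers; it assumes the case classes and Datasets defined in the question, with spark.implicits imported):

```scala
// Wrap the nullable right side of the joinWith tuple in Option,
// so a missing department becomes None instead of a null reference.
val withOptions = employeeDataSet 
  .joinWith(departmentDataSet, $"departmentId" === $"id", "left_outer") 
  .map { case (emp, dept) => 
    Record(emp.name, emp.age, emp.salary, emp.departmentId, 
           Option(dept).map(_.depname).getOrElse(" ")) 
  } 
withOptions.show() 
```

Option(dept) yields None when dept is null, so the getOrElse default takes the place of the nvl calls in the accepted answer.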


The null can be handled with an if..else condition.

val averageSalaryDataset = employeeDataSet.joinWith(departmentDataSet, $"departmentId" === $"id", "left_outer") 
    .map(record => Record(record._1.name, record._1.age, record._1.salary, record._1.departmentId, 
                          if (record._2 == null) null else record._2.depname)) 

After the join operation, each row of the resulting Dataset is stored as a tuple (a key-value-like pair), and in the map operation we access its fields. For unmatched rows the second element is null, so calling record._2.depname is what throws the exception.

val averageSalaryDataset = employeeDataSet.joinWith(departmentDataSet, $"departmentId" === $"id", "left_outer") 

Dataset after left join
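An alternative, not shown in the answers above, is to drop to an untyped DataFrame join and let na.fill replace the nulls afterwards. A hedged sketch assuming the same Datasets and column names as in the question:

```scala
// Untyped (DataFrame) left outer join: unmatched rows get null columns
// from the department side instead of a null object.
val joinedDf = employeeDataSet.join(departmentDataSet, 
    $"departmentId" === $"id", "left_outer") 
  .select($"name", $"age", $"salary", $"departmentId", 
          $"depname".as("departmentName")) 

// na.fill substitutes a default value for the remaining nulls.
joinedDf.na.fill(Map("departmentName" -> "")).show() 
```

This avoids touching nulls in Scala code entirely, at the cost of losing the typed Dataset API until a final .as[Record].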