The following code throws a NullPointerException, even with Option(x._1.F2).isDefined && Option(x._2.F2).isDefined in place to guard against null values. How do I check a scala.math.BigDecimal for null?

case class Cols (F1: String, F2: BigDecimal, F3: Int, F4: Date, ...) 

def readTable() : Dataset[Cols] = { 
    import sqlContext.sparkSession.implicits._ 

    sqlContext.read.format("jdbc").options(Map(
     "driver" -> "com.microsoft.sqlserver.jdbc.SQLServerDriver", 
     "url" -> jdbcSqlConn, 
     "dbtable" -> s"..." 
    )).load() 
     .select("F1", "F2", "F3", "F4") 
     .as[Cols] 
    } 

import org.apache.spark.sql.{functions => func} 
val j = readTable().joinWith(readTable(), func.lit(true)) 
j.filter(x => 
    (if (Option(x._1.F2).isDefined && Option(x._2.F2).isDefined 
     && (x._1.F2 - x._2.F2 < 1)) 1 else 0) //line 51 
    + ..... > 100) 

I also tried !(x._1.F2 == null || x._2.F2 == null) and it still gets the exception.

The exception is:

 
java.lang.NullPointerException 
     at scala.math.BigDecimal.$minus(BigDecimal.scala:563) 
     at MappingPoint$$anonfun$compare$1.apply(MappingPoint.scala:51) 
     at MappingPoint$$anonfun$compare$1.apply(MappingPoint.scala:44) 
     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) 
     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) 
     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395) 
     at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234) 
     at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228) 
     at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827) 
     at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827) 
     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) 
     at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) 
     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) 
     at org.apache.spark.scheduler.Task.run(Task.scala:108) 
     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
     at java.lang.Thread.run(Unknown Source) 

Update: I tried the expression below and execution still hits the line x._1.F2 - x._2.F2. Is this a valid way to check whether a BigDecimal is null?

(if (!(Option(x._1.F2).isDefined && Option(x._2.F2).isDefined 
     && x._1.F2 != null && x._2.F2 != null)) 0 
     else (if (x._1.F2 - x._2.F2 < 1) 1 else 0)) 
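
For reference, the idiomatic null-safe way to write this comparison is to lift both fields into Option and only subtract when both values are present. This is just a sketch against the joined Dataset j and the Cols case class defined above; why the original guard does not take effect here is a separate question:

// Sketch: subtract only when both BigDecimal fields are non-null. 
// Assumes the joined Dataset j: Dataset[(Cols, Cols)] from the question. 
val filtered = j.filter { x => 
  val (l, r) = x 
  val diffFlag = (Option(l.F2), Option(r.F2)) match { 
    case (Some(a), Some(b)) if a - b < 1 => 1 // both present and close enough 
    case _                               => 0 // at least one side is null 
  } 
  diffFlag > 0 // placeholder; the real filter sums more terms and compares with 100 
} 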

Update 2

After I wrapped the subtraction in math.abs((l.F2 - r.F2).toDouble), the exception went away. Why?

Answers

Try adding this to your if statement:

&& (x._1.F2 && x._2.F2) != null

I've had a similar problem in Java and this is what has worked for me.

It gets the compiler error 'value && is not a member of Option[BigDecimal]'. Can '&&' be applied to 'Option(...)'? – ca9163d9

Nope, it can't. – pedrofurla

I tried '(x._1.F2 == null || x._2.F2 == null)' and it still gets the exception. – ca9163d9
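
As the comments note, && is not defined on Option[BigDecimal]; if the goal is to act only when both values are present, the two Options have to be combined explicitly. A minimal sketch, independent of Spark (closeEnough is a hypothetical helper, not from the question):

// Sketch: combine two possibly-null BigDecimals through Option instead of &&. 
def closeEnough(a: BigDecimal, b: BigDecimal): Int = { 
  val flag = for { 
    x <- Option(a) // None when a is null 
    y <- Option(b) // None when b is null 
  } yield if (x - y < 1) 1 else 0 
  flag.getOrElse(0) // treat a missing value as 0 
} 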

Looking at the source code for BigDecimal, at line 563: https://github.com/scala/scala/blob/v2.11.8/src/library/scala/math/BigDecimal.scala#L563

It could be that x._1.F2.bigDecimal or x._2.F2.bigDecimal is null, though I don't really see how that could happen given the checks in the constructor. But maybe check for null there and see whether that fixes the problem?
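
If that were the case, a guard that also inspects the wrapped java.math.BigDecimal might look like the sketch below (bothDefined is a hypothetical helper; l and r stand for the two Cols values of the pair):

// Sketch: check the underlying java.math.BigDecimal as well as the Scala wrapper. 
def bothDefined(l: Cols, r: Cols): Boolean = 
  l.F2 != null && l.F2.bigDecimal != null && 
  r.F2 != null && r.F2.bigDecimal != null 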

By the way, you really should avoid all those ._1 and ._2 accessors... you should be able to do something like this:

val (l: Cols, r: Cols) = x 

to extract the tuple values.
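
For example, inside the filter lambda (again just a sketch against the joined Dataset j from the question):

j.filter { x => 
  val (l: Cols, r: Cols) = x // name the two sides instead of x._1 / x._2 
  Option(l.F2).isDefined && Option(r.F2).isDefined && l.F2 - r.F2 < 1 
} 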

The strange thing is that I did check whether they are null; if either were null, that line shouldn't be reached at all. – ca9163d9

Guess what: after I wrapped the subtraction in 'math.abs((l.F2 - r.F2).toDouble)', the exception went away. – ca9163d9