2017-02-20

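When I apply ParDo.of(new ParDoFn()) to the PCollection named textInput, the program throws this exception. When I remove .apply(ParDo.of(new ParDoFn())), the program runs fine. The error is: AssertionError: assertion failed: copyAndReset must return a zero value copy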

// SparkRunner

private static void testHadoop(Pipeline pipeline){ 
    @SuppressWarnings("unchecked") // the double cast below is unchecked
    Class<? extends FileInputFormat<LongWritable, Text>> inputFormatClass = 
      (Class<? extends FileInputFormat<LongWritable, Text>>) 
        (Class<?>) TextInputFormat.class; 
    // Read from HDFS at hdfs://localhost:9000
    HadoopIO.Read.Bound<LongWritable, Text> readPTransform = HadoopIO.Read.from("hdfs://localhost:9000/tmp/kinglear.txt", 
      inputFormatClass, 
      LongWritable.class, 
      Text.class); 
    PCollection<KV<LongWritable, Text>> textInput = pipeline.apply(readPTransform) 
      .setCoder(KvCoder.of(WritableCoder.of(LongWritable.class), WritableCoder.of(Text.class))); 

    //OutputFormat 
    @SuppressWarnings("unchecked") 
    Class<? extends FileOutputFormat<LongWritable, Text>> outputFormatClass = 
      (Class<? extends FileOutputFormat<LongWritable, Text>>) 
        (Class<?>) TemplatedTextOutputFormat.class; 

    @SuppressWarnings("unchecked") 
    HadoopIO.Write.Bound<LongWritable, Text> writePTransform = HadoopIO.Write.to("hdfs://localhost:9000/tmp/output", outputFormatClass, LongWritable.class, Text.class); 

    textInput.apply(ParDo.of(new ParDoFn())).apply(writePTransform.withoutSharding()); 

    pipeline.run().waitUntilFinish(); 

} 

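Could you include the full exception stack trace in your question? That would help narrow down the problem. Also, you may want to follow the style used in the Apache Beam examples: each transform you build is used only once, so inlining them would make your code more readable.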

Answers

Which version of Spark are you running on? In my experience, the error you are getting is raised by AccumulatorV2 in Spark 2.x, and the Spark runner currently supports only Spark 1.6.
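
One way to check which Spark generation is actually being loaded is to probe the runtime classpath for a class that only exists in Spark 2.x. The following is a minimal sketch, not part of Beam or Spark: the class name SparkClasspathProbe and the helper isClassPresent are hypothetical, and it assumes the probe runs with the same classpath as the pipeline (a cluster started through spark-submit may supply its own Spark jars regardless of the declared dependency).

```java
// Minimal sketch: detect Spark 2.x jars by looking for a class introduced in Spark 2.0.
// SparkClasspathProbe and isClassPresent are hypothetical names used for illustration.
final class SparkClasspathProbe {
    // AccumulatorV2 was added in Spark 2.0, so finding it means Spark 2.x (or later) is on the classpath.
    static final String SPARK2_MARKER = "org.apache.spark.util.AccumulatorV2";

    // Returns true if the named class can be located, without running its static initializers.
    static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, SparkClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isClassPresent(SPARK2_MARKER)) {
            System.out.println("Spark 2.x (or later) classes found on the classpath");
        } else {
            System.out.println("No Spark 2.x classes found on the classpath");
        }
    }
}
```

If the marker class is found even though the build declares Spark 1.6.x, a second Spark version is probably being pulled in from somewhere else, such as a transitive dependency or the cluster installation.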

You're right! – zifanpan

I solved the problem by switching to Spark 1.6. – zifanpan

@zifanpan Could you explain how you solved this? I am using the dependency version you suggested, 1.6.3, and I still cannot resolve the issue. Please advise. – Abhishek

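I ran into a similar problem when creating a custom accumulator that extends org.apache.spark.util.AccumulatorV2. The cause was incorrect logic in the override def isZero: Boolean method. When copyAndReset is called, its default implementation calls copy() and then reset() on the copy, after which isZero must return true. If you look at the AccumulatorV2 source, this is one of the checks: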

// Called by Java when serializing an object
final protected def writeReplace(): Any = {
  if (atDriverSide) {
    if (!isRegistered) {
      throw new UnsupportedOperationException(
        "Accumulator must be registered before send to executor")
    }
    val copyAcc = copyAndReset()
    assert(copyAcc.isZero, "copyAndReset must return a zero value copy")
    copyAcc.metadata = metadata
    copyAcc
  } else {
    this
  }
}

Specifically, this part:

val copyAcc = copyAndReset() 
assert(copyAcc.isZero, "copyAndReset must return a zero value copy") 
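
To illustrate how a wrong isZero trips that assertion, here is a minimal sketch in plain Java that does not depend on Spark. SketchAccumulator is a hypothetical stand-in that mirrors only the copy(), reset(), isZero and copyAndReset contract described above; CorrectCounter, BrokenCounter, checkedCopy and passesCheck are likewise invented for illustration and are not Spark APIs.

```java
// Minimal sketch of the zero-value contract, using hypothetical stand-in classes (not Spark's own).
final class ZeroValueContractSketch {
    // Hypothetical stand-in for the part of the AccumulatorV2 contract discussed above.
    abstract static class SketchAccumulator {
        abstract boolean isZero();
        abstract SketchAccumulator copy();
        abstract void reset();

        // Default behaviour as described above: copy first, then reset the copy.
        SketchAccumulator copyAndReset() {
            SketchAccumulator copyAcc = copy();
            copyAcc.reset();
            return copyAcc;
        }

        // Mirrors the check performed during serialization in writeReplace().
        SketchAccumulator checkedCopy() {
            SketchAccumulator copyAcc = copyAndReset();
            if (!copyAcc.isZero()) {
                throw new AssertionError("copyAndReset must return a zero value copy");
            }
            return copyAcc;
        }
    }

    // isZero agrees with reset(): after reset, count is 0 and isZero returns true.
    static class CorrectCounter extends SketchAccumulator {
        long count;
        void add(long n) { count += n; }
        @Override boolean isZero() { return count == 0; }
        @Override SketchAccumulator copy() {
            CorrectCounter c = new CorrectCounter();
            c.count = count;
            return c;
        }
        @Override void reset() { count = 0; }
    }

    // Deliberately broken: isZero is inverted, so a freshly reset copy reports "not zero".
    static class BrokenCounter extends CorrectCounter {
        @Override boolean isZero() { return count != 0; }
        @Override SketchAccumulator copy() {
            BrokenCounter c = new BrokenCounter();
            c.count = count;
            return c;
        }
    }

    // Returns true if the accumulator survives the serialization-time check.
    static boolean passesCheck(SketchAccumulator acc) {
        try {
            acc.checkedCopy();
            return true;
        } catch (AssertionError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        CorrectCounter ok = new CorrectCounter();
        ok.add(5);
        BrokenCounter bad = new BrokenCounter();
        bad.add(5);
        System.out.println("correct isZero passes: " + passesCheck(ok));  // prints true
        System.out.println("broken isZero passes: " + passesCheck(bad)); // prints false
    }
}
```

The point of the sketch is that isZero has to describe exactly the state that reset() produces; any mismatch between the two surfaces only when the accumulator is serialized and sent to an executor.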

Hope this helps. Happy Sparking!