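I have the program below, which counts the occurrences of "ERROR" in a log file and prints the count to the console at the end. When I run it in yarn-client mode, the console shows the correct accumulator value, 509. When I run it in yarn-cluster mode, the value is not displayed. How can I print the accumulator in yarn-cluster mode?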
import org.apache.spark.SparkContext

object ErrorLogsCount {
  def main(args: Array[String]): Unit = {
    // Master and app name are supplied by spark-submit
    val sc = new SparkContext()
    val logsRDD = sc.textFile(args(0), 4)
    val errorsAcc = sc.accumulator(0, "Errors Accumulator")
    val errorsLogRDD = logsRDD.filter(x => x.contains("ERROR"))
    errorsLogRDD.persist()
    // Action that increments the accumulator once per matching line
    errorsLogRDD.foreach(x => errorsAcc += 1)
    errorsLogRDD.collect()
    // Printing the accumulator (written to the driver's stdout)
    println(errorsAcc.name + " = " + errorsAcc)
    // Saving results in HDFS
    errorsLogRDD.coalesce(1).saveAsTextFile(args(1))
  }
}
I am running this on the HDP 2.4 Sandbox (Spark 1.6.0).
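For context, in yarn-cluster mode the driver runs inside the YARN ApplicationMaster container, so println writes to that container's stdout rather than to the submitting console. One possible workaround, shown here only as a minimal sketch against the Spark 1.6 API, is to also write the value to HDFS. The object name ErrorLogsCountToHdfs and the output path args(1) + "_count" are hypothetical names chosen for illustration and are not part of the original program.

```scala
import org.apache.spark.SparkContext

// Hypothetical variant of the program above; the object name is illustrative only.
object ErrorLogsCountToHdfs {
  def main(args: Array[String]): Unit = {
    // Master and app name are supplied by spark-submit
    val sc = new SparkContext()
    val errorsAcc = sc.accumulator(0, "Errors Accumulator")
    val errorsLogRDD = sc.textFile(args(0), 4).filter(_.contains("ERROR"))
    errorsLogRDD.persist()

    // The accumulator is only updated once an action runs
    errorsLogRDD.foreach(_ => errorsAcc += 1)

    // name is an Option[String] in Spark 1.6; value is read on the driver
    val summary = errorsAcc.name.getOrElse("accumulator") + " = " + errorsAcc.value

    // In yarn-cluster mode this goes to the driver container's stdout log
    println(summary)

    // Assumed output location: a sibling directory next to the results
    sc.parallelize(Seq(summary), 1).saveAsTextFile(args(1) + "_count")

    errorsLogRDD.coalesce(1).saveAsTextFile(args(1))
    sc.stop()
  }
}
```

The printed line itself should still be present in the driver container's stdout log, which is typically viewable through the ResourceManager UI or with yarn logs -applicationId followed by the application ID once the job has finished, provided log aggregation is enabled.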