Cannot print an RDD
I am trying to print the contents of my RDD, of type RDD[(String,List[(String,String)])]:
val sc = new SparkContext(conf)
val splitted = rdd.map(line => line.split(","))
val processed = splitted.map(x=>(x(1),List((x(0),x(2),x(3),x(4)))))
val grouped = processed.reduceByKey((x,y) => (x ++ y))
System.out.println(grouped)
However, instead of the contents, I see:
ShuffledRDD[4] at reduceByKey at Consumer.scala:88
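(Side note: that output is simply the RDD's toString, which describes the RDD object itself, not its data; an RDD is a lazy handle, and nothing is computed until an action such as collect() runs. The same JVM behavior can be sketched without Spark, using a hypothetical Wrapper class with no toString override:)

```scala
// A class with no data-bearing toString, standing in for an RDD handle.
class Wrapper(val data: List[Int])

object ToStringDemo {
  def main(args: Array[String]): Unit = {
    val w = new Wrapper(List(1, 2, 3))
    println(w)      // prints something like Wrapper@1b6d3586 -- not the contents
    println(w.data) // prints List(1, 2, 3) -- the actual data must be reached explicitly
  }
}
```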
UPDATE:
Contents of the TXT file:
100001082016,230,111,1,1
100001082016,121,111,1,1
100001082016,110,111,1,1
UPDATE 2 (full code):
class Consumer()
{
def run() = {
val conf = new SparkConf()
.setAppName("TEST")
.setMaster("local[*]")
val sc = new SparkContext(conf)
val rdd = sc.textFile("file:///usr/test/myfile.txt")
val splitted = rdd.map(line => line.split(","))
val processed = splitted.map(x=>(x(1),List((x(0),x(2),x(3),x(4)))))
val grouped = processed.reduceByKey((x,y) => (x ++ y))
System.out.println(grouped)
}
}
What does the rest of the stack trace mean? –
And in Scala you would do 'println(grouped.collect())'. No need for System.out –
@cricket_007: In that case I get '[Lscala.Tuple2;@5377414a'. The rest of the stack is standard Spark output, e.g. '16/08/19 13:49:39 INFO DAGScheduler: Job 0 finished: collect at Consumer.scala:89, took 0.519500 s' and so on. – HackerDuck
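(The '[Lscala.Tuple2;@5377414a' in the last comment is the default JVM toString of an Array: collect() returns an Array, and Scala arrays do not override toString to show their elements. A minimal plain-Scala sketch, using a stand-in array with the same element shape as the question's data, of two common ways to actually see the elements:)

```scala
object PrintCollectedDemo {
  def main(args: Array[String]): Unit = {
    // Stand-in for the result of grouped.collect(), same shape as the question's data.
    val collected: Array[(String, List[(String, String, String, String)])] =
      Array(("230", List(("100001082016", "111", "1", "1"))))

    println(collected)                 // default array toString: [Lscala.Tuple2;@...
    collected.foreach(println)         // prints one (key, list) pair per line
    println(collected.mkString("\n"))  // same elements rendered as a single string
  }
}
```

In Spark the same idea applies directly: 'grouped.collect().foreach(println)' prints the elements, while 'println(grouped.collect())' only prints the array reference.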