Here is one approach. First, set up the example:
val prefix = "/home/tmp/date="
val dates = Array("20140901", "20140902", "20140903", "20140904")
val datesRDD = sc.parallelize(dates, 2)
Adding the prefix is easy:
val datesWithPrefixRDD = datesRDD.map(s => prefix + s)
datesWithPrefixRDD.foreach(println)
This produces (the line order can vary, because foreach runs on the partitions in parallel):
/home/tmp/date=20140901
/home/tmp/date=20140903
/home/tmp/date=20140902
/home/tmp/date=20140904
But you asked for a single string. The most obvious first attempt has a problem with extra commas:
val bad = datesWithPrefixRDD.fold("")((s1, s2) => s1 + ", " + s2)
println(bad)
This produces:
, , /home/tmp/date=20140901, /home/tmp/date=20140902, , /home/tmp/date=20140903, /home/tmp/date=20140904
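The problem is that Spark's RDD fold() method starts the concatenation with the empty string I supplied, once for the whole RDD and once for each partition. But we can handle the empty string: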
val good = datesWithPrefixRDD.fold("")((s1, s2) =>
  s1 match {
    case "" => s2
    case s  => s + ", " + s2
  })
println(good)
Then we get:
/home/tmp/date=20140901, /home/tmp/date=20140902, /home/tmp/date=20140903, /home/tmp/date=20140904
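To see where the extra commas come from, here is a minimal sketch in plain Scala (no Spark needed) that imitates what fold() does: the zero value is used once inside each partition and once more when the partition results are merged. The names FoldSimulation, simulatedFold, naive and guarded are made up for this illustration, and the sketch assumes partition results are merged in order, which a real cluster does not guarantee.

```scala
object FoldSimulation {
  // Imitates RDD.fold: the zero value seeds each partition's fold,
  // and seeds the final merge of the per-partition results as well.
  // (Hypothetical helper for illustration, not a Spark API.)
  def simulatedFold(partitions: Seq[Seq[String]], zero: String)(
      op: (String, String) => String): String = {
    val perPartition = partitions.map(_.foldLeft(zero)(op))
    perPartition.foldLeft(zero)(op)
  }

  // The first attempt: always inserts a separator.
  val naive: (String, String) => String =
    (s1, s2) => s1 + ", " + s2

  // The fix: skip the separator when the left side is still the empty zero value.
  val guarded: (String, String) => String =
    (s1, s2) => if (s1.isEmpty) s2 else s1 + ", " + s2

  def main(args: Array[String]): Unit = {
    // Two "partitions", mirroring sc.parallelize(dates, 2).
    val partitions = Seq(Seq("a", "b"), Seq("c", "d"))
    println(simulatedFold(partitions, "")(naive))   // prints ", , a, b, , c, d"
    println(simulatedFold(partitions, "")(guarded)) // prints "a, b, c, d"
  }
}
```

The naive result has the same shape as the bad output above: one stray separator per partition plus one for the final merge.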
Edit: Actually, reduce() gives a tidier answer, because it avoids the "extra comma" problem altogether:
val alternative = datesWithPrefixRDD.reduce((s1, s2) => s1 + ", " + s2)
println(alternative)
Again we get:
/home/tmp/date=20140901, /home/tmp/date=20140902, /home/tmp/date=20140903, /home/tmp/date=20140904
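If the list is small enough to fit on the driver, another option (not what the answer above uses) is to join the strings locally with mkString. The sketch below works on the plain array from the setup so it runs without Spark; with the RDD, the equivalent would presumably be datesWithPrefixRDD.collect().mkString(", "). The object name JoinOnDriver is just for illustration.

```scala
object JoinOnDriver {
  val prefix = "/home/tmp/date="
  val dates = Array("20140901", "20140902", "20140903", "20140904")

  // Prefix every date, then join with ", ".
  // mkString only puts the separator between elements, so there is
  // no leading or trailing comma to clean up.
  def joined: String = dates.map(d => prefix + d).mkString(", ")

  def main(args: Array[String]): Unit = {
    println(joined)
  }
}
```

Since collect() returns the elements in partition order, this also keeps the dates in their original order.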
It works, thanks a lot! – 2014-09-27 21:11:32