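Spark/Scala: Creating a nested structure with reduceByKey using only RDDs

I want to create a nested structure using only RDDs. I can do this with the groupBy function, but it does not perform well on large data, so I want to do it with reduceByKey instead. However, I can't get the result I want. Any help would be appreciated.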
Input data:
val sales=sc.parallelize(List(
("West", "Apple", 2.0, 10),
("West", "Apple", 3.0, 15),
("West", "Orange", 5.0, 15),
("South", "Orange", 3.0, 9),
("South", "Orange", 6.0, 18),
("East", "Milk", 5.0, 5)))
The desired output is a list of structs per key. I was able to do this with groupBy, as shown below:
sales.map(value => (value._1, (value._2, value._3, value._4)))
.groupBy(_._1)
.map { case(k,v) => (k, v.map(_._2)) }
.collect()
.foreach(println)
// (South,List((Orange,3.0,9), (Orange,6.0,18)))
// (East,List((Milk,5.0,5)))
// (West,List((Apple,2.0,10), (Apple,3.0,15), (Orange,5.0,15)))
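For comparison, the same grouping can be written on a pair RDD with groupByKey, which avoids carrying the key inside every grouped value. This is a minimal sketch, assuming Spark's Scala RDD API and a local master for illustration; the object name GroupByKeyNested and the helper nestByRegion are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object GroupByKeyNested {
  type Item = (String, Double, Int) // (product, price, quantity)

  // Hypothetical helper: key by region, then collect each region's tuples.
  def nestByRegion(sales: RDD[(String, String, Double, Int)]): RDD[(String, List[Item])] =
    sales
      .map { case (region, product, price, qty) => (region, (product, price, qty)) }
      .groupByKey()        // RDD[(String, Iterable[Item])]
      .mapValues(_.toList) // turn the Iterable into a List

  def main(args: Array[String]): Unit = {
    // Local master for illustration only; on a cluster, reuse the existing `sc`.
    val sc = new SparkContext(new SparkConf().setAppName("groupByKey-nested").setMaster("local[*]"))
    val sales = sc.parallelize(List(
      ("West", "Apple", 2.0, 10),
      ("West", "Apple", 3.0, 15),
      ("West", "Orange", 5.0, 15),
      ("South", "Orange", 3.0, 9),
      ("South", "Orange", 6.0, 18),
      ("East", "Milk", 5.0, 5)))
    nestByRegion(sales).collect().foreach(println)
    sc.stop()
  }
}
```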
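But I want to achieve the same thing using reduceByKey. I can't get a List[Struct]; instead, I end up with either a flat List or nested Lists (List[List]). Is there any way to get a List[Struct]?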
// Attempt 1: the value is a List of the three fields
sales.map(value => (value._1, List(value._2, value._3, value._4)))
.reduceByKey((a,b) => (a ++ b))
.collect()
.foreach(println)
// (South,List(Orange, 3.0, 9, Orange, 6.0, 18))
// (East,List(Milk, 5.0, 5))
// (West,List(Apple, 2.0, 10, Apple, 3.0, 15, Orange, 5.0, 15))
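The flat result comes from the value type: each record becomes a three-element List[Any], so ++ concatenates the individual fields. Wrapping the whole tuple in a one-element List makes ++ concatenate lists of tuples instead. A minimal sketch, assuming Spark's Scala RDD API and a local master; ReduceByKeyNested and nestByRegion are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ReduceByKeyNested {
  type Item = (String, Double, Int) // (product, price, quantity)

  // Each value is List(tuple), so `_ ++ _` merges List[Item] with List[Item]
  // and the fields are never flattened.
  def nestByRegion(sales: RDD[(String, String, Double, Int)]): RDD[(String, List[Item])] =
    sales
      .map { case (region, product, price, qty) => (region, List((product, price, qty))) }
      .reduceByKey(_ ++ _)

  def main(args: Array[String]): Unit = {
    // Local master for illustration only; on a cluster, reuse the existing `sc`.
    val sc = new SparkContext(new SparkConf().setAppName("reduceByKey-nested").setMaster("local[*]"))
    val sales = sc.parallelize(List(
      ("West", "Apple", 2.0, 10),
      ("West", "Apple", 3.0, 15),
      ("West", "Orange", 5.0, 15),
      ("South", "Orange", 3.0, 9),
      ("South", "Orange", 6.0, 18),
      ("East", "Milk", 5.0, 5)))
    nestByRegion(sales).collect().foreach(println)
    sc.stop()
  }
}
```

Each region then maps to a List[(String, Double, Int)]. The order of tuples within a region depends on how the data is partitioned and should not be relied on.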
// Attempt 2: wrap both sides of each merge in a new List
sales.map(value => (value._1, List(value._2, value._3, value._4)))
.reduceByKey((a, b) => List(a) ++ List(b))
.collect()
.foreach(println)
// (South,List(List(Orange, 3.0, 9), List(Orange, 6.0, 18)))
// (East,List(Milk, 5.0, 5))
// (West,List(List(List(Apple, 2.0, 10), List(Apple, 3.0, 15)), List(Orange, 5.0, 15)))
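In the second attempt, every merge wraps its inputs in a fresh List, so an already-merged result gets nested one level deeper each time it is merged again; East has a single record, is never merged, and stays flat. One alternative is aggregateByKey, which lets the accumulator type (a list of tuples) differ from the value type (a single tuple). A minimal sketch, assuming Spark's Scala RDD API and a local master; AggregateByKeyNested and nestByRegion are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object AggregateByKeyNested {
  type Item = (String, Double, Int) // (product, price, quantity)

  def nestByRegion(sales: RDD[(String, String, Double, Int)]): RDD[(String, List[Item])] =
    sales
      .map { case (region, product, price, qty) => (region, (product, price, qty)) }
      .aggregateByKey(List.empty[Item])(
        (acc, item) => item :: acc,     // within a partition: prepend one tuple
        (left, right) => left ::: right // across partitions: concatenate lists
      )

  def main(args: Array[String]): Unit = {
    // Local master for illustration only; on a cluster, reuse the existing `sc`.
    val sc = new SparkContext(new SparkConf().setAppName("aggregateByKey-nested").setMaster("local[*]"))
    val sales = sc.parallelize(List(
      ("West", "Apple", 2.0, 10),
      ("West", "Apple", 3.0, 15),
      ("West", "Orange", 5.0, 15),
      ("South", "Orange", 3.0, 9),
      ("South", "Orange", 6.0, 18),
      ("East", "Milk", 5.0, 5)))
    nestByRegion(sales).collect().foreach(println)
    sc.stop()
  }
}
```

A caveat on the performance motivation: because the result keeps every record, this (like the reduceByKey version) likely shuffles about as much data as groupByKey. Map-side combining mainly pays off when the merge step shrinks the data, for example sums or counts per key.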