2017-08-29

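Spark/Scala: creating a nested structure with reduceByKey, using RDDs only

I want to create a nested structure using only RDDs. I can do this with the groupBy function, but it does not perform well on large data. So I would like to do it with reduceByKey instead, but I cannot get the result I want. Any help would be appreciated.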

Input data:

val sales=sc.parallelize(List(
    ("West", "Apple", 2.0, 10), 
    ("West", "Apple", 3.0, 15), 
    ("West", "Orange", 5.0, 15), 
    ("South", "Orange", 3.0, 9), 
    ("South", "Orange", 6.0, 18), 
    ("East", "Milk", 5.0, 5))) 

The desired output is a list of structs (tuples) per key. I am able to get this with groupBy, as shown below:

sales.map(value => (value._1 ,(value._2,value._3,value._4 ))) 
    .groupBy(_._1) 
    .map { case(k,v) => (k, v.map(_._2)) } 
    .collect() 
    .foreach(println) 

// (South,List((Orange,3.0,9), (Orange,6.0,18))) 
// (East,List((Milk,5.0,5))) 
// (West,List((Apple,2.0,10), (Apple,3.0,15), (Orange,5.0,15))) 

But I want to achieve the same thing with reduceByKey. I cannot get a List[Struct]; instead, I get either a flattened list or a List[List]. Is there any way to get a List[Struct]?

sales.map(value => (value._1 ,List(value._2,value._3,value._4))) 
    .reduceByKey((a,b) => (a ++ b)) 
    .collect() 
    .foreach(println) 

// (South,List(Orange, 3.0, 9, Orange, 6.0, 18)) 
// (East,List(Milk, 5.0, 5)) 
// (West,List(Apple, 2.0, 10, Apple, 3.0, 15, Orange, 5.0, 15)) 

sales.map(value => (value._1 ,List(value._2,value._3,value._4))) 
    .reduceByKey((a,b) =>(List(a) ++ List(b))) 
    .collect() 
    .foreach(println) 

// (South,List(List(Orange, 3.0, 9), List(Orange, 6.0, 18))) 
// (East,List(Milk, 5.0, 5)) 
// (West,List(List(List(Apple, 2.0, 10), List(Apple, 3.0, 15)), List(Orange, 5.0, 15))) 

Answers
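One possible approach, sketched here under stated assumptions: the two attempts in the question fail because the value passed to reduceByKey is a List of fields rather than a List of records. If each record is wrapped as a single-element List of one tuple before reducing, both arguments of the reduce function have the same type, List[(String, Double, Int)], and plain `++` concatenation keeps the tuples intact. The names `NestedSales`, `Sale`, and `nestByRegion` are hypothetical, chosen only for illustration, and the `local[*]` master is an assumption for running outside a cluster.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical wrapper object; not part of the original question.
object NestedSales {
  // One "struct": (item, price, quantity).
  type Sale = (String, Double, Int)

  // Wrap every record in a single-element List so that the reduce
  // function always sees List[Sale] on both sides, then concatenate.
  def nestByRegion(sales: RDD[(String, String, Double, Int)]): RDD[(String, List[Sale])] =
    sales
      .map { case (region, item, price, qty) => (region, List((item, price, qty))) }
      .reduceByKey(_ ++ _)

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, only for trying the sketch out.
    val conf = new SparkConf().setAppName("nested-sales").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val sales = sc.parallelize(List(
        ("West", "Apple", 2.0, 10),
        ("West", "Apple", 3.0, 15),
        ("West", "Orange", 5.0, 15),
        ("South", "Orange", 3.0, 9),
        ("South", "Orange", 6.0, 18),
        ("East", "Milk", 5.0, 5)))

      // Each key maps to a List of tuples; the order of keys and of
      // elements within a list is not guaranteed.
      nestByRegion(sales).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Note that concatenating lists does not shrink the data per key, so this is unlikely to be faster than groupByKey: every record still has to be shuffled. reduceByKey tends to pay off when the reduce step actually combines values (sums, counts, maxima) rather than collecting them.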