2015-09-13 107 views

How to sort tuples based on another ordering

The following Iterable can be of size one, two, or (at most) three:

org.apache.spark.rdd.RDD[Iterable[(String, String, String, String, Long)]] = MappedRDD[17] at map at <console>:75 

The second element of each tuple can have any of the following values: A, B, C. Each value can appear (at most) once.

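What I want to do is sort the tuples in the order (B, A, C) and then build a string by joining their third elements with commas. If the corresponding tag is missing, an empty string is joined in its place: ``. For example: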

This:

CompactBuffer((blah,A,val1,blah,blah), (blah,B,val2,blah,blah), (blah,C,val3,blah,blah)) 

should result in:

val2,val1,val3 

This:

CompactBuffer((blah,A,val1,blah,blah), (blah,C,val3,blah,blah)) 

should result in:

,val1,val3 

This:

CompactBuffer((blah,A,val1,blah,blah), (blah,B,val2,blah,blah)) 

should result in:

val2,val1, 

This:

CompactBuffer((blah,B,val2,blah,blah)) 

should result in:

val2,, 

And so on.

Answers

3

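In your case, where A, B, and C can each appear at most once, you can add the corresponding values to a temporary map and then retrieve the values from that map in the correct order.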

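If we use getOrElse to get the values from the map, we can specify an empty string as the default value. That way, we still get the correct result when our Iterable does not contain tuples for all of A, B, and C.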

type YourTuple = (String, String, String, String, Long) 
def orderTuples(order: List[String])(iter: Iterable[YourTuple]) = { 
    val orderMap = iter.map { case (_, key, value, _, _) => key -> value }.toMap 
    order.map(s => orderMap.getOrElse(s, "")).mkString(",") 
} 

We can use this function as follows:

val a = ("blah","A","val1","blah",1L) 
val b = ("blah","B","val2","blah",2L) 
val c = ("blah","C","val3","blah",3L) 

val order = List("B", "A", "C") 
val bacOrder = orderTuples(order) _ 

bacOrder(Iterable(a, b, c)) // String = val2,val1,val3 
bacOrder(Iterable(a, c))  // String = ,val1,val3 
bacOrder(Iterable(a, b))  // String = val2,val1, 
bacOrder(Iterable(b))  // String = val2,, 
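
Since the question's data is an RDD of such Iterables, the same function can be mapped over every group. The following is only a minimal sketch: `groups` is a hypothetical local Seq standing in for the RDD so that the snippet runs without Spark, and the definitions from above are repeated to keep it self-contained.

```scala
// Repeated from the answer above so this snippet runs on its own.
type YourTuple = (String, String, String, String, Long)
def orderTuples(order: List[String])(iter: Iterable[YourTuple]): String = {
  val orderMap = iter.map { case (_, key, value, _, _) => key -> value }.toMap
  order.map(s => orderMap.getOrElse(s, "")).mkString(",")
}

// Hypothetical stand-in for the RDD's contents: one Iterable per group.
val groups: Seq[Iterable[YourTuple]] = Seq(
  List(("blah", "A", "val1", "blah", 1L), ("blah", "B", "val2", "blah", 2L), ("blah", "C", "val3", "blah", 3L)),
  List(("blah", "A", "val1", "blah", 1L), ("blah", "C", "val3", "blah", 3L)),
  List(("blah", "B", "val2", "blah", 2L))
)

// Fix the order once and reuse the resulting function for every group.
val bacOrder: Iterable[YourTuple] => String = orderTuples(List("B", "A", "C"))

// Here map runs on a local Seq; an RDD exposes a map method that accepts the same function.
val joined: Seq[String] = groups.map(bacOrder)
joined.foreach(println)
// val2,val1,val3
// ,val1,val3
// val2,,
```

With the actual RDD, the equivalent call would presumably be `rdd.map(bacOrder)`, which yields one joined string per group.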
0
def orderTuples(xs: Iterable[(String, String, String, String, Long)], 
                order: (String, String, String) = ("B", "A", "C")): String = { 

    type T = List[(String, String, String, String, Long)] 
    type KV = List[(String, String)] 

    val ord = List(order._1, order._2, order._3) 

    // Collect (tag, value) pairs while tracking which tags have not been seen yet. 
    @annotation.tailrec 
    def loop(rest: T, acc: KV, missing: List[String]): KV = rest match { 
        // Pad every tag that never appeared with an empty value. 
        case Nil => missing.map((_, "")) ++ acc 
        case x :: tail => loop(tail, (x._2, x._3) :: acc, missing.filterNot(_ == x._2)) 
    } 

    // Position of each tag in the requested order. 
    val rank = ord.zipWithIndex.toMap 

    // Convert to a List first so the pattern match also works for non-List Iterables. 
    loop(xs.toList, Nil, ord).sortBy(x => rank(x._1)).map(_._2).mkString(",") 
}