2015-08-08 198 views

I have a Spark (Scala) RDD, RDD1, with the following schema:

RDD[(String, Array[String])]

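(call it RDD1). From it, I want to create a new RDD, RDD2, of type RDD[(String, String)], with one row for each key and each value belonging to that key in RDD1.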

For example:

RDD1 = Array(("Fruit",Array("Orange","Apple","Peach")),("Shape",Array("Square","Rectangle")),("Mathematician",Array("Aryabhatt")))

The output I want is:

RDD2 = Array(("Fruit","Orange"),("Fruit","Apple"),("Fruit","Peach"),("Shape","Square"),("Shape","Rectangle"),("Mathematician","Aryabhatt")) 

Can someone help me with this code?

I tried:

val R1 = RDD1.map(line => (line._1,line._2.split((",")))) 
val R2 = R1.map(line => line._2.foreach(ph => ph.map(line._1))) 

This gives me an error:

error: value map is not a member of Char

As I understand it, this is because map only works on RDDs, not on each String/Char. Please help me use nested functions in Spark.

Answer


Break the problem down.

  1. ("Fruit",Array("Orange","Apple","Peach")) -> Array(("Fruit", "Orange"), ("Fruit", "Apple"), ("Fruit", "Peach"))

def flattenLine(line: (String, Array[String])) = line._2.map(x => (line._1, x))

  • Apply that function to your RDD:
  • rdd1.flatMap(flattenLine)
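
The two steps can be tried without a cluster. The following is a minimal sketch, assuming a local Seq stands in for the RDD (Seq.flatMap behaves the same way here as RDD.flatMap); the names FlattenDemo, data and result are made up for illustration and are not from the question.

```scala
// Local-collection sketch: a Seq stands in for the RDD (assumption, no Spark needed).
object FlattenDemo {
  // Step 1: pair the key with every value in its array.
  def flattenLine(line: (String, Array[String])): Array[(String, String)] =
    line._2.map(x => (line._1, x))

  // Hypothetical sample data, copied from the example in the question.
  val data: Seq[(String, Array[String])] = Seq(
    ("Fruit", Array("Orange", "Apple", "Peach")),
    ("Shape", Array("Square", "Rectangle")),
    ("Mathematician", Array("Aryabhatt"))
  )

  // Step 2: flatMap applies flattenLine to each line and concatenates the results.
  val result: Seq[(String, String)] = data.flatMap(flattenLine)

  def main(args: Array[String]): Unit = result.foreach(println)
}
```

Using map instead of flatMap would give one Array of pairs per input line; flatMap is what merges them into a single flat collection of (key, value) pairs, which matches the RDD2 layout asked for in the question.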


    Thank you very much :) That did the job :)