Once you have read the file from HDFS and have the data in an RDD, you can try something like the following:
import org.apache.spark.rdd.RDD
import org.apache.spark.graphx.Edge
// Sample data
val rdd = sc.parallelize(Seq("1: 1, 2, 3", "2: 2, 3"))
val edges: RDD[Edge[Int]] = rdd.flatMap { row =>
  // split around ":"
  val parts = row.split(":").map(_.trim)
  // the value to the left of ":" is the source vertex
  val srcVertex = parts(0).toLong
  // the values to the right of ":" are the destination vertices, separated by ","
  val otherVertices = parts(1).split(",").map(_.trim)
  // create an Edge connecting the source vertex to each destination vertex
  otherVertices.map(v => Edge(srcVertex, v.toLong, 1))
}
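The parsing step itself is plain Scala, so you can sanity-check it on a single row without a Spark cluster. Here is a minimal sketch: parseRow is a hypothetical helper that mirrors the flatMap body above, using plain (src, dst) tuples instead of GraphX Edge objects so it runs standalone:

```scala
// Hypothetical helper mirroring the flatMap body above,
// returning plain (src, dst) tuples instead of GraphX Edge objects.
def parseRow(row: String): Array[(Long, Long)] = {
  // split around ":" to separate the source vertex from its neighbours
  val parts = row.split(":").map(_.trim)
  val srcVertex = parts(0).toLong
  // split the right-hand side around "," to get the destination vertices
  parts(1).split(",").map(_.trim).map(v => (srcVertex, v.toLong))
}

println(parseRow("1: 1, 2, 3").mkString(", "))  // (1,1), (1,2), (1,3)
```

Each input line expands to one tuple per neighbour, which is exactly what flatMap then flattens into a single RDD of edges.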
Edit:
Also, if your vertices all share the same constant default attribute, you can create the graph directly from the edges, so there is no need to build a vertices RDD:
import org.apache.spark.graphx.Graph
val g = Graph.fromEdges(edges, defaultValue = 1)
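To check whether the resulting graph worked, one quick sketch (assuming a running spark-shell session, where sc and the g above are already defined) is to look at the vertex/edge counts and print a few triplets:

```scala
// Quick sanity checks on the resulting graph (run in spark-shell,
// where a SparkContext `sc` and the graph `g` already exist).
println(g.numVertices)  // number of distinct vertices
println(g.numEdges)     // number of edges
// print a few edges as "src -> dst" pairs
g.triplets.take(5).foreach(t => println(s"${t.srcId} -> ${t.dstId}"))
```

numVertices, numEdges, and triplets are actions, so they force the lazy RDD pipeline to run and will surface any parsing errors immediately.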
Thanks for all your help! I followed what you said and was able to create a Graph val; now I'm just trying to find a way to see whether it worked! –
I tried doing it the way you said, but the 'RDD[Edge[Int]]' part didn't work, so I just used 'RDD'. But I keep getting the following errors: ':43: error: not found: value Edge' at 'otherVertices.map(v => Edge(srcVertex, v.toLong, 1))', and ':43: error: type mismatch; found: Array[Nothing] required: TraversableOnce[?]' on the same line. –
Did you import the Edge class? 'import org.apache.spark.graphx.Edge'. That is probably the problem, and also why 'RDD[Edge[Int]]' didn't work. –