2015-12-31

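I have an RDF graph (link) made of (s, p, o) triples, from which I built a property graph. Using Spark GraphX, I need to do a join / joinVertices, or otherwise add a field to the graph.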

val propGraph = Graph(vertexArray,edgeArray).cache() 
propGraph.triplets.foreach(println(_)) 

This prints the following triplets for the RDF data:

((0,<http://umkc.edu/xPropGraph#franklin>),(1,<http://umkc.edu/xPropGraph#rxin>),<http://umkc.edu/xPropGraph#advisor>) 
((1,<http://umkc.edu/xPropGraph#rxin>),(2,<http://umkc.edu/xPropGraph#jgonzal>),<http://umkc.edu/xPropGraph#collab>) 
((2147483648,<http://umkc.edu/xPropGraph#peter>),(4294967295,<http://umkc.edu/xPropGraph#John),<http://umkc.edu/xPropGraph#student>) 
((6442450942,<http://umkc.edu/xPropGraph#istoica>),(0,<http://umkc.edu/xPropGraph#franklin>),<http://umkc.edu/xPropGraph#colleague>) 
((0,<http://umkc.edu/xPropGraph#franklin>),(2,<http://umkc.edu/xPropGraph#jgonzal>),<http://umkc.edu/xPropGraph#pi>) 
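The question does not show how `vertexArray` and `edgeArray` were built. As a sketch under assumptions, one way to turn an RDD of (subject, predicate, object) strings into a `Graph[String, String]` (IRI on each vertex, predicate on each edge) follows. The object and method names are hypothetical, and assigning IDs with `zipWithUniqueId()` is an assumption, so the IDs it produces will not necessarily match the ones printed above.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

object RdfGraphBuilder {
  // Hypothetical helper: (subject, predicate, object) triples -> property graph.
  // Vertex attribute = IRI, edge attribute = predicate.
  def rdfToPropertyGraph(triples: RDD[(String, String, String)]): Graph[String, String] = {
    // Every distinct subject or object IRI becomes one vertex.
    // zipWithUniqueId is an assumption; any unique Long per IRI would do.
    val vertexArray: RDD[(VertexId, String)] = triples
      .flatMap { case (s, _, o) => Seq(s, o) }
      .distinct()
      .zipWithUniqueId()
      .map { case (iri, id) => (id, iri) }

    // Reverse lookup table: IRI -> VertexId
    val idOf: RDD[(String, VertexId)] = vertexArray.map(_.swap)

    // Resolve subject and object IRIs to vertex IDs with two key-based joins
    val edgeArray: RDD[Edge[String]] = triples
      .map { case (s, p, o) => (s, (p, o)) }
      .join(idOf)                                      // (s, ((p, o), srcId))
      .map { case (_, ((p, o), srcId)) => (o, (srcId, p)) }
      .join(idOf)                                      // (o, ((srcId, p), dstId))
      .map { case (_, ((srcId, p), dstId)) => Edge(srcId, dstId, p) }

    Graph(vertexArray, edgeArray)
  }
}
```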

When I apply connectedComponents() to my RDF property graph with the code below (Complete code), I get the ccID as follows:

val cc = propGraph.connectedComponents().cache() 
cc.triplets.foreach(println(_)) 

Output:

((0,0),(2,0),<http://umkc.edu/xPropGraph#pi>) 
((0,0),(1,0),<http://umkc.edu/xPropGraph#advisor>) 
((1,0),(2,0),<http://umkc.edu/xPropGraph#collab>) 
((2147483648,2147483648),(4294967295,2147483648),<http://umkc.edu/xPropGraph#student>) 
((6442450942,0),(0,0),<http://umkc.edu/xPropGraph#colleague>) 
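The ccID values follow from how `connectedComponents()` labels vertices: each vertex ends up with the lowest vertex ID in its connected component, with edge direction ignored. That is why `6442450942` gets `0` (through its `colleague` edge to `0`) and both `2147483648` and `4294967295` get `2147483648`. A plain-Scala model of this (not GraphX code; `MinLabelPropagation` is a hypothetical name) propagates the minimum label across the question's five edges until nothing changes:

```scala
object MinLabelPropagation {
  // Plain-Scala model of connectedComponents(): every vertex is labelled with
  // the smallest vertex ID reachable from it, ignoring edge direction.
  def components(edges: Seq[(Long, Long)]): Map[Long, Long] = {
    val vertices = edges.flatMap { case (s, d) => Seq(s, d) }.distinct
    var label: Map[Long, Long] = vertices.map(v => v -> v).toMap
    var changed = true
    // Repeat until no label changes (a fixpoint)
    while (changed) {
      changed = false
      for ((s, d) <- edges) {
        val m = math.min(label(s), label(d))
        if (label(s) != m) { label += (s -> m); changed = true }
        if (label(d) != m) { label += (d -> m); changed = true }
      }
    }
    label
  }

  def main(args: Array[String]): Unit = {
    // Vertex IDs of the five edges printed in the question
    val edges = Seq((0L, 1L), (1L, 2L), (2147483648L, 4294967295L), (6442450942L, 0L), (0L, 2L))
    components(edges).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Running `main` prints `(0,0)`, `(1,0)`, `(2,0)`, `(2147483648,2147483648)`, `(4294967295,2147483648)` and `(6442450942,0)`, one per line, which matches the ccIDs in the GraphX output.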

I need to get something like:

((vId_src,src_att),(vId_dst,dst_att),property, ccID) 

That is, I need the resulting triplets/graph in this format:

((0,<http://umkc.edu/xPropGraph#franklin>),(2,<http://umkc.edu/xPropGraph#jgonzal>),<http://umkc.edu/xPropGraph#pi>,0) 
((6442450942,<http://umkc.edu/xPropGraph#istoica>),(0,<http://umkc.edu/xPropGraph#franklin>),<http://umkc.edu/xPropGraph#colleague>,0) 
((0,<http://umkc.edu/xPropGraph#franklin>),(1,<http://umkc.edu/xPropGraph#rxin>),<http://umkc.edu/xPropGraph#advisor>,0) 
((1,<http://umkc.edu/xPropGraph#rxin>),(2,<http://umkc.edu/xPropGraph#jgonzal>),<http://umkc.edu/xPropGraph#collab>,0) 
((2147483648,<http://umkc.edu/xPropGraph#peter>),(4294967295,<http://umkc.edu/xPropGraph#John),<http://umkc.edu/xPropGraph#student>,2147483648) 

So my option is probably a join. I tried something like val triplets = propGraph.joinVertices(cc.vertices), but could not get it to work. Is there any way to get this?

Any help is appreciated! I'm new to GraphX. :)
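A likely reason the `joinVertices` attempt failed: in GraphX it takes a second argument list, `joinVertices(table)(mapFunc)`, and `mapFunc` must return the graph's existing vertex type, so it cannot turn a `String` attribute into a `(String, VertexId)` pair. `outerJoinVertices` is allowed to change the vertex type. A minimal sketch, assuming `propGraph` is a `Graph[String, String]` (IRI on vertices, predicate on edges); `ComponentJoin` and `triplesWithComponent` are hypothetical names:

```scala
import org.apache.spark.graphx.{Graph, VertexId}
import org.apache.spark.rdd.RDD

object ComponentJoin {
  // Hypothetical helper producing ((srcId, srcAttr), (dstId, dstAttr), predicate, ccId)
  def triplesWithComponent(propGraph: Graph[String, String])
      : RDD[((VertexId, String), (VertexId, String), String, VertexId)] = {
    // Vertex attribute here is the lowest vertex ID in each component
    val ccVertices = propGraph.connectedComponents().vertices

    // outerJoinVertices may change the vertex type: String -> (String, VertexId).
    // Every vertex of propGraph appears in ccVertices, so getOrElse(vid) is only a fallback.
    val joined: Graph[(String, VertexId), String] =
      propGraph.outerJoinVertices(ccVertices) { (vid, iri, ccOpt) =>
        (iri, ccOpt.getOrElse(vid))
      }

    // Both endpoints of an edge share a component, so the source's label is enough
    joined.triplets.map { t =>
      ((t.srcId, t.srcAttr._1), (t.dstId, t.dstAttr._1), t.attr, t.srcAttr._2)
    }
  }
}
```

Because the component ID is attached by vertex ID inside one graph, the result does not depend on two separate RDDs happening to list their triplets in the same order.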


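It would be helpful if you provided an example graph (see for example http://stackoverflow.com/q/34528963/1560062). It is not clear what the types are here, and Scala printouts are not very useful. – zero323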


@zero323 Thanks for the suggestion. I have added two links. Any help is appreciated! – ChikuMiku

Answer

I was looking for ((vId_src,src_att),(vId_dst,dst_att),property, ccID), so I used zip() on the two RDDs.

import org.apache.spark.graphx.{Graph, VertexId}
import org.apache.spark.rdd.RDD

// connectedComponents() labels every vertex with the lowest vertex ID in its component
val cc: Graph[VertexId, String] = propGraph.connectedComponents().cache()
println("###GRAPH WITH CONNECTED COMPONENTS ###")
cc.triplets.foreach(println(_))
println("###VERTICES OF CONNECTED COMPONENTS GRAPH ###")
cc.vertices.foreach(println(_))
println("###EDGES OF CONNECTED COMPONENTS GRAPH ###")
cc.edges.foreach(println(_))


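// Alternative to a join: pair the two triplet RDDs element by element.
// RDD.zip pairs purely by position and assumes both RDDs have the same number
// of partitions and the same number of elements in each partition.
println("###STEP-2 GETTING ONE MERGED RDD OF NEW GRAPH###")
// Builds "srcId,srcAttr),(dstId,dstAttr),predicate"; Tuple2.toString adds a leading "(" and trailing ")" when printed
val newGraph: RDD[String] = propGraph.triplets.map(t => t.srcId + "," + t.srcAttr + "),(" + t.dstId + "," + t.dstAttr + ")," + t.attr)
// Component ID of each triplet, taken from its source vertex
val ccID: RDD[String] = cc.triplets.map(t => t.srcAttr.toString)
val newPropGraph: RDD[(String, String)] = newGraph.zip(ccID)
newPropGraph.collect.foreach(println(_))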

After doing this, I got the following output:

(4294967296,<http://umkc.edu/xPropGraph#node1>),(2147483649,<http://umkc.edu/xPropGraph#node2>),<http://umkc.edu/xPropGraph#prop1>,0) 
(2147483649,<http://umkc.edu/xPropGraph#node2>),(6442450942,<http://umkc.edu/xPropGraph#node4>),<http://umkc.edu/xPropGraph#prop5>,0) 
(4294967295,<http://umkc.edu/xPropGraph#node5>),(2147483648,<http://umkc.edu/xPropGraph#node6>),<http://umkc.edu/xPropGraph#prop3>,2147483648) 
(0,<http://umkc.edu/xPropGraph#node3>),(6442450942,<http://umkc.edu/xPropGraph#node4>),<http://umkc.edu/xPropGraph#prop2>,0) 
(2147483649,<http://umkc.edu/xPropGraph#node2>),(0,<http://umkc.edu/xPropGraph#node3>),<http://umkc.edu/xPropGraph#prop4>,0) 
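`RDD.zip` matches elements only by position, and nothing in the RDD API promises that the i-th triplet of `propGraph` and the i-th triplet of `cc` describe the same edge. A sketch of a key-based alternative, assuming `propGraph: Graph[String, String]` and `cc` from `connectedComponents()` as in the answer; `KeyedMerge` and `mergedTriples` are hypothetical names. It also emits the leading parenthesis that the zipped output lacks:

```scala
import org.apache.spark.graphx.{Graph, VertexId}
import org.apache.spark.rdd.RDD

object KeyedMerge {
  // Hypothetical helper: same merged strings as the zip() version, but each
  // triplet is matched to its component ID by vertex ID instead of by position.
  def mergedTriples(propGraph: Graph[String, String], cc: Graph[VertexId, String]): RDD[String] = {
    // Key every triplet of the original graph by its source vertex ID
    val bySrc: RDD[(VertexId, String)] = propGraph.triplets.map { t =>
      (t.srcId, s"(${t.srcId},${t.srcAttr}),(${t.dstId},${t.dstAttr}),${t.attr}")
    }
    // cc.vertices holds (vertexId, lowest vertex ID in its component)
    bySrc.join(cc.vertices).map { case (_, (triple, ccId)) => s"($triple,$ccId)" }
  }
}
```

Calling `KeyedMerge.mergedTriples(propGraph, cc).collect.foreach(println)` should print lines in the `((vId_src,src_att),(vId_dst,dst_att),property,ccID)` shape asked for in the question, although the order of the lines is not guaranteed.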