2012-05-17


| top node | middle node | bottom node | 
| a  |  1  | "name1" | 
| b  |  1  | "name6" | 
| a  |  2  | "name3" | 
| b  |  2  | "name8" | 
| b  |  1  | "name5" | 
| a  |  1  | "name2" | 
| b  |  2  | "name7" | 
| a  |  2  | "name4" | 


<node id="a" label="top node">
    <node id="1" label="middle node">
        <node id="name1" label="bottom node"/>
        <node id="name2" label="bottom node"/>
    </node>
    <node id="2" label="middle node">
        <node id="name3" label="bottom node"/>
        <node id="name4" label="bottom node"/>
    </node>
</node>
<node id="b" label="top node">
    <node id="1" label="middle node">
        <node id="name5" label="bottom node"/>
        <node id="name6" label="bottom node"/>
    </node>
    <node id="2" label="middle node">
        <node id="name7" label="bottom node"/>
        <node id="name8" label="bottom node"/>
    </node>
</node>



| b  |  1  | "name6" | 



This is possible with Nokogiri. What have you tried?




require 'nokogiri' 

# Create an array of the top/middle/bottom node ids 
rows = File.readlines('my.data')[1..-1].map{ |row| row.scan(/[^|\s"]+/) } 

# Look underneath a parent node for another node with a specific id 
# If you can't find one, create one (with the label) and return it. 
def find_or_create_on(parent, id, label)
    parent.at("node[id='#{id}']") ||
        parent.add_child("<node id='#{id}' label='#{label}' />")[0]
end

# Since an XML document can only ever have one root node, 
# and your data can have many, let's wrap them all in a new document 
root = Nokogiri.XML('<root></root>').root 

# For each triplet, find or create the nodes you need, in order 
# (When iterating an array of arrays, you can automagically convert 
# each item in the sub-array to a named variable.) 
rows.each do |top_id, mid_id, bot_id|
    top = find_or_create_on(root, top_id, 'top node')
    mid = find_or_create_on(top, mid_id, 'middle node')
    bot = find_or_create_on(mid, bot_id, 'bottom node')
end

puts root 
#=> <root>
#=>   <node id="a" label="top node">
#=>     <node id="1" label="middle node">
#=>       <node id="name1" label="bottom node"/>
#=>       <node id="name2" label="bottom node"/>
#=>     </node>
#=>     <node id="2" label="middle node">
#=>       <node id="name3" label="bottom node"/>
#=>       <node id="name4" label="bottom node"/>
#=>     </node>
#=>   </node>
#=>   <node id="b" label="top node">
#=>     <node id="1" label="middle node">
#=>       <node id="name6" label="bottom node"/>
#=>       <node id="name5" label="bottom node"/>
#=>     </node>
#=>     <node id="2" label="middle node">
#=>       <node id="name8" label="bottom node"/>
#=>       <node id="name7" label="bottom node"/>
#=>     </node>
#=>   </node>
#=> </root>
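
The row-parsing line is the least obvious part of the script, so here is a small sketch of it in isolation. It assumes an inline heredoc (`data`, a stand-in for the contents of `my.data`, which is not shown in the thread) and uses only core Ruby:

```ruby
# Hypothetical inline stand-in for 'my.data': a header row plus two data rows.
data = <<~TABLE
  | top node | middle node | bottom node |
  | a  |  1  | "name1" |
  | b  |  1  | "name6" |
TABLE

# Drop the header line, then collect every run of characters that is
# not a pipe, whitespace, or a double quote.
rows = data.lines[1..-1].map { |row| row.scan(/[^|\s"]+/) }

p rows
#=> [["a", "1", "name1"], ["b", "1", "name6"]]
```

Because whitespace is one of the excluded characters, an id that contains a space would be split into several tokens; that is fine for the sample ids here, but real data with spaces would need a different pattern.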



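# To emit the nodes in sorted order, sort the rows before iterating,
# i.e. change the loop header above to:
rows.sort.each do |top_id, mid_id, bot_id|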
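
As a rough illustration of why `rows.sort` is enough: Ruby compares arrays element by element, so the triplets are ordered by top id, then middle id, then bottom id. The sketch below hard-codes the eight sample rows from the question (as they would look after parsing) rather than reading a file:

```ruby
# The sample rows from the question, in their original (unsorted) order.
rows = [
  %w[a 1 name1], %w[b 1 name6], %w[a 2 name3], %w[b 2 name8],
  %w[b 1 name5], %w[a 1 name2], %w[b 2 name7], %w[a 2 name4]
]

# Array#sort uses Array#<=>, which compares the first elements,
# then the second on a tie, and so on.
sorted = rows.sort
sorted.each { |row| puts row.join(' ') }
#=> a 1 name1
#=> a 1 name2
#=> a 2 name3
#=> a 2 name4
#=> b 1 name5
#=> b 1 name6
#=> b 2 name7
#=> b 2 name8
```

Note that the ids are strings, so they sort lexically: `"10"` comes before `"2"`. If the middle ids are meant to be ordered numerically, something like `rows.sort_by { |t, m, b| [t, m.to_i, b] }` would be one way to handle it.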

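Ah, thank you! That makes sense. The reason I use "id" is that this is how the gexf format works: every node has an "id" attribute, which should be a unique identifier. The identifiers in my question are just examples; in the real data I am sure they are unique. And the reason my source data is laid out this way is to show that it is not in any particular order. – hriundel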


My pleasure; I hope it helps. – Phrogz