2009-11-23 54 views
2

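Converting an XML collection (Pivotal Tracker stories) to Ruby hashes/objects

I have a collection of stories in XML format. I would like to parse the file and return each story as either a hash or a Ruby object, so that I can process the data further in a Ruby script.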

Does Nokogiri support this, or is there a better tool/library to use?

The XML document has the following structure, as returned by Pivotal Tracker's web API:

<?xml version="1.0" encoding="UTF-8"?> 
<stories type="array" count="145" total="145"> 
    <story> 
    <id type="integer">16376</id> 
    <story_type>feature</story_type> 
    <url>http://www.pivotaltracker.com/story/show/16376</url> 
    <estimate type="integer">2</estimate> 
    <current_state>accepted</current_state> 
    <description>A description</description> 
    <name>Receivable index listing will allow selection viewing</name> 
    <requested_by>Tony Superman</requested_by> 
    <owned_by>Tony Superman</owned_by> 
    <created_at type="datetime">2009/11/04 15:49:43 WST</created_at> 
    <accepted_at type="datetime">2009/11/10 11:06:16 WST</accepted_at> 
    <labels>index ui,receivables</labels> 
    </story> 
    <story> 
    <id type="integer">17427</id> 
    <story_type>feature</story_type> 
    <url>http://www.pivotaltracker.com/story/show/17427</url> 
    <estimate type="integer">3</estimate> 
    <current_state>unscheduled</current_state> 
    <description></description> 
    <name>Validations in wizards based on direction</name> 
    <requested_by>Matthew McBoggle</requested_by> 
    <created_at type="datetime">2009/11/17 15:52:06 WST</created_at> 
    </story> 
    <story> 
    <id type="integer">17426</id> 
    <story_type>feature</story_type> 
    <url>http://www.pivotaltracker.com/story/show/17426</url> 
    <estimate type="integer">2</estimate> 
    <current_state>unscheduled</current_state> 
    <description>Manual payment needs a description field.</description> 
    <name>Add description to manual payment</name> 
    <requested_by>Tony Superman</requested_by> 
    <created_at type="datetime">2009/11/17 15:10:41 WST</created_at> 
    <labels>payment process</labels> 
    </story> 
    <story> 
    <id type="integer">17636</id> 
    <story_type>feature</story_type> 
    <url>http://www.pivotaltracker.com/story/show/17636</url> 
    <estimate type="integer">3</estimate> 
    <current_state>unscheduled</current_state> 
    <description>The SMS and email templates needs to be editable by merchants.</description> 
    <name>Notifications are editable by the merchant</name> 
    <requested_by>Matthew McBoggle</requested_by> 
    <created_at type="datetime">2009/11/19 16:44:08 WST</created_at> 
    </story> 
</stories> 

Answers

5

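You can leverage the Hash extensions in ActiveSupport. You then only need to parse the document with Nokogiri and convert each node in the resulting node set to a hash. This approach preserves the typing declared in the attributes (e.g. integer, date, array). (Of course, if you are using Rails you do not need to require ActiveSupport, or Nokogiri if it is already in your environment. I am assuming a pure Ruby implementation here.)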

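require 'rubygems' 
require 'nokogiri' 
# ActiveSupport 3 and later; on ActiveSupport 2.x use require 'activesupport'
# followed by include ActiveSupport::CoreExtensions::Hash instead.
require 'active_support' 
require 'active_support/core_ext/hash/conversions' # provides Hash.from_xml 

doc = Nokogiri::XML.parse(File.read('yourdoc.xml')) 
# One hash per <story> node; Hash.from_xml casts values using the type attribute.
my_hash = doc.search('//story').map { |e| Hash.from_xml(e.to_xml)['story'] } 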

This produces an array of hashes (one per story node) and preserves the types declared in the attributes, as shown below:

my_hash.first['name'] 
=> "Receivable index listing will allow selection viewing" 

my_hash.first['id'] 
=> 16376 

my_hash.first['id'].class 
=> Fixnum 

my_hash.first['created_at'].class 
=> Time 
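
Since the question also mentions Ruby objects, here is a minimal sketch of wrapping one of these hashes in a Struct. `Story`, `from_hash` and `label_list` are hypothetical names introduced for illustration, and `sample` stands in for one element of `my_hash`, so the snippet runs without Nokogiri or ActiveSupport.

```ruby
# Hypothetical Story class; the member list mirrors the elements in the sample XML.
Story = Struct.new(:id, :story_type, :url, :estimate, :current_state,
                   :description, :name, :requested_by, :owned_by,
                   :created_at, :accepted_at, :labels) do
  # Build a Story from a string-keyed hash, ignoring keys that are not members.
  def self.from_hash(hash)
    story = new
    hash.each do |key, value|
      story[key] = value if members.include?(key.to_sym)
    end
    story
  end

  # Labels arrive as one comma-separated string; split them for convenience.
  def label_list
    labels.to_s.split(',')
  end
end

# Stand-in for one element of my_hash (values copied from the sample document).
sample = {
  'id' => 16376,
  'name' => 'Receivable index listing will allow selection viewing',
  'estimate' => 2,
  'labels' => 'index ui,receivables'
}

story = Story.from_hash(sample)
puts story.name
puts story.label_list.inspect # prints ["index ui", "receivables"]
```

With the real data, something like `my_hash.map { |h| Story.from_hash(h) }` should give one object per story. Members missing from a given story (such as `owned_by`) simply stay `nil`.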

1

I think you can stick with this answer.

A simpler one can be found here.

1

This XML was generated by Rails' ActiveRecord#to_xml method. If you are using Rails, you should be able to parse it with Hash.from_xml.

+0

I'm not using Rails in this example. – mlambie 2009-11-23 05:49:51

2

A sort of one-line solution would look like this:

# str_xml contains your xml 
xml = Nokogiri::XML.parse(str_xml) 
xml.search('//story').to_a.map{|node| node.children.inject({}){|a,c| a[c.name] = c.text if c.class == Nokogiri::XML::Element; a}} 

It returns an array of hashes:

>> xml.search('//story').to_a.map{|node| node.children.inject({}){|a,c| a[c.name] = c.text if c.class == Nokogiri::XML::Element; a}} 
=> [{"id"=>"16376", "story_type"=>"feature", "url"=>"http://www.pivotaltracker.com/story/show/16376", "estimate"=>"2", "current_state"=>"accepted", "description"=>"A description", "name"=>"Receivable index listing will allow selection viewing", "requested_by"=>"Tony Superman", "owned_by"=>"Tony Superman", "created_at"=>"2009/11/04 15:49:43 WST", "accepted_at"=>"2009/11/10 11:06:16 WST", "labels"=>"index ui,receivables"}, {"id"=>"17427", "story_type"=>"feature", "url"=>"http://www.pivotaltracker.com/story/show/17427", "estimate"=>"3", "current_state"=>"unscheduled", "description"=>"", "name"=>"Validations in wizards based on direction", "requested_by"=>"Matthew McBoggle", "created_at"=>"2009/11/17 15:52:06 WST"}, {"id"=>"17426", "story_type"=>"feature", "url"=>"http://www.pivotaltracker.com/story/show/17426", "estimate"=>"2", "current_state"=>"unscheduled", "description"=>"Manual payment needs a description field.", "name"=>"Add description to manual payment", "requested_by"=>"Tony Superman", "created_at"=>"2009/11/17 15:10:41 WST", "labels"=>"payment process"}, {"id"=>"17636", "story_type"=>"feature", "url"=>"http://www.pivotaltracker.com/story/show/17636", "estimate"=>"3", "current_state"=>"unscheduled", "description"=>"The SMS and email templates needs to be editable by merchants.", "name"=>"Notifications are editable by the merchant", "requested_by"=>"Matthew McBoggle", "created_at"=>"2009/11/19 16:44:08 WST"}] 

Note that this ignores all XML attributes, but you haven't said what should be done with them ;)
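
If the attributes do matter, one option is to read the `type` attribute and cast the text accordingly. The following is only a sketch under a few assumptions: it swaps Nokogiri for REXML (distributed with Ruby) so that it runs without extra gems, `cast_story_value` and `stories_from_xml` are hypothetical helper names, and only `integer` is cast. Values marked `datetime` are left as strings, because zone abbreviations such as WST are ambiguous. In Nokogiri the same attribute should be readable as `node['type']`.

```ruby
require 'rexml/document'

# Hypothetical helper: cast an element's text according to its "type" attribute.
# Only "integer" is handled; everything else (including "datetime") stays a String.
def cast_story_value(element)
  text = element.text.to_s
  return text unless element.attributes['type'] == 'integer'

  text.empty? ? nil : Integer(text, 10)
end

# Hypothetical helper: turn an XML string into an array of hashes, one per <story>.
def stories_from_xml(xml)
  doc = REXML::Document.new(xml)
  doc.get_elements('//story').map do |story|
    story.elements.each_with_object({}) do |child, hash|
      hash[child.name] = cast_story_value(child)
    end
  end
end

# Shortened sample in the same shape as the document in the question.
sample_xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <stories type="array" count="1" total="1">
    <story>
      <id type="integer">16376</id>
      <estimate type="integer">2</estimate>
      <description></description>
      <created_at type="datetime">2009/11/04 15:49:43 WST</created_at>
    </story>
  </stories>
XML

stories = stories_from_xml(sample_xml)
puts stories.first.inspect
```

Here `id` and `estimate` come back as integers, while the empty `description` becomes an empty string.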

0

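Perhaps a Ruby interface to the Pivotal Tracker API would be a better solution for your task; see https://github.com/jsmestad/pivotal-tracker ... You can then fetch stories as plain Ruby objects (from the docs):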

@a_project = PivotalTracker::Project.find(84739)        
@a_project.stories.all(:label => 'overdue', :story_type => ['bug', 'chore'])