2010-07-01

Processing XML files with Ruby and Nokogiri

I am new to programming, so please bear with me. I have many XML documents that look like this:

Filename: PRIDE_Exp_Complete_Ac_10094.xml.gz

<ExperimentCollection version="2.1"> 
<Experiment> 
    <ExperimentAccession>1015</ExperimentAccession> 
    <Title>Protein complexes in Saccharomyces cerevisiae (GPM06600002310)</Title> 
    <ShortLabel>GPM06600002310</ShortLabel> 
    <Protocol> 
     <ProtocolName>None</ProtocolName> 
    </Protocol> 
    <mzData version="1.05" accessionNumber="1015"> 
     <cvLookup cvLabel="RESID" fullName="RESID Database of Protein Modifications" version="0.0" address="http://www.ebi.ac.uk/RESID/" /> 
     <cvLookup cvLabel="UNIMOD" fullName="UNIMOD Protein Modifications for Mass Spectrometry" version="0.0" address="http://www.unimod.org/" /> 
     <description> 
      <admin> 
       <sampleName>GPM06600002310</sampleName> 
       <sampleDescription comment="Ho, Y., et al., Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002 Jan 10;415(6868):180-3."> 
        <cvParam cvLabel="NEWT" accession="4932" name="Saccharomyces cerevisiae (Baker's yeast)" value="Saccharomyces cerevisiae" /> 
       </sampleDescription> 
          </admin> 
     </description> 
     <spectrumList count="0" /> 
    </mzData> 
     </Experiment> 

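I want to pull out the text inside the "Title", "ProtocolName", and "sampleName" elements and save it to a text file with the same name as the .xml.gz file. So far I have the following code (based on posts I have seen on this site), but it does not seem to work: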

require 'rubygems' 
require 'nokogiri' 
doc = Nokogiri::XML(File.open("PRIDE_Exp_Complete_Ac_10094.xml.gz")) 
@ExperimentCollection = doc.css("ExperimentCollection Title").map {|node| node.children.text } 

Can anyone help me?

Thanks

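Regarding the 'please delete me' you left in place of your question (before I rolled it back): you can delete your own question using the links below the question text (you should see something like 'edit | close | delete'). If you want to delete it, you are free to do so, since you own the question. I rolled it back because it seemed legitimate and worth answering. If you have solved your problem, please post your solution. Otherwise, please give people time to see it and offer help. – 2010-07-01 17:50:15

Answers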

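If you are happy to use REXML, and there is only one <Experiment> per file, then something like the following should help. (By the way, the text above is not valid XML, because there is no closing </ExperimentCollection> tag.)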

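require "rexml/document"
include REXML  # allows Document instead of REXML::Document

# Sample document from the question, reduced to a single <Experiment>.
xml = <<EOD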
<Experiment> 
    <ExperimentAccession>1015</ExperimentAccession> 
    <Title>Protein complexes in Saccharomyces cerevisiae (GPM06600002310)</Title> 
    <ShortLabel>GPM06600002310</ShortLabel> 
    <Protocol> 
     <ProtocolName>None</ProtocolName> 
    </Protocol> 
    <mzData version="1.05" accessionNumber="1015"> 
     <cvLookup cvLabel="RESID" fullName="RESID Database of Protein Modifications" version="0.0" address="http://www.ebi.ac.uk/RESID/" /> 
     <cvLookup cvLabel="UNIMOD" fullName="UNIMOD Protein Modifications for Mass Spectrometry" version="0.0" address="http://www.unimod.org/" /> 
     <description> 
      <admin> 
       <sampleName>GPM06600002310</sampleName> 
       <sampleDescription comment="Ho, Y., et al., Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002 Jan 10;415(6868):180-3."> 
        <cvParam cvLabel="NEWT" accession="4932" name="Saccharomyces cerevisiae (Baker's yeast)" value="Saccharomyces cerevisiae" /> 
       </sampleDescription> 
          </admin> 
     </description> 
     <spectrumList count="0" /> 
    </mzData> 
     </Experiment> 
EOD

# Parse the XML string and print the text of the three elements of interest.
doc = Document.new(xml)
puts doc.elements["Experiment/Title"].text
puts doc.elements["Experiment/Protocol/ProtocolName"].text
puts doc.elements["Experiment/mzData/description/admin/sampleName"].text
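
The question also asks about reading the .xml.gz file and saving the result to a text file, which the snippet above does not cover. Opening a gzipped file with a plain File.open hands the parser compressed bytes, so the file probably needs to be decompressed first. The following is only a rough sketch of how that might look with REXML and Ruby's standard zlib library. The helper names extract_fields and write_fields are made up for illustration, the output name (.xml.gz swapped for .txt) and the "name: value" line format are assumptions, and it assumes each real file is well-formed XML with a single <Experiment>.

```ruby
require "zlib"
require "rexml/document"

# XPath for each field the question asks about.
FIELD_PATHS = {
  "Title"        => "//Experiment/Title",
  "ProtocolName" => "//Experiment/Protocol/ProtocolName",
  "sampleName"   => "//Experiment/mzData/description/admin/sampleName"
}

# Hypothetical helper: decompress a .xml.gz file and return a hash of
# field name => element text (nil when the element is missing).
def extract_fields(gz_path)
  xml = Zlib::GzipReader.open(gz_path) { |gz| gz.read }
  doc = REXML::Document.new(xml)
  FIELD_PATHS.each_with_object({}) do |(name, path), fields|
    element = REXML::XPath.first(doc, path)
    fields[name] = element&.text
  end
end

# Hypothetical helper: write the fields next to the input file, using the
# same base name with .txt instead of .xml.gz (an assumed naming scheme).
# Returns the path of the text file that was written.
def write_fields(gz_path)
  txt_path = gz_path.sub(/\.xml\.gz\z/, ".txt")
  File.open(txt_path, "w") do |out|
    extract_fields(gz_path).each { |name, value| out.puts "#{name}: #{value}" }
  end
  txt_path
end
```

To process many files, something like Dir.glob("*.xml.gz").each { |path| write_fields(path) } should work, given the same assumptions about the files.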