2012-04-24 120 views
1

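Parsing XML with nested tags in Perl

I want to parse an XML file that contains nested sets of tags. I tried Perl's XML::Simple API: the values of single tags are parsed correctly, but I cannot get at the values of the nested tags.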

<archetype> 
    <original_language></original_language> 
    <description></description> 
    <archetype_id></archetype_id> 
    <definition></definition> 
    <ontology></ontology> 
</archetype> 
The definition section contains the details of each item, for example:

<definition> 
. 
. 
<node_id>at0004</node_id> 
<attributes xsi:type="C_SINGLE_ATTRIBUTE"> 
<rm_attribute_name>value</rm_attribute_name> 
+<existence> </existence> 
<children xsi:type="C_DV_QUANTITY"> 
    <rm_type_name>DV_QUANTITY</rm_type_name> 
    +<occurrences></occurrences> 
    <node_id/> 
    +<property></property> 
    <list> 
    <magnitude> 
     <lower_included>true</lower_included> 
     <upper_included>false</upper_included> 
     <lower_unbounded>false</lower_unbounded> 
     <upper_unbounded>false</upper_unbounded> 
     <lower>0.0</lower> 
     <upper>1000.0</upper> 
</magnitude> 
<units>mm[Hg]</units> 
</list> 
</children> 
</attributes> 
. 
. 
</definition> 

From the example file above, I would like output in this format:

node_id - > at0004 
    magnitude -> lower -> 0.0 
    magnitude -> higher -> 1000.0 

Please advise me on how to filter out this content.

+0

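It would be useful if you included your current code. That way we can point out where you went wrong instead of just giving you the complete answer. – 2012-04-24 10:36:14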

Answers

2

You need to learn about references: perlreftut, perlref, and perldsc.

use strictures; 
use XML::Simple qw(:strict); 

my $root = XMLin(<<'XML', ForceArray => 0, KeyAttr => undef); 
<definition> 
. 
. 
<node_id>at0004</node_id> 
<attributes xsi:type="C_SINGLE_ATTRIBUTE"> 
<rm_attribute_name>value</rm_attribute_name> 
+<existence> </existence> 
<children xsi:type="C_DV_QUANTITY"> 
    <rm_type_name>DV_QUANTITY</rm_type_name> 
    +<occurrences></occurrences> 
    <node_id/> 
    +<property></property> 
    <list> 
    <magnitude> 
     <lower_included>true</lower_included> 
     <upper_included>false</upper_included> 
     <lower_unbounded>false</lower_unbounded> 
     <upper_unbounded>false</upper_unbounded> 
     <lower>0.0</lower> 
     <upper>1000.0</upper> 
</magnitude> 
<units>mm[Hg]</units> 
</list> 
</children> 
</attributes> 
. 
. 
</definition> 
XML 

# Follow the nested hash references down to the <magnitude> element. 
my $m = $root->{attributes}{children}{list}{magnitude}; 
printf <<'TEMPLATE', $root->{node_id}, $m->{lower}, $m->{upper}; 
node_id -> %s 
    magnitude -> lower -> %.1f 
    magnitude -> higher -> %.1f 
TEMPLATE 

# Dump the whole structure to see how XML::Simple nested the elements. 
use Data::Dump::Streamer qw(Dump); Dump $root; 

Output:

node_id -> at0004 
    magnitude -> lower -> 0.0 
    magnitude -> higher -> 1000.0 

$HASH1 = { 
    attributes => { 
     children => { 
      content => [("\n +") x 2], 
      list => { 
       magnitude => { 
        lower   => '0.0', 
        lower_included => 'true', 
        lower_unbounded => 'false', 
        upper   => '1000.0', 
        upper_included => 'false', 
        upper_unbounded => 'false' 
       }, 
       units => 'mm[Hg]' 
      }, 
      node_id  => {}, 
      occurrences => {}, 
      property  => {}, 
      rm_type_name => 'DV_QUANTITY', 
      "xsi:type" => 'C_DV_QUANTITY' 
     }, 
     content   => "\n+", 
     existence   => {}, 
     rm_attribute_name => 'value', 
     "xsi:type"  => 'C_SINGLE_ATTRIBUTE' 
    }, 
    content => [("\n.\n.\n") x 2], 
    node_id => 'at0004' 
}; 
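
The lookup above relies on every level being a single hash reference. With `ForceArray => 0`, XML::Simple returns an array reference instead as soon as an element repeats, so the same chain of keys would fail. Below is a minimal sketch of one way to cope with both shapes, using only core Perl. The data is a hypothetical hand-written structure modelled on the dump above (as if `<definition>` held two `<attributes>` siblings), and `as_list` and `magnitudes` are illustrative helper names, not part of XML::Simple.

```perl
use strict;
use warnings;

# Hypothetical data: the shape of the dump above, but with <attributes>
# repeated, so that key holds an array ref instead of a single hash ref.
my $root = {
    attributes => [
        { children => { list => { magnitude => { lower => '0.0',   upper => '1000.0' }, units => 'mm[Hg]' } } },
        { children => { list => { magnitude => { lower => '100.9', upper => '998.7'  }, units => 'mm[Hg]' } } },
    ],
};

# as_list (hypothetical helper): returns the elements of an array ref,
# any other defined value as a one-element list, and nothing for undef.
sub as_list {
    my ($value) = @_;
    return ()      unless defined $value;
    return @$value if ref $value eq 'ARRAY';
    return ($value);
}

# magnitudes (hypothetical helper): collects [lower, upper] pairs and
# skips branches that have no <list> or no <magnitude>.
sub magnitudes {
    my ($root) = @_;
    my @pairs;
    for my $attr (as_list($root->{attributes})) {
        for my $child (as_list($attr->{children})) {
            my $list = $child->{list}      or next;
            my $m    = $list->{magnitude}  or next;
            push @pairs, [ $m->{lower}, $m->{upper} ];
        }
    }
    return @pairs;
}

printf "magnitude -> lower -> %.1f, higher -> %.1f\n", @$_ for magnitudes($root);
```

The sketch assumes `<list>` and `<magnitude>` themselves never repeat; if they can, the same `as_list` wrapping would be needed at those levels too. Passing `ForceArray => 1` to `XMLin` is the other common choice, at the cost of array indexes at every level.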
1

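Here is an XML::Twig program that does this, although I made some assumptions that you may need to adjust. I don't know whether <definition> can contain more than one node_id/attributes pair, so I wrote it to handle multiple pairs: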

#!/Users/brian/bin/perls/perl5.14.2 

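use strict; 
use warnings; 
use XML::Twig; 

my $twig = XML::Twig->new(
    twig_handlers => { 
     # Trigger on <list> rather than <magnitude>: a handler fires at the 
     # closing tag, and <units> is only parsed once </list> is reached. 
     list => sub { 
      my $list = $_; 
      my $m = $list->first_child('magnitude') or return; 
      my $hash = $m->simplify; 
      my $units = $list->first_child_text('units'); 
      # The <node_id> that belongs to this block precedes its <attributes>. 
      my $node_id = $list->parent('attributes')->prev_sibling('node_id')->text; 
      print "node -> $node_id\n", 
       "\tmagnitude -> lower -> $hash->{lower} $units\n", 
       "\tmagnitude -> higher -> $hash->{upper} $units\n"; 
      }, 
     }, 
    ); 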

$twig->parse(*DATA); 


__END__ 
<definition> 

<node_id>at0004</node_id> 
<attributes xsi:type="C_SINGLE_ATTRIBUTE"> 
    <rm_attribute_name>value</rm_attribute_name> 
    <existence> </existence> 
    <children xsi:type="C_DV_QUANTITY"> 
     <rm_type_name>DV_QUANTITY</rm_type_name> 
     <occurrences></occurrences> 
     <node_id/> 
     <property></property> 
     <list> 
      <magnitude> 
       <lower_included>true</lower_included> 
       <upper_included>false</upper_included> 
       <lower_unbounded>false</lower_unbounded> 
       <upper_unbounded>false</upper_unbounded> 
       <lower>0.0</lower> 
       <upper>1000.0</upper> 
      </magnitude> 
      <units>mm[Hg]</units> 
     </list> 
    </children> 
</attributes> 

<node_id>at0005</node_id> 
<attributes xsi:type="C_SINGLE_ATTRIBUTE"> 
    <rm_attribute_name>value</rm_attribute_name> 
    <existence> </existence> 
    <children xsi:type="C_DV_QUANTITY"> 
     <rm_type_name>DV_QUANTITY</rm_type_name> 
     <occurrences></occurrences> 
     <node_id/> 
     <property></property> 
     <list> 
      <magnitude> 
       <lower_included>true</lower_included> 
       <upper_included>false</upper_included> 
       <lower_unbounded>false</lower_unbounded> 
       <upper_unbounded>false</upper_unbounded> 
       <lower>100.9</lower> 
       <upper>998.7</upper> 
      </magnitude> 
      <units>mm[Hg]</units> 
     </list> 
    </children> 
</attributes> 

</definition>