LibXML - looping through nodes until

I want to parse the XML below using Perl's XML::LibXML library.

<?xml version="1.0" encoding="UTF-8" ?> 
<TaggedPDF-doc> 
<Part> 
    <Sect> 
    <H4>2.1 Study purpose </H4> 
    <P>This is study purpose content</P> 
    <P>content 1</P> 
    <P>content 2</P> 
    <P>content 3 </P> 
    <P>content 4</P> 
    <P>3. Some Header</P> 
    <P>obj content 4</P> 
    <P>obj content 2</P> 
    </Sect> 
</Part> 
</TaggedPDF-doc>

For the heading "Study purpose", I am trying to display all of the related siblings. So my expected output is:

<H4>2.1 Study purpose </H4> 
<P>This is study purpose content</P> 
<P>content 1</P> 
<P>content 2</P> 
<P>content 3 </P> 
<P>content 4</P>

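My Perl code is below. I can display the first node.

Given the value of the first node, "Study purpose", is there a way to loop over and print all the following nodes until I hit a node whose text contains "a number followed by a dot"?

My Perl implementation: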

my $purpose_str = 'Purpose and rationale|Study purpose|Study rationale'; 
$parser = XML::LibXML->new; 
#print "Parser for file $file is: $parser \n";  
$dom = $parser->parse_file($file); 

$root = $dom->getDocumentElement; 
$dom->setDocumentElement($root); 

for my $purpose_search('/TaggedPDF-doc/Part/Sect/H4') 
{ 
    $purpose_nodeset = $dom->find($purpose_search); 
    foreach my $purp_node ($purpose_nodeset -> get_nodelist) 
    { 
     if ($purp_node =~ m/$purpose_str/i) 
     { 
      #Get the corresponding child nodes 
      @childnodes = $purp_node->nonBlankChildNodes(); 

      $first_kid = shift @childnodes; 
      $second_kid = $first_kid->nextNonBlankSibling(); 
      #$third_kid = $second_kid->nextNonBlankSibling(); 

      $first_kid -> string_value; 
      $second_kid -> string_value; 
      #$third_kid -> string_value; 
     } 

     print "Study Purpose is: $first_kid\n.$second_kid\n"; 
    } 
}

2013-07-01 BRZ

Try `use Data::Dumper; print Dumper(@childnodes);` after you set it, to see what you are really getting. – KeepCalmAndCarryOn

Don't look at child nodes if you want siblings. To match against a node's text content, use textContent.

#!/usr/bin/perl 
use warnings; 
use strict; 
use XML::LibXML; 

my $file  = 'input.xml'; 
my $purpose_str = 'Purpose and rationale|Study purpose|Study rationale'; 
my $dom   = XML::LibXML->load_xml(location => $file); 

for my $purpose_search('/TaggedPDF-doc/Part/Sect/H4') 
{ 
    my $purpose_nodeset = $dom->find($purpose_search); 
    for my $purp_node ($purpose_nodeset -> get_nodelist) 
    { 
     if ($purp_node->textContent =~ m/$purpose_str/i) 
     { 
      my @siblings = $purp_node->find('following-sibling::*') 
          ->get_nodelist; 

      for my $i (0 .. $#siblings) 
      { 
       if ($siblings[$i]->textContent =~ /^[0-9]+\./) 
       { 
        splice @siblings, $i; 
        last; 
       } 
      } 

      print $_->textContent, "\n" for @siblings; 
     } 

    } 
}

2013-07-01 06:52:37 choroba

Thanks a lot. This works. – BRZ
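
The "loop until" idea from the question can also be written as a direct walk along the sibling chain, instead of collecting every following sibling and trimming the list afterwards. The following is only a minimal sketch: the helper name `siblings_until` is hypothetical, the XML is a shortened inline copy of the question's document, and the stop pattern allows leading whitespace, which is an assumption not made in the answer above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: return the element siblings that follow $start,
# stopping before the first one whose text matches $stop_re.
sub siblings_until {
    my ($start, $stop_re) = @_;
    my @found;
    for (my $node = $start->nextNonBlankSibling;
         $node;
         $node = $node->nextNonBlankSibling)
    {
        # Skip anything that is not an element (comments, stray text).
        next unless $node->isa('XML::LibXML::Element');
        last if $node->textContent =~ $stop_re;
        push @found, $node;
    }
    return @found;
}

# Shortened copy of the question's document, inlined for the sketch.
my $xml = <<'XML';
<TaggedPDF-doc><Part><Sect>
  <H4>2.1 Study purpose </H4>
  <P>This is study purpose content</P>
  <P>content 1</P>
  <P>3. Some Header</P>
  <P>obj content 4</P>
</Sect></Part></TaggedPDF-doc>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# Pick the first H4 whose text matches the heading we are interested in.
my ($heading) = grep { $_->textContent =~ /Study purpose/i }
                $dom->findnodes('/TaggedPDF-doc/Part/Sect/H4');

# Stop at the first sibling that starts with "a number followed by a dot".
my @section = siblings_until($heading, qr/^\s*[0-9]+\./);

# Print the heading and its section, markup included.
print $_->toString, "\n" for $heading, @section;
```

Printing with `toString` keeps the tags, which matches the expected output in the question; use `textContent` instead if only the text is wanted.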
