Modifying XML documents using Perl XML::SAX
2012-04-20

I am trying to modify certain parts of an XHTML document using XML::SAX, but all my attempts have failed.

Here is what I am trying to do:

#!/usr/bin/perl 
package MyHandler; 
use strict; 
use warnings; 

use base qw(XML::SAX::Base); 
use Data::Dumper; 

sub start_element { 
    my $self = shift; 
    my $data = shift; 

    if($data->{LocalName} eq 'span') { 
     $data->{LocalName} = 'naps'; 
    } 

    $self->SUPER::start_element($data); # GOOD (and easy) ! 
    #print Dumper($data); 
} 

1; 

#============================
# Main program
#============================
package main;

use strict;
use warnings;

use XML::SAX::ParserFactory;
use XML::SAX::Writer;

my $out;

my $o = XML::SAX::Writer->new(Output => \$out);
my $h = MyHandler->new(Handler => $o);
my $p = XML::SAX::ParserFactory->parser(Handler => $h);

# Slurp the whole DATA section; $/ must stay undefined while reading.
my $data = do { local $/; <DATA> };
$p->parse_string($data);
print $out;


__DATA__ 
<?xml version="1.0" encoding="UTF-8"?> 
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:wicket="http://wicket.apache.org/dtds.data/wicket-xhtml1.4-strict.dtd"> 
<body> 
<wicket:panel> 
    <form wicket:id="mvpForm"> 
     <span>Edit Information: </span> 
     <input type="checkbox" wicket:id="editForm"/> 

     <span>Name: </span> 
     <span wicket:id="name"></span> 
     <input type="text" wicket:id="nameEdit"/> 

     <span>Last Name: </span> 
     <span wicket:id="lastName"></span> 
     <input type="text" wicket:id="lastNameEdit"/> 

     <span>DOB: </span> 
     <span wicket:id="dob"></span> 
     <input type="text" wicket:id="dobEdit"/> 


     <span>Occupation: </span> 
     <span wicket:id="occupation"></span> 
     <input type="text" wicket:id="occupationEdit"/> 


     <span>Gender: </span> 
     <span wicket:id="gender"></span> 
     <span wicket:id="genderEdit"/> 

     <input type="submit" wicket:id="submit"/> 

    </form> 
</wicket:panel> 
</body> 
</html> 

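The basic idea is to change every "span" to "naps" and write the resulting modified XML to stdout. I would also like to know whether SAX can be used to merge XML chunks. In other words, if I find a particular element that expands into something else, how can I merge the expansion into the output written to STDOUT?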

E.g., from:

<xmltag> 
    <expandable/> 
</xmltag> 

To:

<xmltag> 
    <expanded> 
     This is an expanded element 
    </expanded> 
</xmltag> 

Thanks.

Answers

1

To answer my own question about merging/expanding elements, here is a snippet showing how to do it with SAX:

#!/usr/bin/perl 
package MyHandler; 
use strict; 
use warnings; 

use base qw(XML::SAX::Base); 
use Data::Dumper; 

use XML::SAX::ParserFactory; 
use XML::SAX::Writer; 

sub start_element { 
    my $self = shift; 
    my $data = shift; 

    if($data->{LocalName} eq 'expand') { 
     $self->{in_include}++; 
     my $p = XML::SAX::ParserFactory->parser(Handler => $self); 
     $p->parse_string("<expanded>This is my expanded tag</expanded>"); 
     return; 
    } 

    #$data->{Attributes} = undef; 
    $self->SUPER::start_element($data); 
    #print Dumper($data); 
} 

sub characters { 
    my $self = shift; 
    my $data = shift; 

    #print "Data is $data->{Data}" if defined $data->{Data}; 
    $self->SUPER::characters($data); 
} 

sub end_element { 
    my ($self, $element) = @_; 
    if ($element->{LocalName} eq "expand") { 
     $self->{in_include}--; 
    } else { 
     $self->SUPER::end_element($element); 
    } 
} 

sub start_document { # skipped while a nested (included) parse is running
    my($self, $data) = @_;
    return if($self->{in_include});
    $self->SUPER::start_document($data);
}

sub end_document { # mirror of start_document
    my($self, $data) = @_;
    return if($self->{in_include});
    $self->SUPER::end_document($data);
}

1; 

#============================
# Main program
#============================
package main;

use strict;
use warnings;

use XML::SAX::ParserFactory;
use XML::SAX::Writer;

my $out;

my $o = XML::SAX::Writer->new(Output => \$out);
my $h = MyHandler->new(Handler => $o);
my $p = XML::SAX::ParserFactory->parser(Handler => $h);

# Slurp the whole DATA section; $/ must stay undefined while reading.
my $data = do { local $/; <DATA> };
$p->parse_string($data);
print $out;


__DATA__ 
<?xml version="1.0" encoding="UTF-8"?> 
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:wicket="http://wicket.apache.org/dtds.data/wicket-xhtml1.4-strict.dtd"> 
<body> 
<wicket:panel> 
    <form wicket:id="mvpForm"> 
     <span>Edit Information: </span> 
     <input type="checkbox" wicket:id="editForm"/> 

     <span>Name: </span> 
     <span wicket:id="name"></span> 
     <input type="text" wicket:id="nameEdit"/> 

     <span>Last Name: </span> 
     <span wicket:id="lastName"></span> 
     <input type="text" wicket:id="lastNameEdit"/> 

     <span>DOB: </span> 
     <span wicket:id="dob"></span> 
     <input type="text" wicket:id="dobEdit"/> 

     <span>Occupation: </span> 
     <span wicket:id="occupation"></span> 
     <input type="text" wicket:id="occupationEdit"/> 

     <span>Gender: </span> 
     <span wicket:id="gender"></span> 
     <span wicket:id="genderEdit"/> 

     <input type="submit" wicket:id="submit"/> 

     <expand/> 

    </form> 
</wicket:panel> 
</body> 
</html> 

The <expand/> tag will be replaced with <expanded>This is my expanded tag</expanded>.

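Basically, all that is needed is to create a new parser and pass it a file/string to parse. Note, however, that there are a couple of pitfalls. The first is to stop propagating the events for the tag you have intercepted for expansion. In other words, whenever you hit the expandable/nested tag, do not call $self->SUPER::start_element or $self->SUPER::end_element; this prevents the replaced tag from ending up in the output. Second, you need to intercept start_document/end_document and skip calling the parent's versions of those, otherwise the following error is produced: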

Trying to pop context without push context at /usr/share/perl5/XML/NamespaceSupport.pm line 79, <DATA> chunk 1.

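In other words, some cleanup is failing:

This message is being triggered because XML::NamespaceSupport does some initialization on the start_document event and some cleanup on the end_document event. The problem is that in your code the main document will have one pair of such events, and every included document will have a nested pair. When the second end_document event happens there is nothing left to clean up, hence the message.

Taken from here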
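The second pitfall can be illustrated without any XML modules. The following is only a toy sketch: ToyDownstream and forward_events are hypothetical names, and ToyDownstream merely imitates a handler that initializes on start_document and cleans up on end_document; it is not how XML::NamespaceSupport is actually implemented. A depth counter forwards only the outermost pair of document events, which is the same idea as the in_include flag above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a downstream handler that sets itself up on
# start_document and tears itself down on end_document.
package ToyDownstream;

sub new { return bless { open => 0, log => [] }, shift }

sub start_document {
    my $self = shift;
    $self->{open} = 1;
    push @{ $self->{log} }, 'start_document';
}

sub end_document {
    my $self = shift;
    die "nothing left to clean up\n" unless $self->{open};
    $self->{open} = 0;
    push @{ $self->{log} }, 'end_document';
}

package main;

# Forward only the outermost start_document/end_document pair;
# nested pairs (from included documents) are swallowed.
sub forward_events {
    my ($downstream, @events) = @_;
    my $depth = 0;
    for my $event (@events) {
        if ($event eq 'start_document') {
            $downstream->start_document if $depth++ == 0;
        }
        elsif ($event eq 'end_document') {
            $downstream->end_document if --$depth == 0;
        }
    }
    return $downstream->{log};
}

# One main document containing one included document.
my @events = qw(start_document start_document end_document end_document);

# Unguarded: every event reaches the downstream handler, and the second
# end_document finds nothing to clean up.
my $unguarded = ToyDownstream->new;
my $error = '';
eval { $unguarded->$_ for @events; 1 } or $error = $@;

# Guarded: only the outermost pair is forwarded.
my $log = forward_events(ToyDownstream->new, @events);

print "unguarded: $error";
print "guarded: @$log\n";
```

In the real handler the same effect comes from returning early from start_document and end_document while in_include is set.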

1

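It seems that the writer takes the element name from the Name key rather than from LocalName. So, instead of modifying LocalName, modify Name to get the desired result.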

if($data->{LocalName} eq 'span') { 
    $data->{LocalName} = 'naps'; 
} 

Change it to

if($data->{LocalName} eq 'span') { 
    $data->{Name} = 'naps'; 
} 
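Note that the snippet only touches start_element; the matching end_element event presumably needs the same rename, or the closing tag would still be written as </span>. Below is a minimal sketch of a helper that could be called from both callbacks, assuming the usual SAX2 element keys (Name, LocalName, Prefix). rename_element is a hypothetical name, not part of XML::SAX, and keeping LocalName in sync is a precaution rather than something the writer is known to require.

```perl
use strict;
use warnings;

# Hypothetical helper: rename a SAX element hash in place.
# Name is the qualified name (prefix:local), LocalName the bare name.
sub rename_element {
    my ($data, $from, $to) = @_;
    return $data unless defined $data->{LocalName} && $data->{LocalName} eq $from;

    my $prefix = $data->{Prefix};
    $data->{LocalName} = $to;
    # Preserve a namespace prefix, if the element has one.
    $data->{Name} = (defined $prefix && length $prefix) ? "$prefix:$to" : $to;
    return $data;
}

my $plain    = rename_element({ Name => 'span',   LocalName => 'span',  Prefix => ''  }, 'span', 'naps');
my $prefixed = rename_element({ Name => 'x:span', LocalName => 'span',  Prefix => 'x' }, 'span', 'naps');
my $other    = rename_element({ Name => 'input',  LocalName => 'input', Prefix => ''  }, 'span', 'naps');

print join(' ', map { $_->{Name} } $plain, $prefixed, $other), "\n";
```

Inside the handler, both start_element and end_element would call rename_element($data, 'span', 'naps') before delegating to the SUPER method.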
+0

What about adding text nodes? – daxim 2012-04-20 07:16:08

+0

I don't think SAX supports adding nodes. It might be possible in some dirty way! – tuxuday 2012-04-20 07:34:33

+0

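Thanks, that was a bit unexpected though :). Yes, it looks like the best approach is to create another SAX parser when an expandable node is found, but how do I merge it with the main processing pipeline? I'll experiment some more; maybe there is a solution after all. – dryajov 2012-04-20 16:47:10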

2

SAX is not the best tool for such trivial changes. Consider a DOM implementation.

use strictures; 
use XML::LibXML qw(); 
my $dom = XML::LibXML->load_xml(…); 

for my $e ($dom->findnodes('//*')) { 
    $e->setNodeName('naps') if 'span' eq $e->nodeName; 
    if ('expandable' eq $e->nodeName) { 
     $e->setNodeName('expanded'); 
     $e->appendText('This is an expanded element'); 
    } 
} 
print $dom->toString; # ->toFile 
+0

Thanks, this works too; my only concern is how memory-intensive it will be. – dryajov 2012-04-20 19:33:19

2

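Here is an XML::Twig-based solution, which I find easier to use than SAX (but then I might be a little biased ;-). It is quite efficient, since only one span (or expandable) element is kept in memory at a time.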

#!/usr/bin/perl 

use strict; 
use warnings; 

use XML::Twig; 

XML::Twig->new(twig_roots => { span       => sub { $_->set_tag('naps')->flush; },
                               expandable => sub { XML::Twig::Elt->new(expanded => 'this is an expanded element')->print; },
                             },
               twig_print_outside_roots => 1,
              )
          ->parse(\*DATA); # parse() accepts an open filehandle; parsefile() expects a file name
__DATA__ 
<?xml version="1.0" encoding="UTF-8"?> 
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:wicket="http://wicket.apache.org/dtds.data/wicket-xhtml1.4-strict.dtd"> 
<body> 
<wicket:panel> 
    <form wicket:id="mvpForm"> 
     <span>Edit Information: </span> 
     <input type="checkbox" wicket:id="editForm"/> 

     <span>Name: </span> 
     <span wicket:id="name"></span> 
     <input type="text" wicket:id="nameEdit"/> 

     <span>Last Name: </span> 
     <span wicket:id="lastName"></span> 
     <input type="text" wicket:id="lastNameEdit"/> 

     <span>DOB: </span> 
     <span wicket:id="dob"></span> 
     <input type="text" wicket:id="dobEdit"/> 


     <span>Occupation: </span> 
     <span wicket:id="occupation"></span> 
     <input type="text" wicket:id="occupationEdit"/> 


     <span>Gender: </span> 
     <span wicket:id="gender"></span> 
     <span wicket:id="genderEdit"/> 

     <input type="submit" wicket:id="submit"/> 

    </form> 

<xmltag> 
    <expandable/> 
</xmltag> 

</wicket:panel> 
</body> 
</html> 
+0

+1 easy things easy – daxim 2012-04-20 11:06:00

+0

At least for the users of the module ;-) – mirod 2012-04-20 11:37:30

+0

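Looks very simple and convenient. I was trying to stay away from DOM/tree-based solutions because they are usually more memory-intensive, but Twig strikes a nice balance between the convenience of DOM and the lightness of SAX. Thanks! – dryajov 2012-04-20 16:44:22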