Perl regular expression to find an element within an element

I need a regular expression that matches from <div id="class1"> to its closing </div>. There may be many <div> elements nested inside its text. Please find the sample text below.

This is example <div id="class1">This is <div id="subclass1">This is </div> <div id="subclass2">This is </div> This is </div> This is example

I tried the code below, but it only gets as far as the <div id="subclass1"> element. Can anyone help me solve this?

The code is:

<div id="class1">(?:(?!<\/div>).)*?</div>
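
A minimal sketch reproducing the behaviour described above, assuming the sample text is stored in a variable named $html (a name chosen here for illustration only). The lookahead (?!</div>) prevents the match from crossing any </div>, so the pattern can only end at the first closing tag, which belongs to subclass1:

```perl
use strict;
use warnings;

# Sample text from the question; the variable name is illustrative.
my $html = 'This is example <div id="class1">This is <div id="subclass1">This is </div> <div id="subclass2">This is </div> This is </div> This is example';

# The pattern from the question, wrapped in a capture group.
# (?!</div>) forbids crossing a closing tag, so the match ends at the first one.
my ($match) = $html =~ m{(<div id="class1">(?:(?!</div>).)*?</div>)};

print "$match\n";
```

The printed text stops right after the closing tag of subclass1, which is why the rest of class1 is never captured.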

Source

2012-12-08 siva2012

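See 'perldoc -q html'. – toolic

Please don't try to parse HTML with regular expressions. Regular expressions are not up to the task. Use an HTML parser. http://htmlparsing.com/perl.html has some Perl examples.

Obligatory link: http://stackoverflow.com/questions/1732348 (read the answers to that question).

Use a proper HTML parser.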

use strict; 
use warnings; 
use feature qw(say); 

use XML::LibXML qw(); 

my $html = 'This is example <div id="class1">This is <div id="subclass1">This is </div> <div id="subclass2">This is </div> This is </div> This is example'; 

my $parser = XML::LibXML->new(); 
my $doc = $parser->parse_html_string($html); 
my $root = $doc->documentElement(); 

for my $div ($root->findnodes('//div[@id="class1"]')) { 
    say "[", $div->toString(), "]"; 
}

Source

2012-12-08 04:54:20 ikegami

Thanks for the source code. Is this possible with a regular expression? – siva2012

Sure, with '' =~ /(?{ ... })/; – ikegami

$ echo 'This is example <div id="class1">This is <div id="subclass1">This is </div> <div id="subclass2">This is </div> This is </div> This is example' | sed -n 's/<div id="class1">\(.*\)<\/div>/\1/p' 
This is example This is <div id="subclass1">This is </div> <div id="subclass2">This is </div> This is This is example

Source

2012-12-08 02:54:17 palako

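You should use a proper HTML/XML parser. If you want to use a regular expression for whatever reason, a recursive regular expression can help. (See perldoc perlre for details.)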

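our $re;    # package variable, so the deferred subpattern below can refer to it
$re = qr{
    (
        <div[^>]*>
        (?: (??{$re}) | [^<>]+ )*    # a nested balanced div, or plain text
        </div>
    )
}x;

# Matches against $_ and prints the first balanced <div> element found.
print "$1\n" if /$re/;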
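
As a variation on the recursive approach, here is a hedged sketch that anchors the match on id="class1" and uses a named recursive group (Perl 5.10 or later) in place of (??{...}). The group name div and the variables $html and $element are illustrative choices, and the pattern assumes the content contains only plain text and <div> tags, as in the sample. It is not a general HTML parser.

```perl
use strict;
use warnings;
use 5.010;    # named groups, (?&name) recursion and possessive quantifiers

# Sample text from the question; the variable name is illustrative.
my $html = 'This is example <div id="class1">This is <div id="subclass1">This is </div> <div id="subclass2">This is </div> This is </div> This is example';

my $re = qr{
    (                                 # group 1: the whole class1 element
        <div \s+ id="class1">
        (?: (?&div) | [^<]++ )*       # nested balanced divs or plain text
        </div>
    )
    (?(DEFINE)
        (?<div>                       # any balanced div, reused recursively
            <div\b[^>]*>
            (?: (?&div) | [^<]++ )*
            </div>
        )
    )
}x;

# In list context the match returns its captures; the first is group 1.
my ($element) = $html =~ $re;
print "$element\n" if defined $element;
```

The possessive [^<]++ keeps the text runs from being re-split during backtracking, which matters when the input has an unbalanced tag and the match has to fail.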

Source

2012-12-08 03:34:06 yasu

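Many people always say "use a proper HTML parser" to parse HTML instead of regular expressions. What some people don't realize is that there can be requirements that call for a regular expression.

<div id=".+?">.*</div> should work for you.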

http://regexr.com?33336
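
A quick sketch of how this pattern behaves, assuming the question's sample text in a variable named $html (illustrative). Because .* is greedy, the match runs to the last </div> in the input. That happens to be the right closing tag for the sample, but it would also swallow any later element. The id="other" element below is a hypothetical addition, not part of the original sample.

```perl
use strict;
use warnings;

# Sample text from the question; the variable name is illustrative.
my $html = 'This is example <div id="class1">This is <div id="subclass1">This is </div> <div id="subclass2">This is </div> This is </div> This is example';

# The pattern from this answer, wrapped in a capture group.
my $re = qr{(<div id=".+?">.*</div>)};

# On the original sample, the last </div> is the one that closes class1.
my ($first) = $html =~ $re;
print "$first\n";

# Hypothetical extra element appended to show the greedy overshoot.
my $longer = $html . ' <div id="other">x</div>';
my ($second) = $longer =~ $re;
print "$second\n";
```

The second match extends through the appended element, so the pattern is only safe when class1 is the last <div> in the text being searched.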

Source

2012-12-09 12:21:57 Jack
