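HTML::TokeParser - finding the text between tags
I am trying to extract the text below from an HTML page using the code that follows, but my code fails.
Text after Budget - $25,000,000
Gross (Worldwide) - $58,500,000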
#!/usr/bin/perl
use HTML::TokeParser;
my $content = <<'HTML'; # single-quoted heredoc, so '$25,000,000' is not interpolated
<h5>Budget</h5>
$25,000,000 (estimated)<br/>
<br/>
<h5>Opening Weekend</h5>
$727,327 (USA) (<a href="/date/09-25/">25 September</a> <a href="/year/1994/">1994</a>) (33 Screens)<br/>
<br/>
<h5>Gross</h5>
$28,341,469 (USA) (<a href="/date/08-05/">5 August</a> <a href="/year/2012/">2012</a>)<br/>£2,344,349 (UK) (<a href="/date/05-18/">18 May</a> <a href="/year/1995/">1995</a>)<br/>£1,732,123 (UK) (<a href="/date/04-16/">16 April</a> <a href="/year/1995/">1995</a>)<br/>$58,500,000 (Worldwide)<br/>$555,480 (Belgium)<br/>ESP 637,291,985 (Spain)<br/>
<br/>
<h5>Admissions</h5>
82,890 (Belgium)<br/>163,594 (France) (<a href="/date/03-28/">28 March</a> <a href="/year/1995/">1995</a>)<br/>410,811 (Germany) (<a href="/date/12-31/">31 December</a> <a href="/year/1995/">1995</a>)<br/>1,245,604 (Spain)<br/>
<br/>
<h5>Filming Dates</h5>
<a href="/date/06-16/">16 June</a> <a href="/year/1993/">1993</a> - <a href="/date/09-10/">10 September</a> <a href="/year/1993/">1993</a><br/>
<br/>
HTML
my $description = "";
my $tp = HTML::TokeParser->new(\$content) || die "Can't open: $!";
while (my $token = $tp->get_tag("h5")) {
    # the parser object is $tp; $parser was never declared
    my $text = $tp->get_text();
    last if $text =~ /budget/i;
}
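Your code has several problems. I have changed the quoting from double quotes to a heredoc, because the string itself contains double quotes. It also contains currency values starting with '$', which are treated as variables when the string is interpolated, so you need a single-quoted heredoc, i.e. $foo = <<'HTML'; – simbabque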
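To illustrate the quoting point in the comment above, here is a minimal sketch (the variable names $literal and $escaped are made up for this example). With a bare or double-quoted delimiter the heredoc body is interpolated, so an unescaped $25 would be read as the capture variable $25; a single-quoted delimiter keeps the body literal.

```perl
use strict;
use warnings;

# Single-quoted delimiter: the body is taken literally, '$' included.
my $literal = <<'HTML';
$25,000,000 (estimated)<br/>
HTML

# Double-quoted delimiter: the body is interpolated like a "..." string,
# so a literal dollar sign has to be escaped with a backslash.
my $escaped = <<"HTML";
\$25,000,000 (estimated)<br/>
HTML

# Both variables now hold exactly the same text.
print $literal eq $escaped ? "same text\n" : "different text\n";
```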
@simbabque - Thanks for formatting the code. – doubledecker
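One possible way to get the requested values, offered as a sketch rather than a definitive answer: walk the token stream with get_token, start a new entry at every line-break tag, and stop when the next h5 heading begins. The sample HTML is shortened from the question, and the names %section, @lines, @entries, $budget and $worldwide are assumptions introduced for this example. The break tag is matched as either "br" or "br/" because, depending on parser settings, HTML::Parser may report <br/> under either name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Shortened sample from the question; single-quoted heredoc keeps '$' literal.
my $content = <<'HTML';
<h5>Budget</h5>
$25,000,000 (estimated)<br/>
<br/>
<h5>Gross</h5>
$28,341,469 (USA) (<a href="/date/08-05/">5 August</a> <a href="/year/2012/">2012</a>)<br/>$58,500,000 (Worldwide)<br/>$555,480 (Belgium)<br/>
<br/>
HTML

my $tp = HTML::TokeParser->new(\$content) or die "Can't parse content";

my %section;    # heading text => array ref of the entries below it
while ($tp->get_tag('h5')) {
    my $heading = $tp->get_trimmed_text('/h5');
    my @lines = ('');

    while (my $token = $tp->get_token) {
        my ($type, $value) = @$token;    # $value: tag name for 'S', text for 'T'
        if ($type eq 'S' && $value eq 'h5') {
            $tp->unget_token($token);    # leave the heading for the outer loop
            last;
        }
        if ($type eq 'S' && $value =~ m{\Abr/?\z}) {
            push @lines, '';             # a line break starts a new entry
        }
        elsif ($type eq 'T') {
            $lines[-1] .= $value;        # link text is appended like any other text
        }
    }

    # Trim whitespace and keep only the non-empty entries.
    my @entries;
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;
        push @entries, $line if length $line;
    }
    $section{$heading} = \@entries;
}

# First word of the first Budget entry, and the Gross entry marked (Worldwide).
my ($budget)    = ($section{Budget}[0] // '') =~ /^(\S+)/;
my ($worldwide) = map { /^(\S+)\s+\(Worldwide\)/ ? $1 : () } @{ $section{Gross} || [] };

print "Budget - $budget\n";
print "Gross (Worldwide) - $worldwide\n";
```

With the shortened sample this prints "Budget - $25,000,000" and "Gross (Worldwide) - $58,500,000". Collecting every entry per heading first, and picking values afterwards, keeps the parsing loop independent of which figures are wanted.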