2014-02-09 47 views

I want to get the length of the MP3 files listed in an RSS feed. How do I extract data from an RSS feed using Perl with XML::Feed?

Here is the Perl script I have been hacking away at:

#!/usr/bin/perl 
use XML::Feed; 
use Data::Dumper; 

my $rssurl  = "http://librivox.org/rss/4273"; 
my $feed = XML::Feed->parse(URI->new($rssurl)) 
    or die XML::Feed->errstr; 
print $feed->title, "\n"; 
print $feed->description, "\n"; 
for my $entry ($feed->entries) { 
#  print "entry is [$entry]\n"; 
#  print Dumper($entry); 
     print $entry->title, "\n"; 
     print $entry->{'http://www.itunes.com/dtds/podcast-1.0.dtd'}{'duration'} . "\n"; 
     print $entry->duration . "\n"; 
} 

When I run the script, I get this output:

Conquest Over Time by SHAARA, Michael 
<p>Pat Travis, a spacer renowned for his luck, is suddenly quite out of it. His job is to beat his competitors to sign newly-Contacted human races to commercial contracts... 

But what can he do when he finds he's on a planet that consults astrology for literally every major decision - and he has arrived on one of the worst-aspected days in history? 

Michael Shaara, later to write the Pulitzer-winning novel "The Killer Angels", wrote this story for Fantastic Universe in 1956. (Summary by Mark F. Smith)</p> 
1 - Section 1 

Can't locate object method "duration" via package "XML::Feed::Entry::Format::RSS" at ./get_feed.pl line 15. 

If I add print Dumper($entry); for debugging, I can see the piece of data I am after:

$VAR1 = bless({ 
    _version => "2.0", 
    entry => { 
    "enclosure" => { 
     length => "9.6MB", 
     type => "audio/mpeg", 
     url => "http://www.archive.org/download/conquest_over_time_1005_librivox/conquestovertime_1_shaara_64kb.mp3", 
    }, 
    "http://www.itunes.com/dtds/podcast-1.0.dtd" => { block => "No", duration => "00:20:00", explicit => "No" }, 
    "item" => ("\n " x 12), 
    "link" => "http://www.archive.org/download/conquest_over_time_1005_librivox/conquestovertime_1_shaara_64kb.mp3", 
    "title" => "1 - Section 1", 
    }, 
}, "XML::Feed::Entry::Format::RSS") 

The piece of data I want is the duration, 00:20:00. How can I get at it in the script?

Thanks!

Answers

1

It looks like the data sits under a top-level key named entry, so you need to go through that key first:

$entry->{'entry'}{'http://www.itunes.com/dtds/podcast-1.0.dtd'}{'duration'} 
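
A minimal sketch of how that lookup could be wrapped so that a missing level yields undef instead of a warning or a fatal error. The helper name itunes_duration and the hand-built $entry are hypothetical; the hash layout is copied from the Dumper output in the question and is an internal detail of XML::Feed, not a documented interface, so it may differ in other versions.

```perl
use strict;
use warnings;

# Namespace URI under which the Dumper output stores the iTunes fields.
my $ITUNES_NS = 'http://www.itunes.com/dtds/podcast-1.0.dtd';

# Hypothetical helper: return the duration string, or undef if any level
# of the assumed internal structure is missing.
sub itunes_duration {
    my ($entry) = @_;
    my $inner = $entry->{entry};
    return unless ref($inner) eq 'HASH';
    my $itunes = $inner->{$ITUNES_NS};
    return unless ref($itunes) eq 'HASH';
    return $itunes->{duration};
}

# Stand-in for a real entry, copied from the Dumper output in the question.
my $entry = bless({
    _version => '2.0',
    entry    => {
        title      => '1 - Section 1',
        $ITUNES_NS => { block => 'No', duration => '00:20:00', explicit => 'No' },
    },
}, 'XML::Feed::Entry::Format::RSS');

my $duration = itunes_duration($entry);
print defined $duration ? "$duration\n" : "(no duration)\n";
```

Inside the original loop, the same helper would be called as itunes_duration($entry) for each entry returned by $feed->entries.
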
Awesome! That did it.

1

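You would be unwise to extract information from the internals of an object like this. The only guaranteed functionality is what the documentation describes, and the author is free to change the implementation at any time as long as the interface stays the same.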

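In particular, this is an unusual implementation of XML namespaces: the element you want has the XML tag itunes:duration, where itunes is the prefix bound to the namespace http://www.itunes.com/dtds/podcast-1.0.dtd. The namespace distinguishes it from any other duration element that may appear in the document. You should use XPath to extract the data you need, as described in your previous question. This short program does what you need without using XML::Feed.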

use strict; 
use warnings; 

use LWP::Simple 'get'; 
use XML::XPath; 

my $rssurl = 'http://librivox.org/rss/4273'; 
my $xml = get $rssurl; 
my $xp  = XML::XPath->new(xml => $xml); 

my ($channel) = $xp->findnodes('/rss/channel'); 
printf "Channel Title:  %s\n\n", $channel->find('title'); 
printf "Channel Description: %s\n\n", $channel->find('description'); 

print "ITEMS\n"; 
for my $item ($xp->findnodes('/rss/channel/item')) { 
    printf " Item Title: %s\n", $item->find('title'); 
    printf " Item Duration: %s\n", $item->find('itunes:duration'); 
    print "\n"; 
} 

Output

Channel Title:  Conquest Over Time by SHAARA, Michael 

Channel Description: <p>Pat Travis, a spacer renowned for his luck, is suddenly quite out of it. His job is to beat his competitors to sign newly-Contacted human races to commercial contracts... 

But what can he do when he finds he's on a planet that consults astrology for literally every major decision - and he has arrived on one of the worst-aspected days in history? 

Michael Shaara, later to write the Pulitzer-winning novel "The Killer Angels", wrote this story for Fantastic Universe in 1956. (Summary by Mark F. Smith)</p> 

ITEMS 
    Item Title: 1 - Section 1 
    Item Duration: 00:20:00 

    Item Title: 2 - Section 2 
    Item Duration: 00:18:35 

    Item Title: 3 - Section 3 
    Item Duration: 00:25:12 

    Item Title: 4 - Section 4 
    Item Duration: 00:16:38
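
The durations above are plain strings. If the goal is to work with the MP3 lengths as numbers, for example to add them up, a sketch along the following lines may help. The helper names duration_to_seconds and seconds_to_duration are hypothetical, and the input list is simply the four values from the output above rather than a live fetch of the feed.

```perl
use strict;
use warnings;

# Hypothetical helper: convert "HH:MM:SS", "MM:SS" or "SS" to seconds.
sub duration_to_seconds {
    my ($duration) = @_;
    my $seconds = 0;
    # Each colon-separated field is worth 60 times the one to its right.
    $seconds = $seconds * 60 + $_ for split /:/, $duration;
    return $seconds;
}

# Hypothetical helper: format a number of seconds as HH:MM:SS.
sub seconds_to_duration {
    my ($seconds) = @_;
    return sprintf '%02d:%02d:%02d',
        int($seconds / 3600), int($seconds / 60) % 60, $seconds % 60;
}

# The four durations printed in the output above.
my @durations = ('00:20:00', '00:18:35', '00:25:12', '00:16:38');

my $total = 0;
$total += duration_to_seconds($_) for @durations;

# Prints: Total: 4825 seconds (01:20:25)
printf "Total: %d seconds (%s)\n", $total, seconds_to_duration($total);
```

In the XPath program, the string passed to duration_to_seconds would be the stringified result of $item->find('itunes:duration').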