You want to use the extract_links() method rather than look_down():
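use strict;
use warnings;
use LWP::Simple;
use HTML::Tree;

my %seen;
my $url = 'http://www.stephenfry.com/';
my $doc = get($url) or die "Could not fetch $url\n";

my $adt = HTML::Tree->new();
$adt->parse($doc);
$adt->eof;    # signal end of input so the tree is fully built

# extract_links('a') returns an array ref of [ $url, $element, $attr, $tag ]
my $links_array_ref = $adt->extract_links('a');
my @links = grep { /www\.stephenfry\.com/ and !$seen{$_}++ }
            map  { $_->[0] } @$links_array_ref;
print "$_\n" for @links;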
Partial output:
http://www.stephenfry.com/
http://www.stephenfry.com/blog/
http://www.stephenfry.com/category/blessays/
http://www.stephenfry.com/category/features/
http://www.stephenfry.com/category/general/
...
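To see what extract_links('a') hands back without touching the network, here is a minimal offline sketch. The markup in $html is made up for illustration and stands in for the fetched page; it is not taken from the real site.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup, standing in for the fetched page.
my $html = <<'HTML';
<html><head>
<link rel="stylesheet" href="http://www.stephenfry.com/style.css">
</head><body>
<a href="http://www.stephenfry.com/blog/">Blog</a>
<a href="http://www.stephenfry.com/blog/">Blog again</a>
<a href="http://example.com/elsewhere">Elsewhere</a>
<img src="http://www.stephenfry.com/logo.png">
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Restricting to 'a' skips the <link> and <img> elements.
# Each entry is [ $url, $element, $attribute_name, $tag_name ].
my $entries = $tree->extract_links('a');

# Keep only URLs on the wanted host, dropping duplicates via %seen.
my %seen;
my @links = grep { m{^https?://www\.stephenfry\.com/} and !$seen{$_}++ }
            map  { $_->[0] } @$entries;

print "$_\n" for @links;

$tree->delete;    # free the tree once the URLs have been copied out
```

Note that extract_links() returns the attribute values exactly as they appear in the page, so a relative href such as /blog/ would not match the host pattern unless it is made absolute first.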
WWW::Mechanize is simpler to use, and it does return more links, because links() also picks up non-anchor tags such as <link>:
use strict;
use warnings;
use WWW::Mechanize;

my %seen;
my $mech = WWW::Mechanize->new();
$mech->get('http://www.stephenfry.com/');

# links() returns WWW::Mechanize::Link objects; url() is the URL as written in the page
my @links = grep { /www\.stephenfry\.com/ and !$seen{$_}++ }
            map  { $_->url } $mech->links();
print "$_\n" for @links;
Partial output:
http://www.stephenfry.com/wp-content/themes/fry/images/favicon.png
http://www.stephenfry.com/wp-content/themes/fry/style.css
http://www.stephenfry.com/wordpress/xmlrpc.php
http://www.stephenfry.com/feed/
http://www.stephenfry.com/comments/feed/
...
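If the stylesheet and favicon entries are unwanted, one option is find_all_links() with a tag filter. The following is only a sketch: it writes a made-up sample page to a temporary file and loads it through a file: URL so that it runs offline. Against the real site you would call $mech->get('http://www.stephenfry.com/') instead.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# Hypothetical sample page written to a temp file so the sketch runs offline.
my ($fh, $path) = tempfile( SUFFIX => '.html', TMPDIR => 1, UNLINK => 1 );
print {$fh} <<'HTML';
<html><head>
<link rel="stylesheet" href="http://www.stephenfry.com/wp-content/themes/fry/style.css">
</head><body>
<a href="http://www.stephenfry.com/blog/">Blog</a>
<a href="http://www.stephenfry.com/blog/">Blog again</a>
<a href="http://example.com/elsewhere">Elsewhere</a>
</body></html>
HTML
close $fh or die "Cannot close $path: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new_abs($path) );

# Every link Mechanize recognises, including the <link> stylesheet.
my @all = $mech->links();

# Only <a> links whose absolute URL is on the wanted host, without duplicates.
my %seen;
my @anchors = grep { !$seen{$_}++ }
              map  { $_->url_abs->as_string }
              $mech->find_all_links(
                  tag           => 'a',
                  url_abs_regex => qr{^https?://www\.stephenfry\.com/},
              );

printf "links() found %d links in total\n", scalar @all;
print "$_\n" for @anchors;
```

Using url_abs rather than url also resolves relative hrefs against the page's base URL before the host pattern is applied.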
Hope this helps!
I edited the title to describe the actual problem in more detail. –
What is your expected output? – toolic
It could be http://www.stephenfry.com/stuff or http://www.stephenfry.com/stuff/morestuff, i.e. any link under http://www.stephenfry.com/. – user3269763