2016-12-14
Using $ua to fetch HTML from my $url = "http://finance.yahoo.com/quote/MSFT?p=MSFT"; trying to parse it with Mojo::DOM, but not getting the markup right

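I can fetch the HTML content from the URL. I then use Mojo::DOM to parse a sub-section of it; that is the right next step, isn't it? I also want to pull the a href values out of the HTML content that get() returns for my $url. This is what I have: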

my $ua = Mojo::UserAgent->new(max_redirects => 5, timeout => $timeout); 
my $dom = Mojo::DOM->new; 

my $content = $ua->get($url)->res->dom->at('div#quoteNewsStream-0-Stream')->content; 
my $content2 = $content->$dom->find('a href#'); 

Answer

Just use the Mojo::DOM object that Mojo::UserAgent already returns:

#!/usr/bin/env perl 

use strict; 
use warnings; 
use v5.10; 

use Mojo::UserAgent; 

my $url = "http://finance.yahoo.com/quote/MSFT?p=MSFT"; 

my $dom = Mojo::UserAgent->new->get($url)->res->dom; 

my $stream = $dom->at('div#quoteNewsStream-0-Stream'); 

for my $href ($stream->find('a')->each) { 
    say $href->{href}; 
} 

Output:

/news/jeff-bezos-trump-tech-summit-was-very-productive-224326329.html 
/news/jeff-bezos-trump-tech-summit-was-very-productive-224326329.html 
/news/donald-trump-tech-summit-at-trump-tower-202517070.html 
/video/microsoft-surface-sales-surge-disappointment-181934121.html 
/news/jeff-bezos-trump-tech-summit-was-very-productive-224326329.html 
/news/microsoft-surface-sales-surge-on-disappointment-with-macbook-pro-163819168.html 
/news/microsoft-surface-sales-surge-on-disappointment-with-macbook-pro-163819168.html 
/m/7f581deb-0089-341a-b637-e1e979e9e210/ss_5-point-checklist-for.html 
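
As a footnote on why the code in the question fails: `->content` returns a plain string, so there is nothing to call `find` on, and `'a href#'` is not a valid CSS selector. Here is a minimal offline sketch, assuming a small made-up HTML fragment (the `$html` contents and its link paths are hypothetical, not the real Yahoo markup), that selects only anchors carrying an `href` attribute with `a[href]`:

```perl
#!/usr/bin/env perl

use strict;
use warnings;
use v5.10;

use Mojo::DOM;

# Hypothetical stand-in for the fetched page; the real Yahoo markup will differ.
my $html = <<'HTML';
<div id="quoteNewsStream-0-Stream">
  <a href="/news/first.html">First</a>
  <a name="no-link">Anchor without href</a>
  <a href="/news/second.html">Second</a>
</div>
<a href="/outside.html">Outside the stream</a>
HTML

my $dom = Mojo::DOM->new($html);

# at() returns a Mojo::DOM object (or undef when nothing matches), so keep
# working with that object; ->content would hand back a plain string.
my $stream = $dom->at('div#quoteNewsStream-0-Stream')
  or die "stream container not found\n";

# 'a[href]' matches only <a> elements that have an href attribute;
# map(attr => 'href') calls ->attr('href') on each matched element.
my @links = $stream->find('a[href]')->map(attr => 'href')->each;

say for @links;
```

Run against this fragment, it prints the two paths inside the stream div and skips both the anchor without an href and the link outside the container.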

For an 8-minute tutorial on using these tools, see Mojocast Episode 5.

Thanks for the suggestion to check out Mojocast. Thank you very much.