2013-05-14

Extracting link text from specific links

I'm trying to figure out how to get the movie titles from this page.

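I have this, but I can't get it to work, and I know very little about DOMDocument. It currently gets every link on the page, but I only need the links for the listed movie titles.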

$content = file_get_contents("http://www.imdb.com/movies-in-theaters/"); 

$dom = new DomDocument(); 
$dom->loadHTML($content); 
$urls = $dom->getElementsByTagName('a'); 

Answer

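$dom = new DomDocument();
// Real-world HTML rarely validates; collect libxml parse warnings
// internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom->loadHTMLFile('http://www.imdb.com/movies-in-theaters/');
$urls = $dom->getElementsByTagName('a');
$titles = array();

foreach ($urls as $url)
{
    // Keep only links whose grandparent element has class="overview-top"
    // (an exact match on the whole class attribute).
    $grandparent = $url->parentNode->parentNode;
    if ($grandparent instanceof DOMElement
        && 'overview-top' === $grandparent->getAttribute('class'))
        $titles[] = $url->nodeValue;
}

print_r($titles);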

This will output:

Array 
(
    [0] => Star Trek Into Darkness (2013) 
    [1] => Frances Ha (2012) 
    [2] => Stories We Tell (2012) 
    [3] => Erased (2012) 
    [4] => The English Teacher (2013) 
    [5] => Augustine (2012) 
    [6] => Black Rock (2012) 
    [7] => State 194 (2012) 
    [8] => Iron Man 3 (2013) 
    [9] => The Great Gatsby (2013) 
    [10] => Pain & Gain (2013) 
    [11] => Peeples (2013) 
    [12] => 42 (2013) 
    [13] => Oblivion (2013) 
    [14] => The Croods (2013) 
    [15] => The Big Wedding (2013) 
    [16] => Mud (2012) 
    [17] => Oz the Great and Powerful (2013) 
) 

You could also do this with XPath, but I don't know it well enough to show how.
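For comparison, here is a minimal sketch of the same filter written with DOMXPath. It assumes the same markup rule as the loop above (the title link's grandparent has class="overview-top") and runs against a small, made-up HTML sample instead of the live page, whose structure has very likely changed since 2013. The sample markup and the extractTitles() helper are hypothetical.

```php
<?php
// Hypothetical, simplified markup: title links sit two levels below an
// element with class="overview-top"; other links do not.
$html = <<<HTML
<html><body>
<table>
  <tr><td class="overview-top"><h4><a href="/title/1/">Star Trek Into Darkness (2013)</a></h4>
      <div class="txt-block"><span><a href="/name/1/">Some Director</a></span></div></td></tr>
  <tr><td class="overview-top"><h4><a href="/title/2/">Frances Ha (2012)</a></h4></td></tr>
</table>
<a href="/other/">Unrelated link</a>
</body></html>
HTML;

// Hypothetical helper: returns the text of every <a> whose grandparent
// has class="overview-top".
function extractTitles(string $html): array
{
    $dom = new DOMDocument();
    // Collect parse warnings internally, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Any element with class="overview-top" -> any child -> <a> child.
    $links = $xpath->query('//*[@class="overview-top"]/*/a');

    $titles = [];
    foreach ($links as $link) {
        $titles[] = trim($link->textContent);
    }
    return $titles;
}

print_r(extractTitles($html));
```

To run it against the live page, pass file_get_contents('http://www.imdb.com/movies-in-theaters/') instead of $html. Like the loop, the expression `//*[@class="overview-top"]/*/a` compares the whole class attribute. If the element can carry several classes, the usual XPath 1.0 idiom is `contains(concat(' ', normalize-space(@class), ' '), ' overview-top ')`.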


Thank you very much, this is exactly what I needed. – 2013-05-14 05:49:54


+ "Star Trek" is a good movie – Baba 2013-05-25 14:25:25