2014-07-22 85 views
2

PHP Simple HTML DOM Parser: link element in RSS

I've just started using PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/) and have run into a problem when parsing XML.

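I can parse all the links in an HTML document without trouble, but parsing the links from an RSS feed (XML format) doesn't work. For example, I want to parse all the links from http://www.bing.com/search?q=ipod&count=50&first=0&format=rss, so I use this code: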

$content = file_get_html('http://www.bing.com/search?q=ipod&count=50&first=0&format=rss'); 

foreach ($content->find('item') as $entry)
{
    $item['title']       = $entry->find('title', 0)->plaintext;
    $item['description'] = $entry->find('description', 0)->plaintext;
    $item['link']        = $entry->find('link', 0)->plaintext;
    $parsed_results_array[] = $item;
}

print_r($parsed_results_array); 

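The script parses the title and description, but the link element comes back empty. Any ideas? My guess is that "link" is a reserved word or something similar, so how do I get the parser to work?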

+0

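I'm on a project that uses SimpleDomParser. It sucks. Well, maybe it didn't when it came out, but you'd be better off with [FluentDOM](https://github.com/FluentDOM/FluentDOM) :) Since you've "just started", I don't think switching would be that hard? – MoshMage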

+1

PHP Simple HTML DOM Parser is not meant for parsing XML! Use SimpleXML instead: http://php.net/manual/en/book.simplexml.php – AndiPower

Answers

2

I suggest you use the right tool for the job: SimpleXML. As a bonus, it's built in :)

$xml = simplexml_load_file('http://www.bing.com/search?q=ipod&count=50&first=0&format=rss');
$parsed_results_array = array();
// Each child of the root <rss> element is a <channel>.
foreach ($xml as $channel) {
    // Each <channel> holds the <item> entries.
    foreach ($channel->item as $item) {
        // Alternative: $parsed_results_array[] = json_decode(json_encode($item), true);
        // Cast each SimpleXMLElement to a plain string.
        $result = array();
        $result['title']       = (string) $item->title;
        $result['description'] = (string) $item->description;
        $result['link']        = (string) $item->link;
        $parsed_results_array[] = $result;
    }
}

echo '<pre>';
print_r($parsed_results_array);

This should produce something like:

Array 
(
    [0] => Array 
     (
      [title] => Apple - iPod 
      [description] => Learn about iPod, Apple TV, and more. Download iTunes for free and purchase iTunes Gift Cards. Check out the most popular TV shows, movies, and music. 
      [link] => http://www.apple.com/ipod/ 
     ) 

    [1] => Array 
     (
      [title] => iPod - Wikipedia, the free encyclopedia 
      [description] => The iPod is a line of portable media players designed and marketed by Apple Inc. The first line was released on October 23, 2001, about 8½ months after ... 
      [link] => http://en.wikipedia.org/wiki/IPod 
     ) 
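
For readers who want to try the SimpleXML approach without hitting the live feed, here is a minimal sketch. The inline feed is made-up sample data, and the error handling is an addition that is not part of the answer above; it assumes the feed has the usual rss/channel/item layout.

```php
<?php
// Hypothetical sample feed, inlined so the sketch runs without network access.
$rss = <<<XML
<rss version="2.0">
  <channel>
    <title>Sample results</title>
    <link>http://example.com/</link>
    <item>
      <title>Apple - iPod</title>
      <link>http://www.apple.com/ipod/</link>
      <description>Learn about iPod.</description>
    </item>
    <item>
      <title>iPod - Wikipedia</title>
      <link>http://en.wikipedia.org/wiki/IPod</link>
      <description>The iPod is a line of portable media players.</description>
    </item>
  </channel>
</rss>
XML;

// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$xml = simplexml_load_string($rss);
if ($xml === false) {
    // Parsing failed: report the first libxml error and stop.
    $errors = libxml_get_errors();
    exit('Could not parse feed: ' . ($errors ? trim($errors[0]->message) : 'unknown error'));
}

$parsed_results_array = array();
// Items live under <channel>; the channel's own <link> is not an item and is skipped.
foreach ($xml->channel->item as $item) {
    $parsed_results_array[] = array(
        'title'       => (string) $item->title,
        'description' => (string) $item->description,
        'link'        => (string) $item->link,
    );
}

print_r($parsed_results_array);
```

With a real feed, swapping simplexml_load_string($rss) for simplexml_load_file($url) should leave the rest unchanged.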
+0

Thanks, that solved my problem. –

+0

@MindaugasLi sure man, no problem – Ghost

1

If you're used to PHP Simple HTML DOM, you can keep using it! Too many approaches lead to confusion, and simplehtmldom is already simple and powerful.

Just make sure you start like this:

require_once('lib/simple_html_dom.php'); 

$content = file_get_contents('http://www.bing.com/search?q=ipod&count=50&first=0&format=rss'); 
$xml = new simple_html_dom(); 
$xml->load($content); 

Then you can go ahead with your queries!

0

Edit the simple_html_dom class:

protected $self_closing_tags 

Remove the 'link' key.

BEFORE:

protected $self_closing_tags = array('img'=>1, 'br'=>1,'link'=>1, 'input'=>1, 'meta'=>1, 'hr'=>1, 'base'=>1, 'embed'=>1, 'spacer'=>1); 

AFTER:

protected $self_closing_tags = array('img'=>1, 'br'=>1, 'input'=>1, 'meta'=>1, 'hr'=>1, 'base'=>1, 'embed'=>1, 'spacer'=>1); 
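
Some background on why this edit matters: in HTML, link is a void element with no content, which is presumably why the library lists it as self-closing, whereas in RSS it is an ordinary element that wraps the URL. If patching the library is not an option, an XML-aware parser sidesteps the issue. The sketch below uses PHP's built-in DOMDocument rather than simple_html_dom; the one-item feed is made-up sample data.

```php
<?php
// Hypothetical one-item feed, inlined so the sketch is self-contained.
$rss = '<rss version="2.0"><channel><item>'
     . '<title>Apple - iPod</title>'
     . '<link>http://www.apple.com/ipod/</link>'
     . '</item></channel></rss>';

$doc = new DOMDocument();
// loadXML() applies XML rules, so <link> is a normal element with text content.
if (!$doc->loadXML($rss)) {
    exit('Could not parse feed');
}

$links = array();
foreach ($doc->getElementsByTagName('item') as $item) {
    // Take the first <link> inside this <item>, if there is one.
    $linkNode = $item->getElementsByTagName('link')->item(0);
    $links[] = $linkNode !== null ? $linkNode->textContent : '';
}

print_r($links);
```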