PHP Simple HTML DOM Parser - link elements in RSS

I've just started using PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/) and I'm having some trouble parsing XML.

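I can parse all the links in an HTML document without any problem, but parsing the links from an RSS feed (XML) doesn't work. For example, I want to parse all the links from http://www.bing.com/search?q=ipod&count=50&first=0&format=rss, so I use this code: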

$content = file_get_html('http://www.bing.com/search?q=ipod&count=50&first=0&format=rss');

$parsed_results_array = array();
foreach($content->find('item') as $entry)
{
    $item['title']       = $entry->find('title', 0)->plaintext;
    $item['description'] = $entry->find('description', 0)->plaintext;
    $item['link']        = $entry->find('link', 0)->plaintext; // this one comes back empty
    $parsed_results_array[] = $item;
}

print_r($parsed_results_array);

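The script parses the title and description, but the link element is empty. Any ideas? My guess is that "link" is a reserved word or something similar. How do I get the parser to work?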


2014-07-22 Mindaugas Li

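I've used SimpleDomParser on projects. It sucks. Well, maybe it doesn't seem that way at first, but you'd be better off with FluentDOM (https://github.com/FluentDOM/FluentDOM) :) Since you're "just starting", I don't think switching would be hard? – MoshMage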

PHP Simple HTML DOM Parser is not meant for parsing XML! Use SimpleXML instead: http://php.net/manual/en/book.simplexml.php – AndiPower
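FluentDOM's own API is not shown here. As a dependency-free point of comparison with the XML-aware tools suggested in these comments, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath (an assumption, not FluentDOM itself). It parses a small hypothetical RSS string instead of the live Bing feed so it runs offline. Because DOMDocument::loadXML() parses real XML, <link> is just an ordinary element with text content.

```php
<?php
// Minimal sketch using PHP's built-in DOM extension (not FluentDOM).
// $rss is a hypothetical sample feed, inlined so the example runs offline.
$rss = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>Apple - iPod</title>
      <description>Learn about iPod.</description>
      <link>http://www.apple.com/ipod/</link>
    </item>
    <item>
      <title>iPod - Wikipedia</title>
      <description>The iPod is a line of portable media players.</description>
      <link>http://en.wikipedia.org/wiki/IPod</link>
    </item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($rss); // parsed as XML, so <link> keeps its text content
$xpath = new DOMXPath($doc);

$parsed_results_array = array();
foreach ($xpath->query('/rss/channel/item') as $entry) {
    // string(...) returns the text of the first matching child, or '' if absent
    $parsed_results_array[] = array(
        'title'       => $xpath->evaluate('string(title)', $entry),
        'description' => $xpath->evaluate('string(description)', $entry),
        'link'        => $xpath->evaluate('string(link)', $entry),
    );
}

print_r($parsed_results_array);
```

The XPath string() function is used so each field is a plain PHP string even when an element is missing.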

I'd suggest using the right tool for the job: SimpleXML. Besides, it's built in :)

$xml = simplexml_load_file('http://www.bing.com/search?q=ipod&count=50&first=0&format=rss');
if ($xml === false) {
    die('Could not load the RSS feed');
}

$parsed_results_array = array();
// $xml is the <rss> root element; the items live under <channel>
foreach ($xml->channel->item as $item) {
    // Alternative: $parsed_results_array[] = json_decode(json_encode($item), true);
    // Casting each SimpleXMLElement to string gives its text content
    $parsed_results_array[] = array(
        'title'       => (string) $item->title,
        'description' => (string) $item->description,
        'link'        => (string) $item->link,
    );
}

echo '<pre>';
print_r($parsed_results_array);

It should produce something like this:

Array 
(
    [0] => Array 
     (
      [title] => Apple - iPod 
      [description] => Learn about iPod, Apple TV, and more. Download iTunes for free and purchase iTunes Gift Cards. Check out the most popular TV shows, movies, and music. 
      [link] => http://www.apple.com/ipod/ 
     ) 

    [1] => Array 
     (
      [title] => iPod - Wikipedia, the free encyclopedia 
      [description] => The iPod is a line of portable media players designed and marketed by Apple Inc. The first line was released on October 23, 2001, about 8½ months after ... 
      [link] => http://en.wikipedia.org/wiki/IPod 
     )
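To try the SimpleXML approach without depending on the live Bing URL (simplexml_load_file() on a URL also needs allow_url_fopen enabled), you can pass a string to simplexml_load_string() instead. The feed below is a hypothetical sample; the loop follows the same channel/item structure as the answer's code.

```php
<?php
// Hypothetical sample feed inlined so the example runs without network access.
$rss = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <item>
      <title>Apple - iPod</title>
      <description>Learn about iPod.</description>
      <link>http://www.apple.com/ipod/</link>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($rss);
if ($xml === false) {
    die("Could not parse the feed\n");
}

$parsed_results_array = array();
// $xml is the <rss> root element, so items live under channel->item
foreach ($xml->channel->item as $item) {
    // Casting a SimpleXMLElement to string yields its text content
    $parsed_results_array[] = array(
        'title'       => (string) $item->title,
        'description' => (string) $item->description,
        'link'        => (string) $item->link,
    );
}

print_r($parsed_results_array);
```

Swapping the string back for simplexml_load_file($url) gives the networked version above.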


2014-07-22 14:17:01 Ghost

Thanks, that solved my problem. –

@MindaugasLi sure, man, no problem – Ghost

If you're used to PHP Simple HTML DOM, you can keep using it! Too many approaches just lead to confusion, and simplehtmldom is already very simple and powerful.

Just make sure you start like this:

require_once('lib/simple_html_dom.php'); 

$content = file_get_contents('http://www.bing.com/search?q=ipod&count=50&first=0&format=rss'); 
$xml = new simple_html_dom(); 
$xml->load($content);

Then you can go ahead with your queries!


2015-02-25 00:33:58 tong

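Edit the simple_html_dom class, specifically this property:

protected $self_closing_tags

and remove the key "link". It is in that list because HTML's <link> is an empty element, so the parser treats the RSS <link> as self-closing and the URL text that follows never becomes its content, which is why ->plaintext comes back empty.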

Before:

protected $self_closing_tags = array('img'=>1, 'br'=>1,'link'=>1, 'input'=>1, 'meta'=>1, 'hr'=>1, 'base'=>1, 'embed'=>1, 'spacer'=>1);

After:

protected $self_closing_tags = array('img'=>1, 'br'=>1, 'input'=>1, 'meta'=>1, 'hr'=>1, 'base'=>1, 'embed'=>1, 'spacer'=>1);
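If you would rather not patch the library file, one possible workaround (an assumption, not part of the answer above) is to rename the RSS <link> tags before handing the string to simple_html_dom, so the parser never sees a tag from its self-closing list. The helper name rename_link_tags() and the replacement tag name rsslink are hypothetical. The sketch only shows the string rewrite so it runs without the library; the commented lines indicate how it might plug into the load() call from the previous answer.

```php
<?php
// Hypothetical helper: rename bare <link> / </link> tags to <rsslink> / </rsslink>.
// Tags with a prefix or attributes (e.g. <atom:link href="..."/>) are left alone.
function rename_link_tags(string $rss): string
{
    return preg_replace('#<(/?)link>#i', '<$1rsslink>', $rss);
}

$sample = '<item><title>Apple - iPod</title><link>http://www.apple.com/ipod/</link></item>';
echo rename_link_tags($sample), "\n";

// With simple_html_dom loaded (not included here), usage might look like:
// $html = new simple_html_dom();
// $html->load(rename_link_tags(file_get_contents($url)));
// $link = $entry->find('rsslink', 0)->plaintext;
```

Because the rename happens on the raw string, the library stays unmodified and upgrades won't undo the fix.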


2015-06-24 16:01:29
