2012-01-21

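How to break down and parse specific Wikipedia text

I have the following working example, which retrieves a specific Wikipedia page and returns it as a SimpleXMLElement object: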

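// Identify the client; Wikimedia asks API consumers to send a user agent.
ini_set('user_agent', '[email protected]');
$doc = new DOMDocument();
$doc->load('http://en.wikipedia.org/w/api.php?action=parse&page=Main%20Page&format=xml');

// Convert the DOM tree to SimpleXML for easier traversal.
$xml = simplexml_import_dom($doc);

print '<pre>';
print_r($xml);
print '</pre>';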

This returns:

SimpleXMLElement Object 
(
    [parse] => SimpleXMLElement Object 
     (
      [@attributes] => Array 
       (
        [title] => Main Page 
        [revid] => 472210092 
        [displaytitle] => Main Page 
       ) 

      [text] => <body><table id="mp-topbanner" style="width: 100%;"... 

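Silly question, my mind has gone blank. What I want to do is capture the $xml->parse->text element and parse it in turn, so that what I ultimately get back is the following object. How do I achieve this?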

SimpleXMLElement Object 
(
    [body] => SimpleXMLElement Object 
     (
      [table] => SimpleXMLElement Object 
       (
        [@attributes] => Array 
         (
          [id] => mp-topbanner 
          [style] => width:100% ... 

Maybe you're looking for $doc->loadHTMLFile('http://en.wikipedia.org/')?

Answer

After grabbing a fresh cup of tea and eating a banana, here is the solution I came up with:

ini_set('user_agent', '[email protected]');
$doc = new DOMDocument();
$doc->load('http://en.wikipedia.org/w/api.php?action=parse&page=Main%20Page&format=xml');
$nodes = $doc->getElementsByTagName('text');

// The <text> node holds the rendered page HTML as a plain string.
$str = $nodes->item(0)->nodeValue;

// Parse that string as an HTML document in its own right.
$html = new DOMDocument();
$html->loadHTML($str);

This then lets me get the value of an element, which is what I was after. For example:

echo "Some value: "; 
echo $html->getElementById('someid')->nodeValue;
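
If you still want the SimpleXMLElement shape from the question, one possible follow-up is to pass the second DOMDocument through simplexml_import_dom() as well. The sketch below is only an illustration under stated assumptions: the $apiXml string is a hand-written stand-in for the API response (so the snippet runs without a network call), and the libxml_use_internal_errors() call is my addition to keep loadHTML() from printing warnings on imperfect markup.

```php
<?php
// Assumption: a tiny stand-in for the API response. The real document comes
// from the URL used above; <text> carries the page HTML as escaped text.
$apiXml = '<api><parse title="Main Page"><text>'
    . htmlspecialchars('<table id="mp-topbanner" style="width: 100%;"><tr><td>Welcome</td></tr></table>')
    . '</text></parse></api>';

$doc = new DOMDocument();
$doc->loadXML($apiXml);

// Same step as in the answer: pull the HTML string out of the <text> node.
$str = $doc->getElementsByTagName('text')->item(0)->nodeValue;

// Collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$html = new DOMDocument();
$html->loadHTML($str);
libxml_clear_errors();

// Import the parsed HTML; the resulting element is rooted at <html>,
// so the body is reachable as $sx->body.
$sx = simplexml_import_dom($html);

$table = $sx->body->table;
echo $table['id'], "\n";    // mp-topbanner
echo $table['style'], "\n"; // width: 100%;
```

Running print_r($sx) on the result should give a nested structure along the lines of the one asked for, with [body] sitting directly under the root element.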