How to break down and parse specific Wikipedia text

I have the following working example, which retrieves a specific Wikipedia page and returns it as a SimpleXMLElement object:

ini_set('user_agent', '[email protected]'); 
$doc = new DOMDocument(); 
$doc->load('http://en.wikipedia.org/w/api.php?action=parse&page=Main%20Page&format=xml'); 

$xml = simplexml_import_dom($doc); 

print '<pre>'; 
print_r($xml); 
print '</pre>';

This returns:

SimpleXMLElement Object 
(
    [parse] => SimpleXMLElement Object 
     (
      [@attributes] => Array 
       (
        [title] => Main Page 
        [revid] => 472210092 
        [displaytitle] => Main Page 
       ) 

      [text] => <body><table id="mp-topbanner" style="width: 100%;"...

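Silly question, my mind has gone blank. What I want to do is capture the $xml->parse->text element and parse it in turn, so that what I ultimately get back is the following object. How do I achieve this?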

SimpleXMLElement Object 
(
    [body] => SimpleXMLElement Object 
     (
      [table] => SimpleXMLElement Object 
       (
        [@attributes] => Array 
         (
          [id] => mp-topbanner 
          [style] => width:100% ...

Source: 2012-01-21, Michael Pasqualone

Perhaps you're looking for $doc->loadHTMLFile('http://en.wikipedia.org/'); ?

After grabbing a fresh cup of tea and eating a banana, here is the solution I came up with:

ini_set('user_agent','[email protected]'); 
$doc = new DOMDocument(); 
$doc->load('http://en.wikipedia.org/w/api.php?action=parse&page=Main%20Page&format=xml'); 
$nodes = $doc->getElementsByTagName('text'); 

$str = $nodes->item(0)->nodeValue; 

$html = new DOMDocument(); 
$html->loadHTML($str);

This then lets me get the value of an element, which is what I'm after. For example:

echo "Some value: "; 
echo $html->getElementById('someid')->nodeValue;
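If the goal is still a SimpleXMLElement tree like the one sketched in the question, the second DOMDocument can probably be handed to simplexml_import_dom() as well. The following is a minimal sketch, not a tested solution against the live API: the $apiXml string is a made-up stand-in for the real response so the snippet runs without network access, and the table contents are invented for illustration. Note that loadHTML() wraps the fragment in html and body elements, so the table ends up under $sxml->body rather than at the root.

```php
<?php
// Hypothetical stand-in for the API response; in the real script this
// document would come from $doc->load('http://en.wikipedia.org/w/api.php?...').
$apiXml = <<<'XML'
<?xml version="1.0"?>
<api>
  <parse title="Main Page">
    <text>&lt;table id="mp-topbanner" style="width: 100%;"&gt;&lt;tr&gt;&lt;td&gt;Welcome&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</text>
  </parse>
</api>
XML;

$doc = new DOMDocument();
$doc->loadXML($apiXml);

// The <text> element carries the page HTML as escaped character data;
// nodeValue returns it unescaped.
$str = $doc->getElementsByTagName('text')->item(0)->nodeValue;

// Collect libxml warnings instead of printing them, since real-world
// markup often is not strictly valid HTML 4.
$previous = libxml_use_internal_errors(true);
$html = new DOMDocument();
$html->loadHTML($str);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Import the parsed HTML into SimpleXML; the root element is <html>.
$sxml = simplexml_import_dom($html);

$table = $sxml->body->table;
echo $table['id'], "\n";      // attribute access via array syntax
echo $table['style'], "\n";
echo $table->tr->td, "\n";    // child access via property syntax
```

Silencing libxml errors is a design choice here: it keeps the output clean, but it also hides genuine parse problems, so calling libxml_get_errors() before libxml_clear_errors() may be worthwhile while debugging.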

Source: 2012-01-21 02:52:10
