2013-10-07 64 views

PHP HTML DOM, XPath - weird characters?

Suppose $html_dom contains a page that has HTML entities. With the code below, I get weird characters in the output.

$html_dom = new DOMDocument(); 
@$html_dom->loadHTML($html_doc); 
$xpath = new DOMXPath($html_dom); 

$query = '//div[@class="foo"]/div/p'; 
$my_foos = $xpath->query($query);
foreach ($my_foos as $my_foo) 
{ 
    echo html_entity_decode($my_foo->nodeValue);
    die; 
} 

How do I handle this properly so that I don't get weird characters? I tried the following, without success:

$html_doc = mb_convert_encoding($html_doc, 'HTML-ENTITIES', 'UTF-8'); 
$html_dom = new DOMDocument(); 
$html_dom->resolveExternals = TRUE; 
@$html_dom->loadHTML($html_doc); 
$xpath = new DOMXPath($html_dom); 

$query = '//div[@class="foo"]/div/p'; 
$my_foos = $xpath->query($query); 
foreach ($my_foos as $my_foo) 
{ 
    echo html_entity_decode($my_foo->nodeValue);
    die; 
} 

Answers


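mb_convert_encoding is a good idea, but it doesn't work as expected here because DOMDocument seems to be a bit buggy when it comes to encoding.

Moving mb_convert_encoding to the actual node output did the trick.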

$html_dom = new DOMDocument(); 
$html_dom->resolveExternals = TRUE; 
@$html_dom->loadHTML($html_doc); 
$xpath = new DOMXPath($html_dom); 

$query = '//div[@class="foo"]/div/p'; 
$my_foos = $xpath->query($query); 
foreach ($my_foos as $my_foo) 
{ 
    echo mb_convert_encoding($my_foo->nodeValue, 'HTML-ENTITIES', 'UTF-8'); 
    die; 
} 
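
For readers on newer PHP versions: passing 'HTML-ENTITIES' to mb_convert_encoding is deprecated as of PHP 8.2. The following is only a sketch of the same idea (encode at output time, not before parsing) using mb_encode_numericentity instead. The sample markup and the variables $text, $convmap and $encoded are made up for illustration and are not part of the original answer; it assumes the dom and mbstring extensions are loaded.

```php
<?php
// Hypothetical sample input: plain ASCII markup containing two entities.
$html_doc = '<div class="foo"><div><p>caf&eacute;&nbsp;au lait</p></div></div>';

// Collect parser warnings instead of silencing them with @.
libxml_use_internal_errors(true);
$html_dom = new DOMDocument();
$html_dom->loadHTML($html_doc);
libxml_clear_errors();

$xpath = new DOMXPath($html_dom);
$my_foos = $xpath->query('//div[@class="foo"]/div/p');

// nodeValue comes back as UTF-8 with the entities already decoded.
$text = $my_foos->item(0)->nodeValue;

// Re-encode every non-ASCII code point as a numeric entity.
// Map layout: [start, end, offset, mask].
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$encoded = mb_encode_numericentity($text, $convmap, 'UTF-8');

echo $encoded, PHP_EOL;
```

Unlike 'HTML-ENTITIES', this variant emits numeric references such as `&#160;` rather than named ones such as `&nbsp;`. Browsers render both the same way, but the output strings are not byte-identical to what the answer's code produces.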

Confirmed that it works. Thanks. – StackOverflowNewbie