2016-04-16

PHP: get img src from XML

I have a page whose XML looks like this:

<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"> 
    <channel> 
    <title>FB-RSS feed for Salman Khan Fc</title> 
    <link>http://facebook.com/profile.php?id=1636293749919827/</link> 
    <description>FB-RSS feed for Salman Khan Fc</description> 
    <managingEditor>http://fbrss.com (FB-RSS)</managingEditor> 
    <pubDate>31 Mar 16 20:00 +0000</pubDate> 
    <item> 
     <title>Photo - Who is the Best Khan ?</title> 
     <link>https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3</link> 
     <description>&lt;a href=&#34;https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3&#34;&gt;&lt;img src=&#34;https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/11059765_1713146978901170_8711054263905505442_n.jpg?oh=fa2978c5ecfb3ae424e9082aaa057b8f&amp;oe=57BB41D5&#34;&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;Who is the Best Khan ?</description> 
     <author>FB-RSS</author> 
     <guid>1636293749919827_1713146978901170</guid> 
     <pubDate>31 Mar 16 20:00 +0000</pubDate> 
    </item> 
    <item> 
     <title>Photo</title> 
     <link>https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3</link> 
     <description>&lt;a href=&#34;https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3&#34;&gt;&lt;img src=&#34;https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/12294686_1713146755567859_6728330714340999478_n.jpg?oh=6d90a688fdf4342f9e12e9ff9a66b127&amp;oe=57778068&#34;&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;</description> 
     <author>FB-RSS</author> 
     <guid>1636293749919827_1713146755567859</guid> 
     <pubDate>31 Mar 16 19:58 +0000</pubDate> 
    </item> 
    </channel> 
</rss> 

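I want to get the src of each img in the XML above.

The images are stored in <description>, but they are not in the

<img...

format; instead they look like:

&lt;img src=&#34;https://scontent.xx.fbc...

The < has been replaced with &lt; ... I guess that is why $imgs = $dom->getElementsByTagName('img'); returns nothing.

Is there a workaround?

This is how I call it: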

libxml_use_internal_errors(true); 
$dom = new DOMDocument(); 
$dom->loadXML($xml_file); 
$imgs = ...(get the imgs to extract the src...('img') ??; 

//Then run a possible foreach 
//something like: 

foreach($imgs as $img){ 

    $src= ///the src of the $img 

    //try it out 
    echo '<img src="'.$src.'" /> <br />';
} 

Any ideas?

Answers

1

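You have HTML embedded inside XML elements, so you have to retrieve the XML nodes, load the HTML of each one, and then read the attribute of the tag you want.

Your XML contains <description> nodes at different levels (one for the channel and one per item), so ->getElementsByTagName would return more nodes than you need. Use DOMXPath to retrieve only the <description> nodes at the right position in the tree: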

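$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadXML($xml);

// With no context node, the relative path is evaluated from the root element (<rss>),
// so this matches only the <description> inside each <item>.
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('channel/item/description');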
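As a quick check of the difference between the two lookups, here is a minimal sketch. It assumes a shortened stand-in feed in `$xml` (the text and file names are placeholders, not the real feed) with the same structure as the one in the question, and it writes the XPath as an absolute path for clarity.

```php
<?php
// Shortened stand-in for the feed in the question (placeholder values).
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
    <channel>
        <description>Feed description</description>
        <item><description>&lt;img src=&#34;a.jpg&#34;&gt;</description></item>
        <item><description>&lt;img src=&#34;b.jpg&#34;&gt;</description></item>
    </channel>
</rss>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// Matches every <description>, including the channel-level one.
$all = $dom->getElementsByTagName('description');

// Matches only the <description> children of <item>.
$xpath = new DOMXPath($dom);
$items = $xpath->query('/rss/channel/item/description');

echo $all->length, ' vs ', $items->length, PHP_EOL; // prints "3 vs 2"
```

With the stand-in feed, the tag-name lookup also picks up the channel-level description, which contains no image markup at all.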

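Then iterate over all the nodes, load each node value into a new DOMDocument (there is no need to decode the HTML entities; DOM has already decoded them for you), and extract the src attribute of the <img> node: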

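foreach ($nodes as $node)
{
    // nodeValue is the already-decoded HTML fragment stored in <description>
    $html = new DOMDocument();
    $html->loadHTML($node->nodeValue);
    $img = $html->getElementsByTagName('img')->item(0);
    if ($img !== null) {
        // src of the first <img> in this item; use it here, e.g. echo it
        $src = $img->getAttribute('src');
    }
}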
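Putting the two steps together, a self-contained sketch might look like the following. The helper name `extractImageSources()` and the `example.com` feed are hypothetical, introduced only for illustration. Unlike the loop above, it collects every `<img>` in each description and skips empty descriptions, since `loadHTML()` does not accept an empty string.

```php
<?php
/**
 * Collect the src of every <img> embedded in the item descriptions.
 * extractImageSources() is a hypothetical helper name, not part of the original answer.
 */
function extractImageSources(string $xml): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $sources = [];
    if ($dom->loadXML($xml)) {
        $xpath = new DOMXPath($dom);
        foreach ($xpath->query('/rss/channel/item/description') as $node) {
            $fragment = trim($node->nodeValue); // entities are already decoded here
            if ($fragment === '') {
                continue; // loadHTML() rejects an empty string
            }
            $html = new DOMDocument();
            $html->loadHTML($fragment);
            foreach ($html->getElementsByTagName('img') as $img) {
                if ($img->hasAttribute('src')) {
                    $sources[] = $img->getAttribute('src');
                }
            }
        }
    }

    // Restore the previous libxml error-handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $sources;
}

// Placeholder feed with the same shape as the one in the question.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel>
    <description>Feed description</description>
    <item><description>&lt;a href=&#34;https://example.com/1&#34;&gt;&lt;img src=&#34;https://example.com/a.jpg&#34;&gt;&lt;/a&gt;&lt;br&gt;Caption</description></item>
    <item><description>No image here</description></item>
</channel></rss>
XML;

foreach (extractImageSources($xml) as $src) {
    echo $src, PHP_EOL;
}
```

Returning an array keeps the extraction separate from the output, so the caller can decide whether to echo `<img>` tags, download the files, or store the URLs.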

eval.in demo

+0

Great! It seems to work.... Thank you! – ErickBest