2016-09-17 136 views
0

我有这些跨度标签:获取HTML标签之间的内容与标识里面

<div> 
<span style="background: url('/wp-content/themes/minimum-child/img/address.png') 0px 2px no-repeat; padding-left: 20px;">CONTENT 1</span> 
<span style="background: url('/wp-content/themes/minimum-child/img/email.png') 0px 2px no-repeat; padding-left: 20px;"><a href="mailto:[email protected]">CONTENT 2</a></span> 
<span style="background: url('/wp-content/themes/minimum-child/img/tel.png') 0px 2px no-repeat; padding-left: 20px;">CONTENT 3</span> 
</div> 

,我需要获得跨度之间的内容,但我需要的内容$phone分开单变量$address$email, ,$web等。很明显,我可以使用背景图像的名称作为图案,因为图像的名称仍然相同(address.png,email.png等)。

到目前为止,我认为有必要使用preg_match_all函数,我已经尝试过了,但是如此我没有成功。

我试图(为获得地址$address变量):

$url="'/wp-content/themes/minimum-child/img/address.png'"; 
$tag='span style="background: url('.$url.')'; 
$matches=array(); 
$pattern = "/<$tag ?.*>(.*)<\/span>/"; 
preg_match($pattern, $htmlcontent, $matches); 
$address=$matches[1]; 

不幸的是,它不工作。你有什么想法如何实现它?

+0

是你想要捕获的'SPAN'内容(即:CONTENT_1,CONTENT_2等)还是'style'属性('address.png'等)? – RamRaider

+0

嗨,我需要捕获CONTENT_1等。模式应该是例如address.png – xjabla

回答

0

人们经常说,使用正则表达式解析html充满了问题 - 所以我会选择更简单的方法来使用DOMDocument来帮助处理html片段 - 然后您可以使用正则表达式来进一步优化一些如果需要,结果可能。

$html=' 
<div> 
    <span style="background: url(\'/wp-content/themes/minimum-child/img/address.png\') 0px 2px no-repeat; padding-left: 20px;">CONTENT 1</span> 
    <span style="background: url(\'/wp-content/themes/minimum-child/img/email.png\') 0px 2px no-repeat; padding-left: 20px;"><a href="mailto:[email protected]">CONTENT 2</a></span> 
    <span style="background: url(\'/wp-content/themes/minimum-child/img/tel.png\') 0px 2px no-repeat; padding-left: 20px;">CONTENT 3</span> 
</div>'; 


$dom=new DOMDocument; 
$dom->loadHTML($html); 

$col=$dom->getElementsByTagName('span'); 
$keep=array(
    'style'=>array(), 
    'data' =>array(), 
    'email'=>array() 
); 

foreach($col as $node){ 
    $keep['style'][]=str_replace("'", "", $node->getAttribute('style')); 
    $keep['data'][]=$node->nodeValue; 
    if($node->hasChildNodes()){ 
     foreach($node->childNodes as $child){ 
      if($child->nodeType==XML_ELEMENT_NODE && $child->hasAttribute('href')) { 
       list($mailto,$address)=explode(':',$child->getAttribute('href')); 
       $keep['email'][]=$address; 
      } 
     } 
    } 
} 
echo '<pre>',print_r($keep,true),'</pre>'; 


/* output 
    ------ 

    Array 
    (
     [style] => Array 
      (
       [0] => background: url(/wp-content/themes/minimum-child/img/address.png) 0px 2px no-repeat; padding-left: 20px; 
       [1] => background: url(/wp-content/themes/minimum-child/img/email.png) 0px 2px no-repeat; padding-left: 20px; 
       [2] => background: url(/wp-content/themes/minimum-child/img/tel.png) 0px 2px no-repeat; padding-left: 20px; 
      ) 

     [data] => Array 
      (
       [0] => CONTENT 1 
       [1] => CONTENT 2 
       [2] => CONTENT 3 
      ) 

     [email] => Array 
      (
       [0] => [email protected] 
      ) 

    ) 
*/ 
相关问题