Simple HTML DOM skips attributes

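I want to parse a Google Play HTML page and extract some information about an app. Simple HTML DOM works perfectly, but if the page contains markup with no space between attributes, it ignores the affected attribute entirely. For example, my HTML: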

<div class="doc-banner-icon"><img itemprop="image"src="https://lh5.ggpht.com/iRd4LyD13y5hdAkpGRSb0PWwFrfU8qfswGNY2wWYw9z9hcyYfhU9uVbmhJ1uqU7vbfw=w124"/></div>

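As you can see, there is no space between "image" and src, so Simple HTML DOM ignores the src attribute and returns only <img itemprop="image">. If I add a space, it works perfectly. To get the attribute I use the following code: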

foreach($html->find('div.doc-banner-icon') as $e){   
     foreach($e->find('img') as $i){ 
      $bannerIcon = $i->src;    
     } 
}

My question is: how can I change this beautiful library so that it returns the full inner content of this div?

Source

2013-06-20 Nolesh

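You can use [PHP's DOMDocument](http://php.net/manual/en/class.domdocument.php) instead of Simple HTML DOM Parser. Otherwise, see the snippet at http://codepad.org/HdUQKx3l: loading the HTML through DOMDocument and saving it again adds the needed spaces, and the result can then be passed to Simple HTML DOM Parser.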
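
A minimal sketch of the DOMDocument route suggested above, not the code from the linked snippet. It assumes that libxml's HTML parser accepts attributes with no whitespace between them, as the comment implies. The variable names ($doc, $xpath, $normalized) and the XPath expression are illustrative choices.

```php
<?php
// Sample markup from the question: no space between itemprop="image" and src.
$html = '<div class="doc-banner-icon"><img itemprop="image"src="https://lh5.ggpht.com/iRd4LyD13y5hdAkpGRSb0PWwFrfU8qfswGNY2wWYw9z9hcyYfhU9uVbmhJ1uqU7vbfw=w124"/></div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Option 1: read the attribute directly with XPath.
// This expression matches only when the class attribute is exactly "doc-banner-icon".
$xpath = new DOMXPath($doc);
$bannerIcon = null;
foreach ($xpath->query('//div[@class="doc-banner-icon"]//img') as $img) {
    $bannerIcon = $img->getAttribute('src');
}

// Option 2: re-serialize the document so the attributes are separated by
// spaces, then hand the result to Simple HTML DOM Parser if you prefer it.
$normalized = $doc->saveHTML();
```

With option 2, $normalized also contains the doctype and html/body wrappers that DOMDocument adds, which does not affect a selector such as div.doc-banner-icon.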

I just wrote a function that adds the necessary spaces to the content:

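// Insert a space after every closing double quote that is not already
// followed by one, so that itemprop="image"src="..." becomes
// itemprop="image" src="...".
function placeNeccessarySpaces($contents){
    $insideQuotes = false;
    $newContents = '';
    $length = strlen($contents);
    for($i = 0; $i < $length; $i++){
        $newContents .= $contents[$i];
        if($contents[$i] !== '"') continue;
        // Every double quote toggles between "inside" and "outside" a value.
        $insideQuotes = !$insideQuotes;
        // A closing quote was just copied: pad it unless a space follows.
        // The bounds check avoids reading past the end of the string.
        if(!$insideQuotes && $i + 1 < $length && $contents[$i + 1] !== ' '){
            $newContents .= ' ';
        }
    }
    return $newContents;
}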

Then call it on the result of file_get_contents, like this:

$contents = file_get_contents($url, $use_include_path, $context, $offset); 
$contents = placeNeccessarySpaces($contents);

Hope it helps someone.

Source

2013-06-20 14:20:00 Nolesh
