2013-01-11

Get onclick content from an external HTML page using PHP

OK, let's say I have an HTML file like this:

<div class="vad buttonDiv" onclick="location.href='http://example.htm?some/link&id=1357900324528'"> 
<div class="vad buttonDiv" onclick="other('example')"> 
<div class="vad buttonDiv" onclick="location.href='http://example.htm?some/link&id=7458758375733'"> 
<div class="vad buttonDiv" onclick="other('example1')"> 
<div class="vad buttonDiv" onclick="location.href='http://example.htm?some/link&id=3474537737392'"> 
<div class="vad buttonDiv" onclick="other('example2')"> 

What I want is to display only the http://example.htm?some/link&id=************* links from the external HTML page. I tried the following code:

$dom = new DOMDocument(); 
@$dom->loadHTML($html); 

$xpath = new DOMXPath($dom); 
$onclicks = $xpath->evaluate("/html/body//div"); 

for ($i = 0; $i < $onclicks->length; $i++) { 
    $onclick = $onclicks->item($i); 
    $display = $onclick->getAttribute("onclick"); 
    echo $display."<br>"; 
} 

and got this output:

location.href='http://example.htm?some/link&id=1357900324528'
other('example')

location.href='http://example.htm?some/link&id=7458758375733'
other('example1')

location.href='http://example.htm?some/link&id=3474537737392'
other('example2')

Any ideas on how to get only the links I'm after, instead of both kinds of onclick content? Any answer would be greatly appreciated.


Not familiar with XPath, so not too sure, but could you use `"/html/body//div[onclick^=location]"` as the path? – Passerby


I'll give it a shot – Jake

Answers


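Instead of complex DOM parsing, which will eventually fail on HTML errors in the site being parsed, I would just use preg_match_all.

It is most likely faster and much simpler: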

if (preg_match_all('/onclick="(location\\.href=([^"]+))"/i', $html, $matches)) 
{ 
    print_r($matches); 
} 
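With the pattern above, group 1 still holds the whole `location.href='...'` string and group 2 keeps the surrounding single quotes. If only the bare URLs are wanted, a minimal sketch (assuming the attribute is always written as `onclick="location.href='...'"`, double quotes outside and single quotes inside, as in the question's markup) can capture just the text between the single quotes:

```php
<?php
// Sample markup from the question; in practice $html would hold the
// fetched page, e.g. from file_get_contents().
$html = <<<'HTML'
<div class="vad buttonDiv" onclick="location.href='http://example.htm?some/link&id=1357900324528'"></div>
<div class="vad buttonDiv" onclick="other('example')"></div>
<div class="vad buttonDiv" onclick="location.href='http://example.htm?some/link&id=7458758375733'"></div>
HTML;

// Capture only what sits between the single quotes after location.href=
$links = [];
if (preg_match_all("/onclick=\"location\\.href='([^']+)'\"/i", $html, $matches)) {
    $links = $matches[1]; // group 1 holds the bare URLs
}

foreach ($links as $link) {
    echo $link, "<br>\n";
}
```

Divs whose onclick calls `other(...)` never match, so no extra filtering step is needed.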



Thanks for the solution, Michel, much appreciated – Jake


Simple solution:

for ($i = 0; $i < $onclicks->length; $i++) {
    $onclick = $onclicks->item($i);
    $display = $onclick->getAttribute("onclick");
    if (substr($display, 0, 8) == 'location') {
        $display = str_replace(array("location.href='", "'"), '', $display);
        echo $display."<br>";
    }
}
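The `str_replace()` call above deletes every `'` in the attribute, so an apostrophe that is part of the URL itself would be lost too. A sketch that strips only the wrapper, using a hypothetical helper named `extractLocationHref()` (not part of PHP), could look like this:

```php
<?php
// Hypothetical helper: returns the URL inside location.href='...', or null
// when the onclick value is something else (e.g. other('example')).
function extractLocationHref(string $onclick): ?string
{
    $prefix = "location.href='";
    if (strncmp($onclick, $prefix, strlen($prefix)) !== 0) {
        return null; // not a location.href handler
    }
    // Drop the prefix and any trailing quote characters; quotes inside
    // the URL are left untouched.
    return rtrim(substr($onclick, strlen($prefix)), "'");
}

$values = [
    "location.href='http://example.htm?some/link&id=1357900324528'",
    "other('example')",
];
foreach ($values as $value) {
    $url = extractLocationHref($value);
    if ($url !== null) {
        echo $url, "<br>\n";
    }
}
```

Inside the question's loop, `extractLocationHref($onclick->getAttribute("onclick"))` would replace both the `substr()` test and the `str_replace()` call.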

You were so close...

After a few minutes of reading about XPath on Wikipedia, I came up with this working XPath:

$html=<<<TEXT 
<html> 
<body> 
<div> 
<div class="vad buttonDiv" onclick="location.href='http://example.htm?some/link&id=1357900324528'"></div> 
<div class="vad buttonDiv" onclick="other('example')"></div> 
<div class="vad buttonDiv" onclick="location.href='http://example.htm?some/link&id=7458758375733'"></div> 
<div class="vad buttonDiv" onclick="other('example1')"></div> 
<div class="vad buttonDiv" onclick="location.href='http://example.htm?some/link&id=3474537737392'"></div> 
<div class="vad buttonDiv" onclick="other('example2')"></div> 
</div> 
</body> 
</html> 
TEXT; 
$dom=new DOMDocument();
@$dom->loadHTML($html);
$xpath=new DOMXPath($dom);
$divs=$xpath->evaluate("/html/body//div[starts-with(@onclick,'location')]");
// Iterate the DOMNodeList directly: range(0, $divs->length-1) would yield
// [0, -1] when nothing matches, and item(0) would then return null.
foreach($divs as $div)
{
    var_dump($div->getAttribute("onclick"));
}

The code above outputs:

string(61) "location.href='http://example.htm?some/link&id=1357900324528'" 
string(61) "location.href='http://example.htm?some/link&id=7458758375733'" 
string(61) "location.href='http://example.htm?some/link&id=3474537737392'" 
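The output above still includes the `location.href='...'` wrapper. One option, sketched here under the assumption that each matching onclick contains exactly one single-quoted URL, is to let XPath cut the URL out with `substring-after()` and `substring-before()`, evaluated relative to each matched div through the context-node argument of `DOMXPath::evaluate()`:

```php
<?php
$html = <<<'HTML'
<html><body>
<div class="vad buttonDiv" onclick="location.href='http://example.htm?some/link&id=1357900324528'"></div>
<div class="vad buttonDiv" onclick="other('example')"></div>
<div class="vad buttonDiv" onclick="location.href='http://example.htm?some/link&id=7458758375733'"></div>
</body></html>
HTML;

$dom = new DOMDocument();
@$dom->loadHTML($html); // @ hides libxml warnings about imperfect markup
$xpath = new DOMXPath($dom);

$urls = [];
foreach ($xpath->query("//div[starts-with(@onclick, 'location.href')]") as $div) {
    // A string-valued expression, evaluated with $div as the context node:
    // the text between the first and second single quote of @onclick.
    $urls[] = $xpath->evaluate("substring-before(substring-after(@onclick, \"'\"), \"'\")", $div);
}
print_r($urls);
```

Because the expression returns a string rather than a node list, `evaluate()` hands back plain PHP strings and no post-processing is needed.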
$url = "http://example.com";
$dom = new DOMDocument();
// loadHTMLFile() fetches and parses the page; loadHTML($url) would parse the URL string itself as markup
@$dom->loadHTMLFile($url);
$xpath = new DOMXPath($dom);

// Keep only the buttons whose onclick is a location.href redirect
$divs = $xpath->evaluate('/html/body//div[@class="vad buttonDiv"][starts-with(@onclick, "location.href")]');
for ($i = 0; $i < $divs->length; $i++) {
    $div = $divs->item($i);

    $answer = $div->getAttribute('onclick');
    $searchArray = array("location.href='", "'");
    $replaceArray = array("", "");
    $link = str_replace($searchArray, $replaceArray, $answer);
    echo $link."<br>";
}

This displays just the links.
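Two details in a snippet like this could be tightened: `@class="vad buttonDiv"` matches only when the class attribute is exactly that string, and `@` silences every warning, not just libxml's complaints about malformed markup. A sketch under those assumptions, wrapped in a hypothetical helper `extractButtonLinks()`, that uses the common XPath 1.0 class-token idiom and `libxml_use_internal_errors()` instead:

```php
<?php
// Hypothetical helper: returns the location.href URLs of every div whose
// class list contains "buttonDiv".
function extractButtonLinks(string $html): array
{
    $previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match "buttonDiv" as a whole class token, whatever other classes are present
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " buttonDiv ")]'
           . "[starts-with(@onclick, \"location.href='\")]";

    $links = [];
    foreach ($xpath->query($query) as $div) {
        // Skip the 15-character "location.href='" prefix and drop the last
        // character, assuming the value ends with the closing quote.
        $links[] = substr($div->getAttribute('onclick'), 15, -1);
    }
    return $links;
}

$sample = <<<'HTML'
<div class="vad buttonDiv extra" onclick="location.href='http://example.htm?some/link&id=1357900324528'"></div>
<div class="vad buttonDiv" onclick="other('example')"></div>
HTML;
$links = extractButtonLinks($sample);
print_r($links);
```

To run it against the live page, pass it the fetched markup, e.g. `extractButtonLinks(file_get_contents($url))`; this assumes `allow_url_fopen` is enabled on the server.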