2009-10-20

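Extract a value from a web page

Hi, I have a site's home page that I am reading with cURL, and I need to get the number of pages the site has.

The information is in a div: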

<div class="pager"> 
<span class="page-numbers current">1</span> 
<a href="/users?page=2" title="go to page 2"><span class="page-numbers">2</span></a> 
<a href="/users?page=3" title="go to page 3"><span class="page-numbers">3</span></a> 
<a href="/users?page=4" title="go to page 4"><span class="page-numbers">4</span></a> 
<a href="/users?page=5" title="go to page 5"><span class="page-numbers">5</span></a> 
<span class="page-numbers dots">&hellip;</span> 

<a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a> 
<a href="/users?page=2" title="go to page 2"><span class="page-numbers next"> next</span></a> 
</div> 

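The value I need is 15, but this could be any number depending on the site; it will always be in the same position, though.

How can I easily read this value and assign it to a variable in PHP?

Thanks,

Jonathan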

Answers

2

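You can use PHP's DOM module for this. Read the page with DOMDocument::loadHTMLFile(), then create a DOMXPath object and query the document for all span elements that have the attribute class="page-numbers".

(Edit: oops, that's not what you're looking for; see the second code snippet.)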

$html = '<html><head><title>:::</title></head><body> 
<div class="pager"> 
<span class="page-numbers current">1</span> 
<a href="/users?page=2" title="go to page 2"><span class="page-numbers">2</span></a> 
<a href="/users?page=3" title="go to page 3"><span class="page-numbers">3</span></a> 
<a href="/users?page=4" title="go to page 4"><span class="page-numbers">4</span></a> 
<a href="/users?page=5" title="go to page 5"><span class="page-numbers">5</span></a> 
<span class="page-numbers dots">&hellip;</span> 

<a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a> 
<a href="/users?page=2" title="go to page 2"><span class="page-numbers next"> next</span></a> 
</div> 
</body></html>'; 

$doc = new DOMDocument; 
// since the content "is already here" we use loadhtml(content) 
// instead of loadhtmlfile(url) 
$doc->loadhtml($html); 
$xpath = new DOMXPath($doc); 
$nodelist = $xpath->query('//span[@class="page-numbers"]'); 
echo 'there are ', $nodelist->length, ' span elements having class="page-numbers"'; 

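Edit: does

<a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a> 

(the second-to-last a element) always point to the last page, i.e. does this link contain the value you're looking for?
If so, you can use an XPath expression to select the second-to-last a element and, from there, its child span element.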

//div[@class="pager"] <- select each <div> where the attribute class equals "pager" 
//div[@class="pager"]/a <- select each <a> that is a direct child of the pager div 
//div[@class="pager"]/a[position()=last()-1] <- select the second-to-last <a> 
//div[@class="pager"]/a[position()=last()-1]/span <- select the direct child <span> of that second-to-last <a> element in the pager <div> 

(You might want to pick up a good XPath tutorial ;-))

$doc->loadhtml($html); 
$xpath = new DOMXPath($doc); 
$nodelist = $xpath->query('//div[@class="pager"]/a[position()=last()-1]/span'); 
if (0 < $nodelist->length) { 
    echo $nodelist->item(0)->nodeValue; 
} 
else { 
    echo 'not found'; 
} 
+0

Awesome, thanks. I'll look into it. – 2009-10-20 14:47:04

+0

Hi, I tried it but it returns zero: function getusers($userurl) { $doc = new DOMDocument; $doc->loadhtml($userurl); $xpath = new DOMXPath($doc); $nodelist = $xpath->query('//span[@class="page-numbers"]'); print_r($nodelist); echo 'there are ', $nodelist->length, ' span elements having class="page-numbers"'; } The URL is http://ask.recipelabs.com/users – 2009-10-20 19:27:33

+1

If you are passing a URL, you need loadHTMLFile() instead of loadHTML(). – VolkerK 2009-10-20 19:35:45
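
The difference this comment points at can be sketched as follows. loadHTML() parses the string it is given as markup, so handing it a URL or path means the location text itself gets parsed, whereas loadHTMLFile() takes a path or URL and reads the document from there. The temporary file and the trimmed sample markup are stand-ins invented for this illustration, so the snippet runs without network access.

```php
<?php
// Illustrative markup only; a stand-in for the real page.
$html = '<html><body><div class="pager">'
      . '<span class="page-numbers current">1</span>'
      . '<a href="/users?page=2"><span class="page-numbers">2</span></a>'
      . '</div></body></html>';

// Write the markup to a temporary file so loadHTMLFile() has something to read.
$path = tempnam(sys_get_temp_dir(), 'pager');
file_put_contents($path, $html);

$query = '//span[@class="page-numbers"]';

// loadHTML() expects the markup itself.
$fromString = new DOMDocument();
$fromString->loadHTML($html);
$countFromString = (new DOMXPath($fromString))->query($query)->length;

// loadHTMLFile() expects a file path or URL and reads the content from it.
$fromFile = new DOMDocument();
$fromFile->loadHTMLFile($path);
$countFromFile = (new DOMXPath($fromFile))->query($query)->length;

// Passing a location to loadHTML() parses the location text as if it were HTML,
// so the query finds no matching span.
$mistake = new DOMDocument();
@$mistake->loadHTML($path);
$countFromMistake = (new DOMXPath($mistake))->query($query)->length;

unlink($path);

echo $countFromString, ' ', $countFromFile, ' ', $countFromMistake, "\n";
```

The third case is what the getusers() function in the comment above runs into: the document contains only the URL text, so the span query comes back empty.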

0

There is no direct function or simple way to do this. You need to build or use an existing HTML parser to do it.

0

You can parse it with a regular expression. First find all occurrences of <span class="page-numbers">, then pick the last one:

// div html code should be in $div_html 
preg_match_all('#<span class="page-numbers">(\d+)#', $div_html, $page_numbers); 
print_r(end($page_numbers[1])); // prints 15 
0

This is the kind of thing you might want to use XPath for; it requires loading the page into a DOM document object:

$domDoc = new DOMDocument(); 
$domDoc->loadHTMLFile("http://path/to/yourfile.html"); 
$xp = new DOMXPath($domDoc); 
$nodes = $xp->query("//xpath/to/relevant/node"); 
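// item(0) is the first matching node; nodeValue is its text content 
$value = $nodes->length > 0 ? $nodes->item(0)->nodeValue : null; 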

I haven't written much XPath in a while, so you should do some reading to work out that part, but it shouldn't be too hard.
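
One way the placeholder path might be filled in for the markup shown in the question, offered as a sketch rather than as this answerer's code: it reuses the second-to-last-link expression from the earlier answer and wraps it in the XPath number() function, so DOMXPath::evaluate() returns a number directly instead of a node list. The sample markup is a trimmed copy of the pager, and $lastPage is a name chosen for this illustration.

```php
<?php
// Sample markup mirroring the pager in the question (trimmed for brevity).
$html = '<html><body><div class="pager">'
      . '<span class="page-numbers current">1</span>'
      . '<a href="/users?page=2" title="go to page 2"><span class="page-numbers">2</span></a>'
      . '<a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a>'
      . '<a href="/users?page=2" title="go to page 2"><span class="page-numbers next"> next</span></a>'
      . '</div></body></html>';

$domDoc = new DOMDocument();
$domDoc->loadHTML($html);
$xp = new DOMXPath($domDoc);

// number() converts the text of the first matched node to a float;
// it yields NAN when nothing matches or the text is not numeric.
$result = $xp->evaluate('number(//div[@class="pager"]/a[position()=last()-1]/span)');

// Assign the value to a PHP variable, using null when no page number was found.
$lastPage = is_nan($result) ? null : (int) $result;

var_dump($lastPage);
```

The is_nan() check covers pages without a pager, where the expression matches nothing.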

0

Maybe

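$nodes = $dom->getElementsByTagName("span"); 
$maxPageNum = 0; 
foreach ($nodes as $node) 
{ 
    // exact class match skips the "current", "dots" and "next" spans 
    if ($node->getAttribute("class") == "page-numbers" && (int) $node->nodeValue > $maxPageNum) 
    { 
        $maxPageNum = (int) $node->nodeValue; 
    } 
} 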

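I don't know PHP, so accessing a DOM node's class or inner text may not be quite this easy, but there must be some way to get at that information, and the pseudocode here should work.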

0

Just wanted to say many thanks to VolkerK for the help; it worked well. I had to make a few minor changes and ended up with this:

function getusers($userurl) 
{ 
    $sSourceData = file_get_contents($userurl); 
    $doc = new DOMDocument(); 
    // @ suppresses the warnings libxml raises on non-well-formed HTML 
    @$doc->loadHTML($sSourceData); 

    $xpath = new DOMXPath($doc); 
    $nodelist = $xpath->query('//div[@class="pager"]/a[position()=last()-1]/span'); 
    if (0 < $nodelist->length) { 
        // the second-to-last link in the pager holds the last page number 
        $lastpage = (int) $nodelist->item(0)->nodeValue; 
        // every page before the last one is assumed to be full (35 users each) 
        $users = ($lastpage - 1) * 35; 
        $userurl = $userurl.'?page='.$lastpage; 

        // fetch the last page and count the users actually listed on it 
        $sSourceData = file_get_contents($userurl); 
        $doc = new DOMDocument(); 
        @$doc->loadHTML($sSourceData); 
        $xpath = new DOMXPath($doc); 
        $nodelist = $xpath->query('//div[@class="user-details"]'); 
        $users = $users + $nodelist->length; 
        echo 'there are ', $users , ' users'; 
    } 
    else { 
        // no pager found: all users are on this single page 
        $nodelist = $xpath->query('//div[@class="user-details"]'); 
        echo 'there are ', $nodelist->length, ' users'; 
    } 
}
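
A possible way to make the function above easier to check without hitting the network is to separate the parsing and the arithmetic from the fetching. The sketch below is only an illustration: lastPageFromHtml() and totalUsers() are hypothetical helper names, the sample markup is a trimmed copy of the pager from the question, and the count assumes every page before the last holds 35 users, the figure used above. It also swaps the @ operator for libxml_use_internal_errors(), which collects parser warnings instead of silencing them.

```php
<?php
/**
 * Hypothetical helper: returns the last page number found in the pager,
 * or null when the markup has no pager links.
 */
function lastPageFromHtml(string $html): ?int
{
    // Collect parser warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodelist = $xpath->query('//div[@class="pager"]/a[position()=last()-1]/span');
    if ($nodelist->length === 0) {
        return null;
    }
    return (int) trim($nodelist->item(0)->nodeValue);
}

/**
 * Hypothetical helper: total users, assuming every page before the last is full.
 */
function totalUsers(int $lastPage, int $usersOnLastPage, int $perPage = 35): int
{
    return ($lastPage - 1) * $perPage + $usersOnLastPage;
}

// Trimmed sample of the pager markup from the question.
$sample = '<html><body><div class="pager">'
        . '<span class="page-numbers current">1</span>'
        . '<a href="/users?page=2"><span class="page-numbers">2</span></a>'
        . '<a href="/users?page=15"><span class="page-numbers">15</span></a>'
        . '<a href="/users?page=2"><span class="page-numbers next"> next</span></a>'
        . '</div></body></html>';

// 12 stands in for the number of user-details divs counted on the last page.
echo totalUsers(lastPageFromHtml($sample), 12), "\n";
```

With the helpers split out this way, the fetching code only needs to call file_get_contents() twice and pass the results in.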