2013-02-22 50 views
0

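PHP isset() still passes when the variable is not set

Hopefully this has a very simple solution. I'm new to PHP, so I may be missing something obvious. I'm building a scraper with ScraperWiki (although this is a PHP question and has nothing to do with SW). The code is below: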

<?php 
require 'scraperwiki/simple_html_dom.php'; 

$allLinks = array(); 

function nextPage($nextUrl, $y) 
{ 
    getLinks($nextUrl, $y);  
} 

function getLinks($url) // gets links from product list page 
{ 
    global $allLinks; 
    $html_content = scraperwiki::scrape($url); 
    $html   = str_get_html($html_content); 

    if (isset($y)) { 
     $x = $y; 
    } else { 
     $x = 0; 
    } 

    foreach ($html->find("div.views-row a.imagecache-product_list") as $el) { 
     $url   = $el->href . "\n"; 
     $allLinks[$x] = 'http://www.foo.com'; 
     $allLinks[$x] .= $url; 
     $x++; 
    } 

    $next = $html->find("li.pager-next a", 0)->href . "\n"; 
    print_r("Printing $next:"); 
    print_r($next); 

    if (isset($next)) { 
     $nextUrl = 'http://www.foo.com'; 
     $nextUrl .= $next; 
     print_r($nextUrl); 
     $y = $x; 
     print_r("Printing X:"); 
     print_r($x); 
     print_r("Printing Y:"); 
     print_r($y); 

     nextPage($nextUrl, $y); 
    } else { 
     return; 
    } 

} 

getLinks("http://www.foo.com/department/accessories"); 

print_r($allLinks); 

?> 

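Expected output: the script should scrape all links from the first page, find the "next page" button, take its URL, scrape the links on that page, find its "next page" link, and so on. When there are no more "next page" links, it should stop.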

Current output: the code runs, but it does not stop when it should. The key lines are:

$next = $html->find("li.pager-next a", 0)->href . "\n"; 
if (isset($next)) { } 

I only want the nextPage() function to run if li.pager-next a exists on the page. Here is the console output:

 http://www.foo.com/department/accessories?page=1 
     http://www.foo.com/department/accessories?page=2 
     http://www.foo.com/department/accessories?page=3 
     http://www.foo.com/department/accessories?page=4 
     http://www.foo.com/department/accessories?page=5 
     http://www.foo.com/department/accessories?page=6 
     http://www.foo.com/department/accessories?page=7 
     http://www.foo.com/department/accessories?page=8 
     http://www.foo.com/department/accessories?page=9 
     http://www.foo.com/department/accessories?page=10 

    PHP Notice: Trying to get property of non-object in /home/scriptrunner/script.php on line 31 
// THE LOOP SHOULD BREAK HERE BUT DOESN'T 

     http://www.foo.com 
     http://www.foo.com/home?page=1 
     http://www.foo.com/home?page=2 
     http://www.foo.com/home?page=3 
     http://www.foo.com/home?page=4 
     http://www.foo.com/home?page=5 
     http://www.foo.com/home?page=6 
     http://www.foo.com/home?page=7 
+2

$next = $html->find("li.pager-next a", 0)->href . "\n"; will be at least "\n", so it will always be set. – mpm 2013-02-22 23:56:40
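
The comment's point can be checked in isolation. The sketch below assumes the href lookup produced null because nothing matched; $missing is a hypothetical stand-in for that value and is not part of the original script. It shows that appending "\n" already turns null into a real string, so isset() can no longer tell the two cases apart:

```php
<?php
// Hypothetical stand-in for the value of ->href when find() matched nothing.
$missing = null;

// Concatenation converts null to "" first, so the result is the string "\n".
$next = $missing . "\n";

var_dump($next === "\n");  // bool(true)
var_dump(isset($next));    // bool(true): $next is a string, not null

// Checking the value before concatenating behaves as intended.
var_dump(isset($missing)); // bool(false)
```

The check has to happen on the value returned by find(), before anything is appended to it.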

Answers

1

How about this:

$next = $html->find("li.pager-next a", 0); 

if (isset($next)) { 
    $nextUrl = 'http://www.foo.com'; 
    $nextUrl .= $next->href; // move ->href here 
    print_r($nextUrl . "\n"); // put \n here since we don't actually want that char in the url 
    $y = $x; 
    print_r("Printing X:"); 
    print_r($x); 
    print_r("Printing Y:"); 
    print_r($y); 

    nextPage($nextUrl, $y); 
} else { 
    return; 
} 
+0

This was so simple it makes my head hurt. As I said, PHP newbie: I assumed the \n wouldn't affect the output if find() returned null! – Jascination 2013-02-23 00:04:23
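
For reference, here is a minimal, self-contained sketch of the same stopping rule. It does not use simple_html_dom: the $pages array and the findNextLink() helper are hypothetical stand-ins for the scraped pages and for $html->find("li.pager-next a", 0), and it uses a loop where the original uses the recursive nextPage() call.

```php
<?php
// Hypothetical stand-in for the parsed pages: each path maps to the href of
// its "next" link, or null when the page has no li.pager-next a element.
$pages = array(
    '/department/accessories'        => '/department/accessories?page=1',
    '/department/accessories?page=1' => '/department/accessories?page=2',
    '/department/accessories?page=2' => null, // last page
);

// Hypothetical helper mimicking $html->find("li.pager-next a", 0):
// returns an object with an href property, or null if nothing matched.
function findNextLink(array $pages, $path)
{
    if (!isset($pages[$path])) {
        return null;
    }
    $link = new stdClass();
    $link->href = $pages[$path];
    return $link;
}

$visited = array();
$path = '/department/accessories';

while ($path !== null) {
    $visited[] = 'http://www.foo.com' . $path;
    $next = findNextLink($pages, $path);
    // Test the element itself before touching ->href.
    $path = ($next !== null) ? $next->href : null;
}

print_r($visited);
```

The loop visits the three listed pages and then stops, because the null check happens on the element rather than on a string built from it.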

0

Whatever value $html->find("li.pager-next a", 0)->href returns, isset($next) will never return false, because you are appending "\n" to it.

Use something like this:

$nextElement = $html->find("li.pager-next a", 0); 

if(isset($nextElement)) 
{ 
    $nextUrl = 'http://www.foo.com' . $nextElement->href;

    print_r($nextUrl . PHP_EOL); // print the newline, but keep it out of the URL itself
    $y = $x; 
    print_r("Printing X:"); 
    print_r($x); 
    print_r("Printing Y:"); 
    print_r($y); 

    nextPage($nextUrl, $y); 
} 
-2

Just remove the isset() call:

 
    if($next){ 
    }
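
Note that this only helps if $next holds the element returned by find() rather than the concatenated string. A small check with hypothetical values (the variable names and the stdClass stand-in are assumptions, not the real scraper objects) suggests why: "\n" is a non-empty string and is therefore truthy.

```php
<?php
// With the question's original assignment, a missing link still yields "\n".
$withNewline = null . "\n";
var_dump((bool) $withNewline); // bool(true): only "" and "0" are falsy strings

// Without the concatenation, a missing link is plain null.
$plainNull = null;
var_dump((bool) $plainNull);   // bool(false)

// A matched element is an object, which is truthy. stdClass is a stand-in.
$element = new stdClass();
var_dump((bool) $element);     // bool(true)
```

So dropping isset() on its own leaves the loop running; the "\n" also has to move out of the assignment.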