2013-05-16

PHP scraper seems to be stuck in an infinite loop

(By the way, I am asking the site in question for permission to scrape this content.)

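This is a very simple web scraper. It works fine when I hard-code all the links, but when I try to load them from JSON into variables (so that one script can handle many scrapes, and adding more links to the JSON makes the process more modular), it runs in an infinite loop.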

(The page has been loading for about 15 minutes now.)

Here is my JSON. There is only one store for testing purposes, but there will eventually be about 15.

[ 
    { 
     "store":"Incu Men", 
     "cat":"Accessories", 
     "general_cat":"Accessories", 
     "spec_cat":"accessories", 
     "url":"http://www.incuclothing.com/shop-men/accessories/", 
     "baseurl":"http://www.incuclothing.com", 
     "next_select":"a.next", 
     "prod_name_select":".infobox .fn", 
     "label_name_select":".infobox .brand", 
     "desc_select":".infobox .description", 
     "price_select":"#price", 
     "mainImg_select":"", 
     "more_imgs":".product-images", 
     "product_url":".hproduct .photo-link" 
    } 
] 

Here is the PHP code for the scraper:

<?php 
//Set infinite time limit 
set_time_limit (0); 
// Include simple html dom 
include('simple_html_dom.php'); 
// Defining the basic cURL function 
function curl($url) { 
    $ch = curl_init(); 
    // Initialising cURL 
    curl_setopt($ch, CURLOPT_URL, $url); 
    // Setting cURL's URL option with the $url variable passed into the function 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); 
    // Setting cURL's option to return the webpage data 
    $data = curl_exec($ch); 
    // Executing the cURL request and assigning the returned data to the $data variable 
    curl_close($ch); 
    // Closing cURL 
    return $data; 
    // Returning the data from the function 
} 

function getLinks($catURL, $prodURL, $baseURL, $next_select) { 
    $urls = array(); 

    while($catURL) { 
     echo "Indexing: $url" . PHP_EOL; 
     $html = str_get_html(curl($catURL)); 

     foreach ($html->find($prodURL) as $el) { 
      $urls[] = $baseURL . $el->href; 
     } 

     $next = $html->find($next_select, 0); 
     $url = $next ? $baseURL . $next->href : null; 

     echo "Results: $next" . PHP_EOL; 
    } 

    return $urls; 
} 

$string  = file_get_contents("jsonWorkers/incuMens.json"); 
$json_array = json_decode($string,true); 

foreach ($json_array as $value){ 

    $baseURL = $value['baseurl']; 
    $catURL = $value['url']; 
    $store = $value['store']; 
    $general_cat = $value['general_cat']; 
    $spec_cat = $value['spec_cat']; 
    $next_select = $value['next_select']; 
    $prod_name = $value['prod_name_select']; 
    $label_name = $value['label_name_select']; 
    $description = $value['desc_select']; 
    $price = $value['price_select']; 
    $prodURL = $value['product_url']; 

    if (!is_null($value['mainImg_select'])){ 
     $mainImg = $value['mainImg_select']; 
    } 
    $more_imgs = $value['more_imgs']; 



    $allLinks = getLinks($catURL, $prodURL, $baseURL, $next_select); 

} 

?> 

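Any idea why this script runs forever instead of returning anything, stopping, or printing something to the screen? I am just letting it run until it stops. When I hard-code the links it takes only a minute or so, sometimes less, so I am sure the problem is in my variables or JSON, but I can't for the life of me see what it is.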

Could anyone take a quick look and point me in the right direction?

Answers

3

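There is a problem with your while($catURL) loop. What are you trying to do? Also, you can use the flush() command to force output to be displayed in the browser while the script is still running.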
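In the posted getLinks(), the while condition tests $catURL, but the body assigns the next-page link to $url, so $catURL never changes and the first page is fetched forever. The first echo also prints $url before it has been set. The sketch below shows one way the loop could look once the same variable is tested and reassigned. To keep it runnable without network access or simple_html_dom, it replaces curl() and str_get_html() with a hypothetical $fetchPage callable. The $maxPages cap, the visited-page check, and the example.test URLs are likewise assumptions for illustration, not part of the original script.

```php
<?php
// Sketch of the corrected pagination loop. $fetchPage is a hypothetical
// callable standing in for curl() + str_get_html(): given a page URL it
// returns array('links' => array of hrefs, 'next' => href or null).
function getLinks($catURL, $baseURL, callable $fetchPage, $maxPages = 100) {
    $urls = array();
    $seen = array();

    // Stop when there is no next page, when a page repeats, or when the
    // (assumed) page cap is reached.
    while ($catURL && !isset($seen[$catURL]) && count($seen) < $maxPages) {
        $seen[$catURL] = true;
        echo "Indexing: $catURL" . PHP_EOL;

        $page = $fetchPage($catURL);
        foreach ($page['links'] as $href) {
            $urls[] = $baseURL . $href;
        }

        // The fix: reassign the variable that the while condition tests.
        $catURL = $page['next'] ? $baseURL . $page['next'] : null;
    }

    return $urls;
}

// Usage with an in-memory stand-in for the site (two made-up pages).
$pages = array(
    'http://example.test/cat/' => array(
        'links' => array('/p/1', '/p/2'),
        'next'  => '/cat/?p=2',
    ),
    'http://example.test/cat/?p=2' => array(
        'links' => array('/p/3'),
        'next'  => null,
    ),
);
$fetchPage = function ($url) use ($pages) {
    return $pages[$url];
};

$allLinks = getLinks('http://example.test/cat/', 'http://example.test', $fetchPage);
print_r($allLinks);
```

In the original script the smallest change is probably to rename $url to $catURL on the two lines inside the loop that use it. The extra guards are optional, but they keep a misbehaving "next" link from looping indefinitely.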


+1 for mentioning flush – Orangepill


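Ah! I changed the name of a variable ($catURL used to be $url) and accidentally didn't change it everywhere. Cheers, mate! I'll look up `flush()`. I'm new to PHP, so I was probably missing something simple. – Jascination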
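For anyone else looking up flush(): a minimal sketch of how progress lines might be pushed out while a long scrape is still running. The progress() helper and the page labels are hypothetical, not part of the original script. Whether the browser shows each line immediately also depends on buffering or compression in the web server and any proxies, which PHP cannot override.

```php
<?php
// Hypothetical helper: print one progress line and try to send it now.
function progress($message) {
    echo $message . PHP_EOL;

    // If PHP output buffering is active, flush the current buffer first.
    if (ob_get_level() > 0) {
        ob_flush();
    }

    // Then ask PHP to hand its pending output to the web server / client.
    flush();
}

foreach (array('page 1', 'page 2', 'page 3') as $label) {
    progress("Indexing: $label");
}
```

Calling a helper like this at the top of each loop iteration would have made it visible early on that the same page was being indexed again and again.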