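PHP scraper seems to be stuck in an infinite loop
(By the way, I am in the process of asking the sites in question for permission to scrape this content.)
I have a pretty simple web scraper that works fine when I load all the links manually. When I try to load them through JSON and variables instead (so I can run many scrapes with one script and make the process more modular by just adding more links to the JSON), it runs in an infinite loop.
(The page has been loading for about 15 minutes now.)
Here is my JSON. There is only one store for testing purposes, but there will eventually be about 15.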
[
    {
        "store":"Incu Men",
        "cat":"Accessories",
        "general_cat":"Accessories",
        "spec_cat":"accessories",
        "url":"http://www.incuclothing.com/shop-men/accessories/",
        "baseurl":"http://www.incuclothing.com",
        "next_select":"a.next",
        "prod_name_select":".infobox .fn",
        "label_name_select":".infobox .brand",
        "desc_select":".infobox .description",
        "price_select":"#price",
        "mainImg_select":"",
        "more_imgs":".product-images",
        "product_url":".hproduct .photo-link"
    }
]
And here is the PHP code for the scraper:
<?php
// Set infinite time limit
set_time_limit(0);

// Include simple html dom
include('simple_html_dom.php');

// Defining the basic cURL function
function curl($url) {
    $ch = curl_init(); // Initialising cURL
    curl_setopt($ch, CURLOPT_URL, $url); // Setting cURL's URL option with the $url variable passed into the function
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
    $data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
    curl_close($ch); // Closing cURL
    return $data; // Returning the data from the function
}

function getLinks($catURL, $prodURL, $baseURL, $next_select) {
    $urls = array();
    while($catURL) {
        echo "Indexing: $url" . PHP_EOL;
        $html = str_get_html(curl($catURL));
        foreach ($html->find($prodURL) as $el) {
            $urls[] = $baseURL . $el->href;
        }
        $next = $html->find($next_select, 0);
        $url = $next ? $baseURL . $next->href : null;
        echo "Results: $next" . PHP_EOL;
    }
    return $urls;
}

$string = file_get_contents("jsonWorkers/incuMens.json");
$json_array = json_decode($string,true);

foreach ($json_array as $value){
    $baseURL = $value['baseurl'];
    $catURL = $value['url'];
    $store = $value['store'];
    $general_cat = $value['general_cat'];
    $spec_cat = $value['spec_cat'];
    $next_select = $value['next_select'];
    $prod_name = $value['prod_name_select'];
    $label_name = $value['label_name_select'];
    $description = $value['desc_select'];
    $price = $value['price_select'];
    $prodURL = $value['product_url'];
    if (!is_null($value['mainImg_select'])){
        $mainImg = $value['mainImg_select'];
    }
    $more_imgs = $value['more_imgs'];
    $allLinks = getLinks($catURL, $prodURL, $baseURL, $next_select);
}
?>
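Any ideas why this script would run indefinitely instead of returning anything, stopping, or printing anything to the screen? I have just been letting it run until it stops. When I load the links manually it only takes a minute or so, sometimes less, so I am sure the problem is in my variables or my JSON, but I can't for the life of me see where.
Can anyone take a quick look and point me in the right direction?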
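+1 for pointing out flushing. – Orangepill
Ah! I renamed a variable ($url became $catURL) and accidentally didn't change it everywhere. Cheers, mate! I'll look up 'flush()'. I'm new to PHP, so it's probably something simple that I missed. – Jascination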
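To spell out the fix from the comment above: in getLinks() the while condition tests $catURL, but the next-page link is assigned to $url, so $catURL never changes and the loop never ends. Below is a minimal sketch of a corrected loop, not the asker's final code. The names collectLinks, $fetchPage and $maxPages and the sample paths are hypothetical; $fetchPage stands in for the str_get_html(curl(...)) and find() calls so the sketch runs without the network or simple_html_dom. It also flushes output on each page so progress shows up while the script is still running, and it stops if pagination ever links back to a page it has already seen.

```php
<?php
// Hypothetical helper: walk paginated category pages and collect product URLs.
// $fetchPage is an assumed callable: given a URL it returns
// ['links' => string[], 'next' => string|null]. In the original script it
// would wrap str_get_html(curl($url)) plus the two find() calls.
function collectLinks(string $startUrl, callable $fetchPage, int $maxPages = 500): array
{
    $urls = [];
    $visited = [];
    $current = $startUrl;

    // Stop when there is no next page, when a page repeats, or at the page cap.
    while ($current !== null && !isset($visited[$current]) && count($visited) < $maxPages) {
        $visited[$current] = true;

        echo "Indexing: $current" . PHP_EOL;
        // Send progress to the client now rather than when the script ends.
        if (ob_get_level() > 0) {
            ob_flush();
        }
        flush();

        $page = $fetchPage($current);
        foreach ($page['links'] as $link) {
            $urls[] = $link;
        }

        // Reassign the same variable that the while condition tests.
        $current = $page['next'] ?? null;
    }

    return $urls;
}

// Usage with made-up pages; page 2 deliberately links back to page 1.
$pages = [
    '/shop?page=1' => ['links' => ['/p/a', '/p/b'], 'next' => '/shop?page=2'],
    '/shop?page=2' => ['links' => ['/p/c'], 'next' => '/shop?page=1'],
];
$fetch = function (string $url) use ($pages): array {
    return $pages[$url];
};
$links = collectLinks('/shop?page=1', $fetch);
```

In the original getLinks() the equivalent one-line change is assigning the next-page URL to $catURL instead of $url. The visited check and the page cap are optional safety nets, not something the original thread asked for.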