PHP timeout with file_get_html

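I have been trying to fetch some data from a Wikia site using the simple_html_dom library for PHP. Basically, I use the Wikia API to render each page as HTML and extract the data from that. After extraction, I save the data into a MySQL database. My problem is that I usually pull 300 records, but I get stuck at around record 93: file_get_html returns null, which makes my find() call fail. I don't know why it stops at record 93, but I have tried various fixes, such as: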

ini_set('default_socket_timeout', 120);
set_time_limit(120);

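Basically, I have to request the wiki page 300 times to get those 300 records. But most of the time I only manage to get 93 records before file_get_html starts returning null. Any idea how to solve this?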

I also tested with cURL and ran into the same problem.

function test($url) {
    $ch = curl_init();
    $timeout = 5;

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);

    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}

$baseurl = 'http://xxxx.wikia.com/index.php?';

foreach ($resultset_wiki as $name) {
    // Create DOM from URL or file
    $options = array("action" => "render", "title" => $name['name']);
    $baseurl .= http_build_query($options, '', '&');
    $html = file_get_html($baseurl);
    if ($html === FALSE) {
        echo "issue here";
    }
    // this code for cURL but commented for testing with file_get_html instead
    $a = test($baseurl);
    $html = new simple_html_dom();
    $html->load($a);

    // find div stuff here and mysql data pumping here.
}

$resultset_wiki is an array holding the list of titles to fetch from the wiki; this dataset is loaded from the DB before the loop runs.

The actual error I get is this:

Call to a member function find() on a non-object in
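This error means find() was called on a value that is not an object, which is what happens when the fetch returned false or null and the result was used anyway. A minimal sketch of guarding against that, assuming a failed fetch is worth retrying a few times before skipping the record: `fetch_with_retry` is a hypothetical helper, not part of simple_html_dom, and the retry count and delay are arbitrary defaults.

```php
<?php
/**
 * Call $fetch($url) up to $attempts times and return the first usable result.
 * $fetch is any callable that returns a DOM object on success and false or
 * null on failure (for example 'file_get_html'). Returns null if every
 * attempt fails, so the caller can skip the record instead of crashing.
 */
function fetch_with_retry(callable $fetch, string $url, int $attempts = 3, int $delaySeconds = 1)
{
    for ($i = 1; $i <= $attempts; $i++) {
        $result = $fetch($url);
        // Treat both false and null as failure; the question reports null.
        if ($result !== false && $result !== null) {
            return $result;
        }
        // Pause before the next attempt, but not after the last one.
        if ($i < $attempts) {
            sleep($delaySeconds);
        }
    }
    return null;
}

// Usage inside the loop (shown as a comment because it needs simple_html_dom):
//   $html = fetch_with_retry('file_get_html', $url);
//   if ($html === null) { error_log("skipping $url"); continue; }
//   $divs = $html->find('div');
```

Checking the result before calling find() turns the fatal error into a skipped record, which also makes it easier to see which titles fail.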


2014-12-02 user1897151

Have you tried using cURL for everything? – Ghost 2014-12-02 07:16:10

Yes, I did, but I still get the same null result at the 93rd record, just as I do without cURL. – user1897151 2014-12-02 07:17:40

Isn't the site simply throttling you because you are sending it a large number of requests in a short time? – Erik 2014-12-02 07:25:47

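Answering my own question: it seems the problem was the URL I was using. I have changed the code to use cURL with POST, sending the action and title parameters in the POST body instead.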
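The post does not show the final code, so the following is only a sketch of what such a change might look like. Note that in the question's loop, `$baseurl .= ...` appends a new query string on every iteration, so the URL keeps growing; that may be what "the URL" refers to here, though the post does not say so. The function names `build_render_params`, `build_render_url` and `fetch_rendered_page` are hypothetical, and the timeout values are assumptions.

```php
<?php
/** Build the request parameters for one title, fresh on every call. */
function build_render_params(string $title): array
{
    return array('action' => 'render', 'title' => $title);
}

/** GET variant: derive the URL from a constant endpoint, never append to it. */
function build_render_url(string $endpoint, string $title): string
{
    return $endpoint . '?' . http_build_query(build_render_params($title), '', '&');
}

/**
 * POST variant: send action and title in the request body.
 * Returns the response body, or null on a transport error or non-200 status.
 */
function fetch_rendered_page(string $endpoint, string $title, int $timeout = 30)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $endpoint);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(build_render_params($title), '', '&'));
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);   // assumed value
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);   // assumed value

    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $error  = curl_error($ch);
    curl_close($ch);

    if ($body === false || $status !== 200) {
        // Log the status so throttling or bad URLs show up in the error log.
        error_log("fetch failed for '$title': HTTP $status $error");
        return null;
    }
    return $body;
}

// Usage inside the loop (endpoint as in the question, without the trailing "?"):
//   $body = fetch_rendered_page('http://xxxx.wikia.com/index.php', $name['name']);
//   if ($body !== null) { $html = new simple_html_dom(); $html->load($body); }
```

Logging the HTTP status alongside the cURL error also helps tell a malformed request apart from rate limiting by the site.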


2014-12-03 06:37:48 user1897151
