2010-09-26 38 views
12

What I want to do is find out the last/final URL after redirection. How can I get the final URL after following HTTP redirects in pure PHP?

I don't want to use cURL. I want to stick with pure PHP (stream wrappers).

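Right now I have a URL (say http://domain.test), and I use get_headers() to fetch specific headers from that page. get_headers also returns multiple Location: headers (see the edit below). Is there a way to use those headers to build the final URL? Or is there a PHP function that does this automatically?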

Edit: get_headers() follows redirects and returns all the headers for each response/redirect, so I have all the Location: headers.
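To illustrate the edit: with the default (non-associative) format, get_headers() returns one flat list containing the status line and header lines of every response in the chain, so the last Location line is the last redirect target. A minimal sketch, assuming a hypothetical helper named last_location_from_headers() and a simulated header list instead of a live request. Note that the value may be relative and would still need resolving against the URL that sent it.

```php
<?php
/**
 * Hypothetical helper: return the value of the last Location header in the
 * flat list returned by get_headers($url) (default, non-associative format),
 * or null if there is none. Header names are matched case-insensitively.
 */
function last_location_from_headers(array $headers): ?string
{
    $last = null;
    foreach ($headers as $line) {
        // Each entry is a raw line: a status line or "Name: value".
        if (preg_match('/^location:\s*(.+)$/i', $line, $m)) {
            $last = trim($m[1]);
        }
    }
    return $last;
}

// Simulated get_headers() output for a two-hop redirect chain (assumption).
$headers = [
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://domain.test/step2',
    'HTTP/1.1 302 Found',
    'location: /final/page.ext',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
];
echo last_location_from_headers($headers), "\n"; // prints "/final/page.ext"
```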

+1

There are *multiple* 'Location:' headers in a single response? – Tomalak 2010-09-26 18:08:41

+0

get_headers does follow redirects automatically by default, so I get multiple 'Location:' headers. What I want is the full final URL (http://domain.test/final/page.ext?attr ...) – Weboide 2010-09-26 18:11:02

+0

I don't understand the question :( – Stewie 2010-09-26 18:15:37

Answers

25
/** 
* get_redirect_url() 
* Gets the address that the provided URL redirects to, 
* or FALSE if there's no redirect. 
* 
* @param string $url 
* @return string 
*/ 
function get_redirect_url($url){ 
    $redirect_url = null; 

    $url_parts = @parse_url($url); 
    if (!$url_parts) return false; 
    if (!isset($url_parts['host'])) return false; //can't process relative URLs 
    if (!isset($url_parts['path'])) $url_parts['path'] = '/'; 

    $sock = fsockopen($url_parts['host'], (isset($url_parts['port']) ? (int)$url_parts['port'] : 80), $errno, $errstr, 30); 
    if (!$sock) return false; 

    $request = "HEAD " . $url_parts['path'] . (isset($url_parts['query']) ? '?'.$url_parts['query'] : '') . " HTTP/1.1\r\n"; 
    $request .= 'Host: ' . $url_parts['host'] . "\r\n"; 
    $request .= "Connection: Close\r\n\r\n"; 
    fwrite($sock, $request); 
    $response = ''; 
    while(!feof($sock)) $response .= fread($sock, 8192); 
    fclose($sock); 

    if (preg_match('/^Location: (.+?)$/m', $response, $matches)){ 
     if (substr($matches[1], 0, 1) == "/") 
      return $url_parts['scheme'] . "://" . $url_parts['host'] . trim($matches[1]); 
     else 
      return trim($matches[1]); 

    } else { 
     return false; 
    } 

} 

/** 
* get_all_redirects() 
* Follows and collects all redirects, in order, for the given URL. 
* 
* @param string $url 
* @return array 
*/ 
function get_all_redirects($url){ 
    $redirects = array(); 
    while ($newurl = get_redirect_url($url)){ 
     if (in_array($newurl, $redirects)){ 
      break; 
     } 
     $redirects[] = $newurl; 
     $url = $newurl; 
    } 
    return $redirects; 
} 

/** 
* get_final_url() 
* Gets the address that the URL ultimately leads to. 
* Returns $url itself if it isn't a redirect. 
* 
* @param string $url 
* @return string 
*/ 
function get_final_url($url){ 
    $redirects = get_all_redirects($url); 
    if (count($redirects)>0){ 
     return array_pop($redirects); 
    } else { 
     return $url; 
    } 
} 
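The get_redirect_url() function above only distinguishes absolute Location values from ones starting with "/", and in the latter case it rebuilds the URL without the port. Servers may also send protocol-relative ("//host/path"), query-only or path-relative values. A minimal sketch of a resolver for those forms, assuming a hypothetical function named resolve_location() and an absolute base URL; it skips "."/".." normalization, so it is not a complete RFC 3986 resolver.

```php
<?php
/**
 * Hypothetical helper: resolve a Location value against the (absolute) URL
 * that returned it. Handles absolute, protocol-relative, root-relative,
 * query-only and path-relative forms; does not normalize "." or "..".
 */
function resolve_location(string $base, string $location): string
{
    $location = trim($location);
    if ($location === '') {
        return $base;
    }
    if (parse_url($location, PHP_URL_SCHEME) !== null) {
        return $location;                          // already absolute
    }
    $b = parse_url($base);
    $scheme = $b['scheme'] ?? 'http';
    if (substr($location, 0, 2) === '//') {
        return $scheme . ':' . $location;          // protocol-relative
    }
    // Keep the port, which a plain scheme://host rebuild would drop.
    $origin = $scheme . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    $path = $b['path'] ?? '/';
    if ($location[0] === '/') {
        return $origin . $location;                // root-relative
    }
    if ($location[0] === '?') {
        return $origin . $path . $location;        // query-only
    }
    // Path-relative: replace whatever follows the last "/" of the base path.
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $location;
}

echo resolve_location('http://domain.test:8080/a/b', 'c.html'), "\n"; // prints "http://domain.test:8080/a/c.html"
```

In get_redirect_url(), a call like this could replace the branch that only handles values starting with "/".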

And, as always, credit where it's due:

http://w-shadow.com/blog/2008/07/05/how-to-get-redirect-url-in-php/

+0

You, sir, just saved me hours of searching. Everything works as expected. – Dave 2011-10-18 05:00:06

+0

I have to say that in my tests this cURL solution was more reliable: http://stackoverflow.com/questions/17472329/php-get-url-of-redirect-from-source-url – 2017-05-05 14:04:36

36
function getRedirectUrl ($url) { 
    stream_context_set_default(array(
     'http' => array(
      'method' => 'HEAD' 
     ) 
    )); 
    $headers = get_headers($url, 1); 
    if ($headers !== false && isset($headers['Location'])) { 
     return $headers['Location']; 
    } 
    return false; 
} 
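stream_context_set_default() changes the default context for every later stream operation in the script, not just this get_headers() call. Since PHP 7.1, get_headers() accepts a context as its third argument, so the HEAD method (and, optionally, a User-Agent for servers that don't respond properly without one) can be scoped to a single call. A minimal sketch, assuming a hypothetical helper named make_head_context(); the option names are the standard http:// context options, which also apply to https:// URLs when the openssl extension is available.

```php
<?php
/**
 * Hypothetical helper: a per-call stream context for get_headers(), instead
 * of changing the script-wide default with stream_context_set_default().
 */
function make_head_context(string $userAgent, int $maxRedirects = 10)
{
    return stream_context_create([
        'http' => [
            'method'          => 'HEAD',
            'user_agent'      => $userAgent,    // sent as the User-Agent header
            'follow_location' => 1,             // follow redirects (the default)
            'max_redirects'   => $maxRedirects, // stop following after this many
        ],
    ]);
}

$ctx = make_head_context('Mozilla/5.0 (Link Checker)');
// Needs network access and PHP >= 7.1:
// $headers = get_headers('http://domain.test', true, $ctx);
```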

Also...

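As mentioned in the comments, the last item in $headers['Location'] will be the URL you end up at after all the redirects. Note, however, that it won't always be an array. Sometimes it's just a plain, non-array value. In that case, trying to access the last array element will most likely return a single character. Not ideal.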

If you're only interested in the final URL after all the redirects, I'd suggest changing

return $headers['Location']; 

to

return is_array($headers['Location']) ? array_pop($headers['Location']) : $headers['Location']; 

...which is just shorthand for

if(is_array($headers['Location'])){ 
    return array_pop($headers['Location']); 
}else{ 
    return $headers['Location']; 
} 

This fix handles both cases (array and non-array) and removes the need to extract the final URL yourself after calling the function.

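If there is no redirect, the function returns false. The function also returns false for invalid URLs (invalid for whatever reason). So it's important to check the URL for validity before running this function, or else to incorporate the redirect check into your validation.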
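The array-or-string handling described above can be isolated in a small function and tried without a network connection. A minimal sketch, assuming a hypothetical name final_location() and a simulated array in the shape get_headers($url, true) produces. It also lower-cases the keys so a server that sends "location:" is found; one caveat is that if a single chain mixes "Location" and "location", array_change_key_case() keeps only the later key, so the flat (non-associative) format is safer in that case.

```php
<?php
/**
 * Hypothetical helper: given the array from get_headers($url, true), return
 * the last Location value, or false if there is none or get_headers() failed.
 */
function final_location($headers)
{
    if (!is_array($headers)) {
        return false;                     // get_headers() itself returned false
    }
    // Lower-case keys so "Location" and "location" are both found.
    $headers = array_change_key_case($headers, CASE_LOWER);
    if (!isset($headers['location'])) {
        return false;                     // no redirect
    }
    $loc = $headers['location'];
    // One redirect gives a string; several give an array in arrival order.
    return is_array($loc) ? end($loc) : $loc;
}

// Simulated get_headers($url, true) output for a single redirect (assumption).
$headers = [0 => 'HTTP/1.1 301 Moved Permanently', 'Location' => 'http://domain.test/final', 1 => 'HTTP/1.1 200 OK'];
echo final_location($headers), "\n"; // prints "http://domain.test/final"
```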

+0

Does this follow all redirects and return the final URL? – Weboide 2011-09-29 15:43:28

+1

Awesome! This deserves the upvotes. – Ashfame 2013-05-16 01:50:02

+1

Very nice! +1 – user327843 2014-04-21 16:08:04

3

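xaav's answer is good, except for the following two issues:

  • It doesn't support the HTTPS protocol => a solution was proposed in a comment on the original site: http://w-shadow.com/blog/2008/07/05/how-to-get-redirect-url-in-php/
  • Some sites don't work because they don't recognize the user agent (client browser) at all => this is fixed simply by adding a User-Agent header field. I added an Android user agent (you can find other user agent examples to suit your needs here: http://www.useragentstring.com/pages/useragentstring.php):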

    $request .= "User-Agent: Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30\r\n";

Here is the modified answer:

/** 
* get_redirect_url() 
* Gets the address that the provided URL redirects to, 
* or FALSE if there's no redirect. 
* 
* @param string $url 
* @return string 
*/ 
function get_redirect_url($url){ 
    $redirect_url = null; 

    $url_parts = @parse_url($url); 
    if (!$url_parts) return false; 
    if (!isset($url_parts['host'])) return false; //can't process relative URLs 
    if (!isset($url_parts['path'])) $url_parts['path'] = '/'; 

    $sock = fsockopen($url_parts['host'], (isset($url_parts['port']) ? (int)$url_parts['port'] : 80), $errno, $errstr, 30); 
    if (!$sock) return false; 

    $request = "HEAD " . $url_parts['path'] . (isset($url_parts['query']) ? '?'.$url_parts['query'] : '') . " HTTP/1.1\r\n"; 
    $request .= 'Host: ' . $url_parts['host'] . "\r\n"; 
    $request .= "User-Agent: Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30\r\n"; 
    $request .= "Connection: Close\r\n\r\n"; 
    fwrite($sock, $request); 
    $response = ''; 
    while(!feof($sock)) $response .= fread($sock, 8192); 
    fclose($sock); 

    if (preg_match('/^Location: (.+?)$/m', $response, $matches)){ 
     if (substr($matches[1], 0, 1) == "/") 
      return $url_parts['scheme'] . "://" . $url_parts['host'] . trim($matches[1]); 
     else 
      return trim($matches[1]); 

    } else { 
     return false; 
    } 

} 

/** 
* get_all_redirects() 
* Follows and collects all redirects, in order, for the given URL. 
* 
* @param string $url 
* @return array 
*/ 
function get_all_redirects($url){ 
    $redirects = array(); 
    while ($newurl = get_redirect_url($url)){ 
     if (in_array($newurl, $redirects)){ 
      break; 
     } 
     $redirects[] = $newurl; 
     $url = $newurl; 
    } 
    return $redirects; 
} 

/** 
* get_final_url() 
* Gets the address that the URL ultimately leads to. 
* Returns $url itself if it isn't a redirect. 
* 
* @param string $url 
* @return string 
*/ 
function get_final_url($url){ 
    $redirects = get_all_redirects($url); 
    if (count($redirects)>0){ 
     return array_pop($redirects); 
    } else { 
     return $url; 
    } 
} 

+0

Error 500 when executing this script. – 2017-07-15 05:31:07

+0

Can you provide the error message? – 2017-08-07 08:01:29

2

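Although the OP wanted to avoid cURL, it's best to use it when it's available. Here's a solution with the following advantages:

  • uses curl for all the heavy lifting, so it works with HTTPS
  • copes with servers that return a lower-cased location header name (neither xaav's nor webjay's answer handles this)
  • lets you control how deep you want to go before you give up

Here's the function: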

function findUltimateDestination($url, $maxRequests = 10) 
{ 
    $ch = curl_init(); 

    curl_setopt($ch, CURLOPT_HEADER, true); 
    curl_setopt($ch, CURLOPT_NOBODY, true); 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); 
    curl_setopt($ch, CURLOPT_MAXREDIRS, $maxRequests); 
    curl_setopt($ch, CURLOPT_TIMEOUT, 15); 

    //customize user agent if you desire... 
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Link Checker)'); 

    curl_setopt($ch, CURLOPT_URL, $url); 
    curl_exec($ch); 

    $url=curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); 

    curl_close ($ch); 
    return $url; 
} 

Here is a more verbose version that lets you inspect the redirect chain instead of letting curl follow it.

function findUltimateDestination($url, $maxRequests = 10) 
{ 
    $ch = curl_init(); 

    curl_setopt($ch, CURLOPT_HEADER, true); 
    curl_setopt($ch, CURLOPT_NOBODY, true); 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
    curl_setopt($ch, CURLOPT_TIMEOUT, 15); 

    //customize user agent if you desire... 
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Link Checker)'); 

    while ($maxRequests--) { 

     //fetch 
     curl_setopt($ch, CURLOPT_URL, $url); 
     $response = curl_exec($ch); 

     //try to determine redirection url 
     $location = ''; 
     if (in_array(curl_getinfo($ch, CURLINFO_HTTP_CODE), [301, 302, 303, 307, 308])) { 
      if (preg_match('/^Location:(.*)$/mi', $response, $match)) { //anchored, so Content-Location: is not matched 
       $location = trim($match[1]); 
      } 
     } 

     if (empty($location)) { 
      //we've reached the end of the chain... 
      return $url; 
     } 

     //build next url 
     if ($location[0] == '/') { 
      $u = parse_url($url); 
      $url = $u['scheme'] . '://' . $u['host']; 
      if (isset($u['port'])) { 
       $url .= ':' . $u['port']; 
      } 
      $url .= $location; 
     } else { 
      $url = $location; 
     } 
    } 

    return null; 
} 
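The Location lookup inside the loop can be pulled out into a small function and tested against a canned header block, with no network access. A minimal sketch, assuming a hypothetical name extract_location(); it matches case-insensitively and anchors the match to the start of a line, so a Content-Location header is not mistaken for Location.

```php
<?php
/**
 * Hypothetical helper: extract the Location value from a raw header block,
 * such as curl_exec() returns with CURLOPT_HEADER and CURLOPT_NOBODY set.
 */
function extract_location(string $rawHeaders): ?string
{
    // /m: ^ and $ match at line boundaries; /i: "location" in any case.
    if (preg_match('/^location:[ \t]*(.*)$/mi', $rawHeaders, $m)) {
        $value = trim($m[1]);            // trim() also strips the trailing "\r"
        return $value === '' ? null : $value;
    }
    return null;
}

$raw = "HTTP/1.1 302 Found\r\n"
     . "Content-Location: /ignored\r\n"
     . "location: https://example.org/next\r\n"
     . "\r\n";
echo extract_location($raw), "\n"; // prints "https://example.org/next"
```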

As an example of a redirect chain that this function handles but the others don't, try this:

echo findUltimateDestination('http://dx.doi.org/10.1016/j.infsof.2016.05.005');

At the time of writing, this involves 4 requests, with a mix of Location and location headers.