2009-07-17 328 views
5

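Getting a URL from a string

I've been looking for code that extracts a URL from a string using PHP. I'm basically trying to pull a shortened URL out of a message and then make a HEAD request to find the actual link.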

Does anyone have code that returns the URLs found in a string?

Thanks in advance.

Edit for ghostdog:

Here is a sample of what I'm parsing:

$test = "I am testing this application for http://test.com YAY!"; 

And here is the response I got that solved it:

$regex = '$\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]$i'; 

preg_match_all($regex, $test, $result, PREG_PATTERN_ORDER); 
$A = $result[0]; 

foreach($A as $B) 
{ 
    $URL = GetRealURL($B); 
    echo "$URL<BR>";  
} 


function GetRealURL($url) 
{ 
    $options = array(
     CURLOPT_RETURNTRANSFER => true, 
     CURLOPT_HEADER   => true, 
     CURLOPT_FOLLOWLOCATION => true, 
     CURLOPT_ENCODING  => "", 
     CURLOPT_USERAGENT  => "spider", 
     CURLOPT_AUTOREFERER => true, 
     CURLOPT_CONNECTTIMEOUT => 120, 
     CURLOPT_TIMEOUT  => 120, 
     CURLOPT_MAXREDIRS  => 10, 
    ); 

    $ch  = curl_init($url); 
    curl_setopt_array($ch, $options); 
    $content = curl_exec($ch); 
    $err  = curl_errno($ch); 
    $errmsg = curl_error($ch); 
    $header = curl_getinfo($ch); 
    curl_close($ch); 
    return $header['url']; 
} 

See the answer for details.

+0

How about showing an example of what you are parsing? – ghostdog74 2009-07-17 23:58:46

Answers

10

This code may be helpful (see MadTechie's latest post):

http://www.phpfreaks.com/forums/index.php/topic,245248.msg1146218.html#msg1146218

<?php 
$string = "some random text http://tinyurl.com/9uxdwc some http://google.com random text http://tinyurl.com/787988"; 

$regex = '$\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]$i'; 

preg_match_all($regex, $string, $result, PREG_PATTERN_ORDER); 
$A = $result[0]; 

foreach($A as $B) 
{ 
    $URL = GetRealURL($B); 
    echo "$URL<BR>"; 
} 


function GetRealURL($url) 
{ 
    $options = array(
     CURLOPT_RETURNTRANSFER => true, 
     CURLOPT_HEADER   => true, 
     CURLOPT_FOLLOWLOCATION => true, 
     CURLOPT_ENCODING  => "", 
     CURLOPT_USERAGENT  => "spider", 
     CURLOPT_AUTOREFERER => true, 
     CURLOPT_CONNECTTIMEOUT => 120, 
     CURLOPT_TIMEOUT  => 120, 
     CURLOPT_MAXREDIRS  => 10, 
    ); 

    $ch  = curl_init($url); 
    curl_setopt_array($ch, $options); 
    $content = curl_exec($ch); 
    $err  = curl_errno($ch); 
    $errmsg = curl_error($ch); 
    $header = curl_getinfo($ch); 
    curl_close($ch); 
    return $header['url']; 
} 

?> 
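The code above sets `CURLOPT_HEADER` but still issues a regular GET, so the response body is downloaded as well. Since the question asks for a HEAD request, here is a minimal sketch of that variant. The function name `resolveFinalUrl` and the 10-second timeouts are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): resolve a shortened URL
// with a HEAD request so the response body is never downloaded.
function resolveFinalUrl(string $url): ?string
{
    // Reject anything that is not an absolute http(s) URL before touching the network.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if (filter_var($url, FILTER_VALIDATE_URL) === false
        || !in_array($scheme, ['http', 'https'], true)) {
        return null;
    }

    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // issue HEAD instead of GET
        CURLOPT_RETURNTRANSFER => true,  // do not echo anything
        CURLOPT_FOLLOWLOCATION => true,  // follow the redirect chain
        CURLOPT_MAXREDIRS      => 10,
        CURLOPT_CONNECTTIMEOUT => 10,    // assumed values, tune as needed
        CURLOPT_TIMEOUT        => 10,
    ]);

    $ok = curl_exec($ch);

    // CURLINFO_EFFECTIVE_URL is the last URL cURL requested after redirects.
    return ($ok === false) ? null : curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
}
```

A call such as `resolveFinalUrl('http://tinyurl.com/9uxdwc')` would return the final address, or `null` if the request fails. Some servers answer HEAD differently from GET, so falling back to the GET version above may be needed in those cases.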
+0

Yes, that's exactly what I needed – 2009-07-18 00:12:56

2

Something like:

$matches = array(); 
preg_match_all('/http:\/\/[a-zA-Z0-9.-]+\/[a-zA-Z0-9.-]+/', $text, $matches); 
print_r($matches); 

You'll need to tweak the regex to get exactly what you want.
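As one possible tweak, the sketch below also accepts `https`, does not require a path, and trims sentence punctuation that happens to follow a link. The function name `extractUrls` and the sample text are assumptions for illustration only.

```php
<?php
// Hypothetical helper: pull http(s) URLs out of free text.
function extractUrls(string $text): array
{
    // Scheme, then a run of characters that are not whitespace,
    // angle brackets, or quotes.
    preg_match_all('~\bhttps?://[^\s<>"\']+~i', $text, $matches);

    // Punctuation directly after a link usually belongs to the sentence.
    return array_map(
        function ($url) {
            return rtrim($url, '.,;:!?');
        },
        $matches[0]
    );
}

print_r(extractUrls('See http://tinyurl.com/9uxdwc, then https://google.com.'));
```

The trimming step is a heuristic: a URL that legitimately ends in one of those characters would be cut short, so adjust the list to fit the messages being parsed.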

To get the real URL, consider something simple such as:

curl -I http://url.com/path | grep Location: | awk '{print $2}'
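The same idea can be sketched in PHP: given the raw header text of a response (for example, what `curl -I` prints), pick out the `Location` value. The function name `extractLocation` and the sample headers are assumptions for illustration.

```php
<?php
// Hypothetical helper: find the Location header in a raw header block.
function extractLocation(string $rawHeaders): ?string
{
    // Header names are case-insensitive; the m flag lets ^ match at the
    // start of every line, not just the start of the string.
    if (preg_match('/^Location:[ \t]*(\S+)/mi', $rawHeaders, $m) === 1) {
        return $m[1];
    }
    return null;
}

$raw = "HTTP/1.1 301 Moved Permanently\r\n"
     . "Location: http://example.com/real\r\n"
     . "Content-Length: 0\r\n\r\n";

echo extractLocation($raw), "\n";
```

This reads a single response only. If the target redirects again, the lookup has to be repeated for each hop, or the client has to be told to follow redirects itself.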

+0

No need for grep: curl -I http://url.com/path | awk '/Location/{print $2}' – ghostdog74 2009-07-18 00:19:08