2013-10-15 40 views
2

Using preg_replace to change URLs from relative to absolute

My PHP code:

$string = preg_replace('/(href|src)="([^:"]*)(?:")/i','$1="http://mydomain.com/$2"', $string); 

It works with:

- <a href="aaa/">Link 1</a> => <a href="http://mydomain.com/aaa/">Link 1</a> 
- <a href="http://mydomain.com/bbb/">Link 1</a> => <a href="http://mydomain.com/bbb/">Link 1</a> 

But not with:

- <a href='aaa/'>Link 1</a> 
- <a href="#top">Link 1</a> (I don't want to change the URL if it starts with #).

Please help me!

Answers

0

This will work for you:

PHP:

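function expand_links($match) {
    // $match[1] is the quoted value; trim() strips the quotes and any leading or trailing slashes
    return 'href="http://example.com/' . trim($match[1], '\'"/\\') . '"';
}

// preg_replace_callback replaces the /e modifier, which was removed in PHP 7.0
$textarea = preg_replace_callback('/href\s*=\s*("[^"]*"|\'[^\']*\')/', 'expand_links', $textarea);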

I also changed the regex so that it works with either double quotes or apostrophes.
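The question also asks that `#` links and already absolute URLs stay untouched. A minimal sketch of how the callback approach could be extended for that, assuming PHP 7 or later; the names `absolutize_href` and `absolutize_html` are made up for illustration, and unlike the `trim()` above it only strips leading slashes:

```php
<?php
// Hypothetical helper: prefixes a relative href value with a base URL.
function absolutize_href(array $m): string {
    $url = $m[2];
    // Leave empty values, fragment-only links and URLs with a scheme untouched.
    if ($url === '' || $url[0] === '#' || preg_match('~^[a-z][a-z0-9+.-]*:~i', $url)) {
        return $m[0];
    }
    return 'href="http://example.com/' . ltrim($url, '/') . '"';
}

// Hypothetical wrapper: group 1 is the opening quote, group 2 the value
// up to the next occurrence of that same quote.
function absolutize_html(string $html): string {
    return preg_replace_callback('/href\s*=\s*(["\'])(.*?)\1/i', 'absolutize_href', $html);
}

echo absolutize_html("<a href='aaa/'>Link 1</a> <a href=\"#top\">Top</a>"), "\n";
```

Protocol-relative URLs such as `//host/path` are not handled by this sketch and would need an extra check.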

0

Try this as the pattern:

/(href|src)=['"]([^"']+)['"]/i 

Leave the replacement as it is.

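Edit:

Wait a moment... I didn't test it on the first two link types, only on the ones that weren't working. Give me a moment.

REVISED:

Sorry about the first regex; I forgot that it also has to work with the second example, the one that includes the domain.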

(href|src)=['"](?:http://.+/)?([^"']+)['"] 

should work.
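A quick way to check the revised pattern against the four sample links from the question. This is only a sketch: the `~` delimiters, the `i` flag, the helper name `rewrite_links` and the `(?!#)` variant are assumptions added here, and the replacement string is the one from the question. As written, the revised pattern still rewrites `#top` to `http://mydomain.com/#top`; the variant with the lookahead leaves it alone.

```php
<?php
// Sample links from the question.
$links = [
    '<a href="aaa/">Link 1</a>',
    '<a href="http://mydomain.com/bbb/">Link 1</a>',
    "<a href='aaa/'>Link 1</a>",
    '<a href="#top">Link 1</a>',
];

// The revised pattern from this answer, wrapped in ~ delimiters with the i flag.
$revised = '~(href|src)=[\'"](?:http://.+/)?([^"\']+)[\'"]~i';
// Assumed variant: the (?!#) lookahead skips values that start with #.
$skipAnchors = '~(href|src)=[\'"](?!#)(?:http://.+/)?([^"\']+)[\'"]~i';

// Hypothetical helper that applies the question's replacement string.
function rewrite_links(string $pattern, string $html): string {
    return preg_replace($pattern, '$1="http://mydomain.com/$2"', $html);
}

// Print both results side by side for each sample link.
foreach ($links as $link) {
    echo rewrite_links($revised, $link), '  |  ', rewrite_links($skipAnchors, $link), "\n";
}
```

Note that the greedy `.+` keeps only the last path segment of an already absolute URL: `http://mydomain.com/a/b/` comes out as `http://mydomain.com/b/`.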

2

How about:

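$arr = array(
    '<a href="aaa/">Link 1</a>',
    '<a href="http://mydomain.com/bbb/">Link 1</a>',
    "<a href='aaa/'>Link 1</a>",
    '<a href="#top">Link 1</a>',
);
foreach ($arr as $lnk) {
    // The trailing \2 is a backreference to the opening quote. Inside the
    // character class [^\2] it is only an octal escape, so that part matches
    // greedily; this is fine here because each string holds one quoted attribute.
    $lnk = preg_replace('~(href|src)=(["\'])(?!#)(?!http://)([^\2]*)\2~i','$1="http://mydomain.com/$3"', $lnk);
    echo $lnk,"\n";
}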

Output:

<a href="http://mydomain.com/aaa/">Link 1</a> 
<a href="http://mydomain.com/bbb/">Link 1</a> 
<a href="http://mydomain.com/aaa/">Link 1</a> 
<a href="#top">Link 1</a> 

Explanation:

The regular expression: 

(?-imsx:(href|src)=(["\'])(?!#)(?!http://)([^\2]*)\2) 

matches as follows: 

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    href                     'href'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    src                      'src'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  =                        '='
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    ["\']                    any character of: '"', '\''
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    #                        '#'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    http://                  'http://'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    [^\2]*                   any character except: '\2' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  \2                       what was matched by capture \2
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
+1

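I can't figure out how to make this work in a greedy fashion on a one-line blob of multiple HTML links. I've tried the 'm' modifier, but no luck. Can you help me? –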

+0

Thanks, the regex should be changed slightly to be ungreedy and to also cover https: `~(href|src)=(["\'])(?!#)(?!https?://)/?([^\2]*?)\2~i` – Jako

+0

@Jako: You're right about `https?`, but `[^\2]*` doesn't need to be ungreedy, because it is ungreedy by itself. – Toto
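For the one-line blob case raised in the comments, here is a sketch of one possible variant. In PCRE, `\2` inside a character class is an octal escape rather than a backreference, so this version swaps `[^\2]*` for a lazy `(.*?)` followed by the real backreference `\2`. It keeps Jako's `https?` and optional leading `/`; the helper name `absolutize_blob` and the sample blob (including `cdn.example.org`) are made up for illustration.

```php
<?php
// Lazy (.*?) stops at the first quote that equals the opening quote (\2).
$pattern = '~(href|src)=(["\'])(?!#)(?!https?://)/?(.*?)\2~i';

// Hypothetical helper that applies the replacement from the answer above.
function absolutize_blob(string $pattern, string $html): string {
    return preg_replace($pattern, '$1="http://mydomain.com/$3"', $html);
}

// Several links and attributes on a single line.
$blob = '<a href="aaa/" class="x">One</a> <a href=\'/bbb/\'>Two</a> '
      . '<a href="#top">Top</a> <img src="https://cdn.example.org/i.png">';

echo absolutize_blob($pattern, $blob), "\n";
```

The 'm' modifier only changes how `^` and `$` behave, so it has no effect on this pattern; the lazy quantifier is what keeps each match inside its own attribute.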