How to:
$arr = array(
    '<a href="aaa/">Link 1</a>',
    '<a href="http://mydomain.com/bbb/">Link 1</a>',
    "<a href='aaa/'>Link 1</a>",
    '<a href="#top">Link 1</a>',
);
foreach ($arr as $lnk) {
    // Prefix relative href/src values with the domain; skip anchors (#) and absolute http:// URLs.
    $lnk = preg_replace('~(href|src)=(["\'])(?!#)(?!http://)([^\2]*)\2~i', '$1="http://mydomain.com/$3"', $lnk);
    echo $lnk, "\n";
}
Output:
<a href="http://mydomain.com/aaa/">Link 1</a>
<a href="http://mydomain.com/bbb/">Link 1</a>
<a href="http://mydomain.com/aaa/">Link 1</a>
<a href="#top">Link 1</a>
Explanation:
The regular expression:
(?-imsx:(href|src)=(["\'])(?!#)(?!http://)([^\2]*)\2)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    href                     'href'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    src                      'src'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  =                        '='
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    ["\']                    any character of: '"', '\''
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    #                        '#'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    http://                  'http://'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    [^\2]*                   any character except: '\2' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  \2                       what was matched by capture \2
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
I can't figure out how to make this work in a greedy fashion on a multi-line blob of HTML. I've tried the 'm' modifier, but with no luck. Can you help me?
Thanks. The regular expression should be changed slightly so that it is ungreedy and also covers https: '~(href|src)=(["\'])(?!#)(?!https?://)/?([^\2]*?)\2~i' – Jako
@Jako: You're right about 'https?', but '[^\2]*' doesn't need to be ungreedy, because it is already ungreedy by itself. – Toto
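As a rough sketch of how the variant from the comments behaves on a multi-line blob, the example below applies it to a hypothetical HTML fragment (the markup and variable names are assumptions for illustration). The `m` modifier is not involved: it only changes how `^` and `$` match, and this pattern uses neither. What seems to matter is the lazy quantifier, which keeps each match within a single attribute value.

```php
<?php
// Hypothetical multi-line HTML blob; some lines hold more than one attribute.
$html = <<<HTML
<p><a href="aaa/">One</a> <img src='/img/x.png'></p>
<p><a href="https://example.org/">Two</a> <a href="#top">Top</a></p>
HTML;

// Pattern from the comment: https? covers both schemes, /? drops a leading
// slash, and the lazy *? stops at the first closing quote.
$pattern = '~(href|src)=(["\'])(?!#)(?!https?://)/?([^\2]*?)\2~i';
$result = preg_replace($pattern, '$1="http://mydomain.com/$3"', $html);
echo $result, "\n";
// <p><a href="http://mydomain.com/aaa/">One</a> <img src="http://mydomain.com/img/x.png"></p>
// <p><a href="https://example.org/">Two</a> <a href="#top">Top</a></p>
```

Only the relative `href` and `src` values are rewritten; the absolute `https://` link and the `#top` anchor are left alone.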