Regex: find URLs that are not preceded by ="

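I'm trying to build a function that finds URLs in a string and turns them into links, but I don't want to match URLs that are already inside HTML tags (<a> and <img>, for example).

In other words, the regex should find these and replace them with links: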

http://www.stackoverflow.com 
www.stackoverflow.com 
www.stackoverflow.com/logo.gif

But not these URLs (because they are already formatted):

<a href="http://www.stackoverflow.com">http://www.stackoverflow.com</a> 
<img src="http://www.stackoverflow.com/logo.gif">

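I'm using a regex that was already developed for this, but it doesn't check whether the URL is already inside an HTML element. (http://blog.mattheworiordan.com/post/13174566389/url-regular-expression-for-links-with-or-without)

This is the original regex: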

/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)((?:\/[\+~%\/\.\w\-_]*)?\??(?:[\-\+=&;%@\.\w_]*)#?(?:[\.\!\/\\\w]*))?)/

The same regex with explanations:

(
    (// brackets covering match for protocol (optional) and domain 
    ([A-Za-z]{3,9}:(?:\/\/)?) // match protocol, allow in format http:// or mailto: 
    (?:[\-;:&=\+\$,\w]+@)? // allow user@ for email addresses 
    [A-Za-z0-9\.\-]+ // anything looking at all like a domain, non-unicode domains 
    | // or instead of above 
    (?:www\.|[\-;:&=\+\$,\w]+@) // starting with user@ or www. 
    [A-Za-z0-9\.\-]+ // anything looking at all like a domain 
) 
    (// brackets covering match for path, query string and anchor 
    (?:\/[\+~%\/\.\w\-]*) // allow optional /path 
    ?\??(?:[\-\+=&;%@\.\w]*) // allow optional query string starting with ? 
    #?(?:[\.\!\/\\\w]*) // allow optional anchor #anchor 
)? // make URL suffix optional 
)
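To see what the numbered groups hold in practice, here is a minimal sketch. The helper name `describeUrl` and the property names are made up for illustration, and the sketch assumes the email part of the pattern is `[\-;:&=\+\$,\w]+@`, as the comments above describe.

```javascript
// The URL regex from the question, without the g flag so exec() is stateless.
const urlRegEx = /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)((?:\/[\+~%\/\.\w\-]*)?\??(?:[\-\+=&;%@\.\w]*)#?(?:[\.\!\/\\\w]*))?)/;

// Hypothetical helper: returns the captured groups of the first URL in `text`.
function describeUrl(text) {
  const m = urlRegEx.exec(text);
  if (m === null) {
    return null; // nothing URL-like found
  }
  return {
    url: m[1],               // the whole URL
    protocolAndDomain: m[2], // protocol (if any) plus domain
    protocol: m[3],          // undefined when the URL starts with www. or user@
    suffix: m[4]             // path, query string and anchor
  };
}

console.log(describeUrl('http://www.stackoverflow.com/logo.gif'));
console.log(describeUrl('www.stackoverflow.com'));
```

Group 3 is the useful one when building links later: if it is undefined, the match has no protocol and one has to be added to the href.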

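What I'm trying to do is change it so that it checks whether the URL is preceded by =" or >; if it is, the regex should not match it, since URLs inside <a> and <img> elements should have one of those combinations right before them.

I'm not the greatest at regex, but I've done my best. This is my best attempt so far, but it doesn't do the trick: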

/(((^[^\="|\>])([A-Za-z]{3,9}:(?:\/\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)((?:\/[\+~%\/\.\w\-]*)?\??(?:[\-\+=&;%@\.\w]*)#?(?:[\.\!\/\\\w]*))?)/g;

This is the part I added:

(^[^\="|\>])
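A likely reason this addition does not behave as intended can be sketched with a deliberately simplified URL pattern (`https?:\/\/\S+` is a stand-in chosen for illustration, not the regex above). Outside a character class, `^` anchors the match to the start of the input, and the group then consumes one character that becomes part of the match. Inside the class, `|` is a literal pipe rather than "or". In the attempt above the group also sits only in the protocol branch, so the `www.` branch is unaffected by it.

```javascript
// Simplified stand-in for the full URL regex (an assumption, for illustration only).
const attempt = /(^[^\="|\>])(https?:\/\/\S+)/g;

// `^` outside [...] is a start-of-input anchor, so the URL can only begin at index 1.
const midString = 'see http://example.com'.match(attempt);
console.log(midString); // null: the URL is not at index 1

// When it does match, the character before the URL is swallowed into the match.
const atIndexOne = 'xhttp://example.com'.match(attempt);
console.log(atIndexOne); // [ 'xhttp://example.com' ]

// Inside [...], `^` negates the class and `|` is just another excluded character.
const pipeAllowed = /^[^\="|\>]$/.test('|');
console.log(pipeAllowed); // false
```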

Here is my fiddle:

http://jsfiddle.net/0w1g4mm9/2/

Source

2015-08-17 JoakimB

You could try something like this:

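string.replace(
    // Alternative 1 captures an existing <a>...</a> element; alternative 2 is your URL regex.
    /(<a[^>]*>[^<]*<\/a>)|YOUR_REGEX_HERE/g,
    function (match, link, YOUR_CAPTURE_GROUP_1, etc) {
        if (link) {
            return link; // already a link: return it unmodified
        }
        return YOUR_DESIRED_REPLACEMENT;
    }
)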

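The above matches either an already valid <a> tag or the URL you are looking for, whichever comes first in the string. The capture group is used to detect which of the two matched. If a valid link matched, just return it unmodified. Otherwise, return your desired replacement.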
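Filled in with the regex from the question, the template might look like the sketch below. The function name `linkify`, the extra `<img>` alternative, and the choice to prefix `http://` when no protocol was matched are assumptions made for illustration. The email part of the URL pattern is assumed to be `[\-;:&=\+\$,\w]+@`.

```javascript
// URL regex from the question (no flags here; they are added when combining).
const urlPattern = /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)((?:\/[\+~%\/\.\w\-]*)?\??(?:[\-\+=&;%@\.\w]*)#?(?:[\.\!\/\\\w]*))?)/;

// Group 1 captures an existing <a>...</a> or <img ...> element;
// the groups of urlPattern follow, shifted by one.
const tagOrUrl = new RegExp(
  '(<a\\b[^>]*>[^<]*<\\/a>|<img\\b[^>]*>)|' + urlPattern.source,
  'gi'
);

function linkify(html) {
  return html.replace(tagOrUrl, function (match, tag, url, hostPart, protocol) {
    if (tag) {
      return tag; // already formatted: leave untouched
    }
    // Assumption: URLs matched without a protocol get http://
    const href = protocol ? url : 'http://' + url;
    return '<a href="' + href + '">' + url + '</a>';
  });
}

console.log(linkify('Visit www.stackoverflow.com today'));
console.log(linkify('<img src="http://www.stackoverflow.com/logo.gif">'));
```

Because the tag alternative comes first and consumes the whole element, the URL alternative never gets a chance to start inside an `href` or `src` attribute. Link text that itself contains nested tags is not covered by `[^<]*`.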

Source

2015-08-30 20:09:33 lydell

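A different approach, which turned out a bit ugly. I iterate over all matches and rebuild the source HTML from the non-matching parts; for each match I check the character at match.index - 1 and add the link tag only when the URL is not already inside an element.

The benefit is that the already insanely complicated regex doesn't get any more complicated, and you can use full JavaScript to check whether the current string is part of an HTML element.

If you factor the iteration code out, it might even end up looking decent.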

var urlRegEx = /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)((?:\/[\+~%\/\.\w\-]*)?\??(?:[\-\+=&;%@\.\w]*)#?(?:[\.\!\/\\\w]*))?)/g;

var source = $('#source').html();
var dest = "";
var lastMatchEnd = 0;
var match;
while ((match = urlRegEx.exec(source)) !== null) {
    // copy the text between the previous match and this one
    dest += source.substring(lastMatchEnd, match.index);
    var end = match.index + match[0].length;
    var lastChar = source.charAt(match.index - 1);
    if (lastChar === '"' || lastChar === '>') { // inside link
        dest += match[0];
    } else {
        // match[3] is the protocol group; assume http:// when it is missing
        var href = match[3] ? match[0] : "http://" + match[0];
        dest += "<a href='" + href + "'>" + match[0] + "</a>";
    }
    lastMatchEnd = end;
}
dest += source.substring(lastMatchEnd);
$('#target').html(dest);
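Factored out, the iteration could look roughly like this. It is a sketch only: `replaceMatches` and `linkifyOutsideTags` are hypothetical names, jQuery is left out so the functions work on plain strings, and the email part of the regex is assumed to be `[\-;:&=\+\$,\w]+@`.

```javascript
const urlRegEx = /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)((?:\/[\+~%\/\.\w\-]*)?\??(?:[\-\+=&;%@\.\w]*)#?(?:[\.\!\/\\\w]*))?)/g;

// Hypothetical helper: rebuilds `source`, replacing every match of `regex`
// with the return value of replacer(match, characterBeforeMatch).
function replaceMatches(source, regex, replacer) {
  // Use a fresh global copy so the caller's lastIndex is never touched.
  const flags = regex.flags.includes('g') ? regex.flags : regex.flags + 'g';
  const re = new RegExp(regex.source, flags);
  let result = '';
  let lastMatchEnd = 0;
  let match;
  while ((match = re.exec(source)) !== null) {
    if (match[0] === '') {
      re.lastIndex++; // guard against empty matches looping forever
      continue;
    }
    result += source.slice(lastMatchEnd, match.index);
    result += replacer(match, source.charAt(match.index - 1));
    lastMatchEnd = match.index + match[0].length;
  }
  return result + source.slice(lastMatchEnd);
}

function linkifyOutsideTags(source) {
  return replaceMatches(source, urlRegEx, function (match, previousChar) {
    if (previousChar === '"' || previousChar === '>') {
      return match[0]; // looks like it is already inside an element
    }
    // Assumption: matches without a protocol (group 3) get http://
    const href = match[3] ? match[0] : 'http://' + match[0];
    return '<a href="' + href + '">' + match[0] + '</a>';
  });
}

console.log(linkifyOutsideTags('Go to www.stackoverflow.com/logo.gif now'));
```

The check on the preceding character is a heuristic: a URL in the middle of existing link text, preceded by a space, would still be wrapped.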

Source

2015-08-30 20:54:55 Christian
