2012-02-17 33 views
2

I want to escape matching quotes, except those inside tag attributes. For example:

Input:

xyz <test foo='123 abc' bar="def 456"> f00 'escape me' b4r "me too" but not this </tEsT> blah 'escape " me' 

Expected output:

xyz <test foo='123 abc' bar="def 456"> f00 \'escape me\' b4r \"me too\" but not this </tEsT> blah \'escape " me\' 

I have the following regex:

$result = preg_replace('/(([\'"])((\\\2|.)*?)\2)/', "\\\\$2$3\\\\$2", $input); 

It returns:

xyz <test foo=\'123 abc\' bar=\"def 456\"> f00 \'escape me\' b4r \"me too\" but not this </tEsT> blah \'escape " me\' 

Now I would like to use a zero-width lookbehind assertion to skip quoted matches that are preceded by an equals sign:

$result = preg_replace('/((?<=[^=])([\'"])((\\\2|.)*?)\2)/', "\\\\$2$3\\\\$2", $input); 

But the result is still not what I expected:

xyz <test foo='123 abc\' bar="def 456"> f00 \'escape me\' b4r "me too" but not this </tEsT> blah \'escape " me' 
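
Listing the raw matches makes the problem visible: the lookbehind only rejects the opening quote of each attribute, so the attribute's closing quote is then picked up as a new opening quote. The snippet below is a small diagnostic sketch that uses the same pattern as above; the variable names are only for illustration.

```php
<?php
// Sample input from above, in a nowdoc so no quote escaping is needed.
$input = <<<'TXT'
xyz <test foo='123 abc' bar="def 456"> f00 'escape me' b4r "me too" but not this </tEsT> blah 'escape " me'
TXT;

// Same pattern as in the lookbehind attempt.
$pattern = '/((?<=[^=])([\'"])((\\\2|.)*?)\2)/';

preg_match_all($pattern, $input, $matches);

// Print every full match on its own line.
foreach ($matches[0] as $match) {
    echo $match, PHP_EOL;
}
// Prints:
// ' bar="def 456"> f00 '
// ' b4r "me too" but not this </tEsT> blah '
```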

Could you advise me on how to skip the whole unwanted block (="blah blah blah") instead of skipping only the first quote?

+0

Don't do this with regular expressions. You will regret it. – Jon 2012-02-17 10:40:11

Answers

2

Instead of looking behind to establish the context, look ahead. It's usually much easier.

$result = preg_replace('/([\'"])(?![^<>]*>)((?:(?!\1|[<>]).)*)\1/',
                       '\\\\$1$2\\\\$1',
                       $subject);

(['"])              # capture the opening quote
(?![^<>]*>)         # make sure it's not inside a tag
(                   # capture everything up to the next quote
  (?:               # ...after testing each character
    (?!\1|[<>]).    # ...to be sure it's not the opening quote
  )*                # ...or an angle bracket
)
\1                  # match another quote of the same type as the first one

I'm assuming there won't be any angle brackets in the attribute values.
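
As a quick check, the pattern can be run against the sample input from the question. This is only a sketch: the helper name `escape_text_quotes` is made up for illustration, and it uses the variant from the breakdown that also refuses angle brackets inside the quoted text.

```php
<?php
// Hypothetical helper name, not part of the original answer.
function escape_text_quotes(string $subject): string
{
    // Opening quote, not followed by "...>" (i.e. not inside a tag),
    // then any characters that are neither that quote nor an angle
    // bracket, then the matching closing quote.
    $pattern = '/([\'"])(?![^<>]*>)((?:(?!\1|[<>]).)*)\1/';

    // '\\\\' in a single-quoted PHP string is two backslashes, which
    // preg_replace treats as one literal backslash in the replacement.
    return preg_replace($pattern, '\\\\$1$2\\\\$1', $subject);
}

// Nowdoc strings keep the quotes and backslashes exactly as written.
$input = <<<'TXT'
xyz <test foo='123 abc' bar="def 456"> f00 'escape me' b4r "me too" but not this </tEsT> blah 'escape " me'
TXT;

$expected = <<<'TXT'
xyz <test foo='123 abc' bar="def 456"> f00 \'escape me\' b4r \"me too\" but not this </tEsT> blah \'escape " me\'
TXT;

var_dump(escape_text_quotes($input) === $expected); // bool(true)
```

Because the quotes are matched in pairs, the lone `"` inside `'escape " me'` is left untouched, as the expected output requires.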

+0

It works for me! Thanks for the detailed explanation :) Where did you learn regex? – Artur 2012-02-17 12:49:45

+0

@Artur: Mostly from reading [Mastering Regular Expressions](http://shop.oreilly.com/product/9780596528126.do), practicing, and hanging out in forums like this one. :D – 2012-02-18 23:30:46

1

Here's another one.

$str = "xyz <test foo='123 abc' bar=\"def 456\"> f00 'escape me' b4r \"me too\" but not this <br/> <br/></tEsT> blah 'escape \" me'"; 

$str_escaped = preg_replace_callback('/(?<!\<)[^<>]+(?![^<]*\>)/','escape_quotes',$str); 
// check all the strings outside every possible tag 
// and replace each by the return value of the function below 

function escape_quotes($str) { 
    if (is_array($str)) $str = $str[0]; 
    return preg_replace('/(?<!\\\\)(\'|")/','\\\\$1',$str); 
    // escape all the non-escaped single and double quotes 
    // and return the escaped block 
} 
+0

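Can someone verify whether this works in all cases? I'm assuming that all **<** and **>** characters are escaped (as "&lt;" and "&gt;" respectively), except for the ones that delimit tags. – inhan 2012-02-17 14:20:36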
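
One way to check the second answer is to run it against the question's sample input. The script below is a rough test, not a proof of correctness: the function name `escape_quotes_outside_tags` is made up, and the callback is inlined as a closure instead of the named `escape_quotes` function.

```php
<?php
// Hypothetical wrapper around the second answer's approach.
function escape_quotes_outside_tags(string $html): string
{
    return preg_replace_callback(
        // Runs of text that are not inside a <...> tag.
        '/(?<!<)[^<>]+(?![^<]*>)/',
        function (array $m): string {
            // Escape every quote not already preceded by a backslash.
            return preg_replace('/(?<!\\\\)([\'"])/', '\\\\$1', $m[0]);
        },
        $html
    );
}

$input = <<<'TXT'
xyz <test foo='123 abc' bar="def 456"> f00 'escape me' b4r "me too" but not this </tEsT> blah 'escape " me'
TXT;

echo escape_quotes_outside_tags($input), PHP_EOL;
// Prints:
// xyz <test foo='123 abc' bar="def 456"> f00 \'escape me\' b4r \"me too\" but not this </tEsT> blah \'escape \" me\'
```

On this input the attributes are left alone, but every unescaped quote in the text is escaped, including the lone `"` inside `'escape " me'`. The expected output in the question leaves that one as it is, so the two answers are not equivalent when the text contains unpaired quotes.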