
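I have two regular expressions, one that matches [value] and another that matches HTML attributes, but I need to combine them into a single regular expression. The goal is for PHP's preg_replace to find a match in the HTML, but not when the match sits inside an HTML attribute.

Here is my working regex for finding [value]: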

$tagregexp = '[a-zA-Z_\-][0-9a-zA-Z_\-\+]{2,}'; 

    $pattern = 
      '\\['        // Opening bracket 
     . '(\\[?)'       // 1: Optional second opening bracket for escaping shortcodes: [[tag]] 
     . "($tagregexp)"      // 2: Shortcode name 
     . '(?![\\w-])'      // Not followed by word character or hyphen 
     . '('        // 3: Unroll the loop: Inside the opening shortcode tag 
     .  '[^\\]\\/]*'     // Not a closing bracket or forward slash 
     .  '(?:' 
     .   '\\/(?!\\])'    // A forward slash not followed by a closing bracket 
     .   '[^\\]\\/]*'    // Not a closing bracket or forward slash 
     .  ')*?' 
     . ')' 
     . '(?:' 
     .  '(\\/)'      // 4: Self closing tag ... 
     .  '\\]'       // ... and closing bracket 
     . '|' 
     .  '\\]'       // Closing bracket 
     .  '(?:' 
     .   '('      // 5: Unroll the loop: Optionally, anything between the opening and closing shortcode tags 
     .    '[^\\[]*+'    // Not an opening bracket 
     .    '(?:' 
     .     '\\[(?!\\/\\2\\])' // An opening bracket not followed by the closing shortcode tag 
     .     '[^\\[]*+'   // Not an opening bracket 
     .    ')*+' 
     .   ')' 
     .   '\\[\\/\\2\\]'    // Closing shortcode tag 
     .  ')?' 
     . ')' 
     . '(\\]?)';       // 6: Optional second closing bracket for escaping shortcodes: [[tag]]


This regex, (\S+)=["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?, matches attributes and their values.

I want a regex that finds a match in the following examples:

<div [value] ></div>
<div>[value]</div>

but does not find a [value] match in this example:

<input attr="attribute[value]"/>

I just need it as a single regex that I can use in my preg_replace_callback call:

preg_replace_callback("/$pattern/", 'replace_matches', $html);

Source

2016-05-17 TarranJones

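Have you considered using a parser? – chris85

It's a PHP string, not a Java string; you don't need to escape everything. Use the x modifier (and a nowdoc string if you can) instead of concatenation. If you want to process HTML (or XML), forget regex and use DOMDocument (and eventually DOMXPath). –

Another thing: the closing square bracket isn't a special character, so you don't need to escape it. The opening square bracket inside a character class is nothing special either; you can write '[^[]' instead of '[^\\[]'. *(You can even write '[^]]' and '[]]', since in the first position the closing square bracket is treated as a literal character.)* –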
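As a rough illustration of what the comments above suggest (a nowdoc string plus the x modifier instead of concatenated, double-escaped fragments), here is a minimal sketch. It uses a deliberately simplified [value] pattern, not the full shortcode pattern from the question, and the sample string is made up for the example.

```php
<?php
// Simplified stand-in for the shortcode pattern; an assumption for illustration only.
// A nowdoc needs no PHP-level escaping, and the x modifier allows whitespace and comments.
$pattern = <<<'REGEX'
~
\[          # opening bracket, escaped because it would otherwise start a class
( [^][]* )  # 1: any run of characters that are not brackets
]           # closing bracket, literal without an escape
~x
REGEX;

// Hypothetical sample input.
preg_match_all($pattern, '<div>[value]</div> [other]', $matches);

print_r($matches[1]); // the captured names without their brackets
```

Inside the character class, `[^][]` relies on the rule from the last comment: a closing bracket in first position is a literal, and an opening bracket inside a class needs no escape.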

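Foreword

On the surface it looks like you're trying to parse HTML with a regular expression. I feel obligated to point out that using a regular expression to parse HTML is not advisable, because of all the obscure edge cases that can crop up. It seems you have some control over the HTML, though, so you should be able to avoid many of the edge cases the regex police cry about.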
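For comparison, the parser route mentioned in the comments could look roughly like the sketch below. It is not part of this answer's approach. It assumes the DOM extension is available, that only placeholders in text content need replacing, and that upper-casing stands in for whatever the real callback does (all hypothetical).

```php
<?php
// Hypothetical input: one placeholder in text content, one inside an attribute value.
$html = '<div>[value]</div><input attr="attribute[value]">';

$doc = new DOMDocument();
// @ silences warnings about imperfect markup; loadHTML wraps the fragment in <html><body>.
@$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
// //text() selects text nodes only, so attribute values are never visited.
foreach ($xpath->query('//text()') as $textNode) {
    $textNode->nodeValue = preg_replace_callback(
        '~\[([^][]*)]~',
        function (array $m) {
            return strtoupper($m[1]); // hypothetical replacement logic
        },
        $textNode->nodeValue
    );
}

$result = $doc->saveHTML();
echo $result;
```

A placeholder written directly inside a tag, as in <div [value] >, is not a text node, so this sketch would not touch it; that case would need separate handling.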

Description

<\w+\s(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\[(?<DesiredValue>[^\]]*)\]) 
| 
<\w+\s?(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*> 
(?:(?!<\/div>)(?!\[).)*\[(?<DesiredValue>[^\]]*)\]


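This regular expression will do the following:

- capture the substring inside square brackets [some value]
- provided the [value] is inside a tag
- or the [value] is not inside a tag
- provided the substring is not nested inside another value in the attributes area, as in <input attrib=" [value] ">
- the captured substring will not include the wrapping square brackets
- allow any tag name; replace \w+ with the desired tag name to restrict it
- allow the value to be any string
- avoid difficult edge cases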

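Note: this expression is best used with the following flags:

- global
- dot matches new line
- ignore whitespace in the expression
- allow duplicate named capture groups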
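In PHP these flags would map roughly to preg_match_all (global), the s and x modifiers, and an inline (?J) for duplicate group names. The sketch below assumes the sample lines from the live demo. Reading the numbered groups instead of the shared name is my own precaution, not something the answer prescribes.

```php
<?php
// A nowdoc keeps the expression free of PHP-level escaping. The delimiter is ~
// because the expression itself contains a forward slash.
// (?J) permits the duplicate group name DesiredValue.
$pattern = <<<'REGEX'
~(?J)
<\w+\s(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\[(?<DesiredValue>[^\]]*)\])
|
<\w+\s?(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>
(?:(?!<\/div>)(?!\[).)*\[(?<DesiredValue>[^\]]*)\]
~sx
REGEX;

$html = <<<'HTML'
<div [value 1] ></div>
<div>[value 2]</div>
<div attr="attribute[value 3]"/>
<img [value 4]>
<a href="http://[value 5]">[value 6]</a>
HTML;

// preg_match_all plays the role of the "global" flag.
preg_match_all($pattern, $html, $sets, PREG_SET_ORDER);

$values = [];
foreach ($sets as $set) {
    // Group 1 is filled by the first alternative, group 2 by the second.
    $values[] = (isset($set[2]) && $set[2] !== '') ? $set[2] : $set[1];
}

print_r($values);
```

On this input the collected values should be value 1, value 2, value 4 and value 6, in line with the sample matches listed in this answer.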


Examples

Live Demo

https://regex101.com/r/tT0bN5/1

Sample text

<div [value 1] ></div> 
<div>[value 2]</div> 
but not find a match in this example 

<div attr="attribute[value 3]"/> 
<img [value 4]> 
<a href="http://[value 5]">[value 6]</a>

Sample matches

MATCH 1 
DesiredValue [6-13] `value 1` 
MATCH 2 
DesiredValue [29-36] `value 2` 
MATCH 3 
DesiredValue [121-128] `value 4` 
MATCH 4 
DesiredValue [159-166] `value 6`
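The matches above locate the values, but the question ultimately wants to replace them with preg_replace_callback. One possible sketch follows. It uses a different and simpler technique than the expression in this answer: the PCRE backtracking verbs (*SKIP)(*FAIL) step over quoted attribute values. It assumes attribute values are always quoted, and the upper-casing callback is purely hypothetical.

```php
<?php
// A quoted attribute value (="..." or ='...') is consumed and then discarded
// via (*SKIP)(*FAIL); only a [value] found outside such a value is matched.
$pattern = <<<'REGEX'
~
= \s* (?: "[^"]*" | '[^']*' ) (*SKIP)(*FAIL)
|
\[ ( [^][]* ) ]
~x
REGEX;

$html = <<<'HTML'
<div [value] ></div>
<div>[value]</div>
<input attr="attribute[value]"/>
HTML;

$result = preg_replace_callback(
    $pattern,
    function (array $m) {
        return strtoupper($m[1]); // hypothetical replacement logic
    },
    $html
);

echo $result;
```

The first two lines come out with VALUE in place of [value], while the attribute in the third line is left untouched. One limitation: an equals sign followed by a quoted string in ordinary text content would be skipped as well.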

Explanation

NODE      EXPLANATION 
---------------------------------------------------------------------- 
    <div      '<div' 
---------------------------------------------------------------------- 
    \s      whitespace (\n, \r, \t, \f, and " ") 
---------------------------------------------------------------------- 
    (?=      look ahead to see if there is: 
---------------------------------------------------------------------- 
    (?:      group, but do not capture (0 or more 
          times (matching the least amount 
          possible)): 
---------------------------------------------------------------------- 
     [^>=]     any character except: '>', '=' 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
     ='      '=\'' 
---------------------------------------------------------------------- 
     [^']*     any character except: ''' (0 or more 
           times (matching the most amount 
           possible)) 
---------------------------------------------------------------------- 
     '      '\'' 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
     ="      '="' 
---------------------------------------------------------------------- 
     [^"]*     any character except: '"' (0 or more 
           times (matching the most amount 
           possible)) 
---------------------------------------------------------------------- 
     "      '"' 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
     =      '=' 
---------------------------------------------------------------------- 
     [^'"]     any character except: ''', '"' 
---------------------------------------------------------------------- 
     [^\s>]*     any character except: whitespace (\n, 
           \r, \t, \f, and " "), '>' (0 or more 
           times (matching the most amount 
           possible)) 
---------------------------------------------------------------------- 
    )*?      end of grouping 
---------------------------------------------------------------------- 
    \[      '[' 
---------------------------------------------------------------------- 
    (      group and capture to \1: 
---------------------------------------------------------------------- 
     [^\]]*     any character except: '\]' (0 or more 
           times (matching the most amount 
           possible)) 
---------------------------------------------------------------------- 
    )      end of \1 
---------------------------------------------------------------------- 
    \]      ']' 
---------------------------------------------------------------------- 
)      end of look-ahead 
---------------------------------------------------------------------- 
|      OR 
---------------------------------------------------------------------- 
    <div      '<div' 
---------------------------------------------------------------------- 
    \s?      whitespace (\n, \r, \t, \f, and " ") 
          (optional (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    (?:      group, but do not capture (0 or more times 
          (matching the most amount possible)): 
---------------------------------------------------------------------- 
    [^>=]     any character except: '>', '=' 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
    ='      '=\'' 
---------------------------------------------------------------------- 
    [^']*     any character except: ''' (0 or more 
          times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    '      '\'' 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
    ="      '="' 
---------------------------------------------------------------------- 
    [^"]*     any character except: '"' (0 or more 
          times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    "      '"' 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
    =      '=' 
---------------------------------------------------------------------- 
    [^'"]     any character except: ''', '"' 
---------------------------------------------------------------------- 
    [^\s>]*     any character except: whitespace (\n, 
          \r, \t, \f, and " "), '>' (0 or more 
          times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
)*      end of grouping 
---------------------------------------------------------------------- 
    >      '>' 
---------------------------------------------------------------------- 
    (?:      group, but do not capture (0 or more times 
          (matching the most amount possible)): 
---------------------------------------------------------------------- 
    (?!      look ahead to see if there is not: 
---------------------------------------------------------------------- 
     <      '<' 
---------------------------------------------------------------------- 
     \/      '/' 
---------------------------------------------------------------------- 
     div>      'div>' 
---------------------------------------------------------------------- 
    )      end of look-ahead 
---------------------------------------------------------------------- 
    (?!      look ahead to see if there is not: 
---------------------------------------------------------------------- 
     \[      '[' 
---------------------------------------------------------------------- 
    )      end of look-ahead 
---------------------------------------------------------------------- 
    .      any character 
---------------------------------------------------------------------- 
)*      end of grouping 
---------------------------------------------------------------------- 
    \[      '[' 
---------------------------------------------------------------------- 
    (      group and capture to \2: 
---------------------------------------------------------------------- 
    [^\]]*     any character except: '\]' (0 or more 
          times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
)      end of \2 
---------------------------------------------------------------------- 
    \]      ']'

Source

2016-05-18 02:26:43

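Incredible answer, I appreciate the time and effort that went into it. I still haven't fully solved it, but this should help a lot. – TarranJones

Let me know what this answer is missing, or where I can help. –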
