How to evaluate constraints using regular expressions? (PHP, regex)

So, let's say I want to accept strings like the following:
SomeColumn IN||<||>||= [123, 'hello', "wassup"]||123||'hello'||"yay!"
For example: MyValue IN ['value', 123] or MyInt > 123 -> I think you get the idea. Now, what bothers me is how to express this with a regex. I'm using PHP, and this is what I'm doing right now:

$temp = explode(';', $constraints);
$matches = array();
foreach ($temp as $condition) {
    preg_match('/(.+)[\t| ]+(IN|<|=|>|!)[\t| ]+([0-9]+|[.+]|.+)/', $condition, $matches[]);
}
foreach ($matches as $match) {
    if ($match[2] == 'IN') {
        preg_match('/(?:([0-9]+|".+"|\'.+\'))/', substr($match[3], 1, -1), $tempm);
        print_r($tempm);
    }
}

I'd really appreciate any help; my regex-ing is terrible.


2012-11-13 Fabian Schneider

I'm assuming your input looks something like this:

$string = 'SomeColumn IN [123, \'hello\', "wassup"];SomeColumn < 123;SomeColumn = \'hello\';SomeColumn > 123;SomeColumn = "yay!";SomeColumn = [123, \'hello\', "wassup"]';

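If you use preg_match_all, there is no need to explode the string or to build up your own array of matches. Note that the dimensions of the resulting two-dimensional array are switched (the first index is the capturing group, the second is the match), but that is usually desirable. Here is the code: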

preg_match_all('/(\w+)[\t ]+(IN|<|>|=|!)[\t ]+((\'[^\']*\'|"[^"]*"|\d+)|\[[\t ]*(?4)(?:[\t ]*,[\t ]*(?4))*[\t ]*\])/', $string, $matches); 

$statements = $matches[0]; 
$columns = $matches[1]; 
$operators = $matches[2]; 
$values = $matches[3];
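To see the "switched dimensions" in practice, you can run the same pattern once with the default PREG_PATTERN_ORDER and once with the PREG_SET_ORDER flag, which gives one row per statement instead. This is a small sketch that assumes the sample $string from above; the variable names are only illustrative.

```php
<?php
// Sample input from above: statements separated by semicolons.
$string = 'SomeColumn IN [123, \'hello\', "wassup"];SomeColumn < 123;SomeColumn = \'hello\';SomeColumn > 123;SomeColumn = "yay!";SomeColumn = [123, \'hello\', "wassup"]';

$pattern = '/(\w+)[\t ]+(IN|<|>|=|!)[\t ]+((\'[^\']*\'|"[^"]*"|\d+)|\[[\t ]*(?4)(?:[\t ]*,[\t ]*(?4))*[\t ]*\])/';

// Default PREG_PATTERN_ORDER: $byGroup[g][i] is capturing group g of the i-th match.
$count = preg_match_all($pattern, $string, $byGroup);

// PREG_SET_ORDER: $bySet[i][g] is capturing group g of the i-th match.
preg_match_all($pattern, $string, $bySet, PREG_SET_ORDER);

foreach ($bySet as $set) {
    // $set[1] = column, $set[2] = operator, $set[3] = raw value text
    echo $set[1], ' | ', $set[2], ' | ', $set[3], PHP_EOL;
}
```

Which layout is nicer depends on whether you process the constraints column-wise or one statement at a time.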

There will also be a $matches[4], but it has no real meaning; it is only used inside the regex. First, here is what you did wrong in your attempt:

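(.+) consumes as many characters (of any kind) as it can. So if a string value contains something that looks like IN 13, your first repetition may consume everything up to that point and return it as the column. It also allows whitespace and = inside the column name. There are two ways to fix this: either make the repetition "ungreedy" by appending ?, or, better, restrict the allowed characters so that it cannot run past the intended delimiter. In my regex I only allow letters, digits and underscores (\w) as column identifiers.
[\t| ] This mixes two concepts: alternation and character classes. What it does is "match a tab, a pipe or a space". In a character class you simply write all the characters without separating them. Alternatively you could write (\t| ), which is equivalent in this case.
[.+] I'm not sure what you were trying to do with this, but it matches either a literal . or a literal +. And again, it would be a good idea to restrict the allowed characters and to check that the quotes match correctly (to avoid 'some string").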
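To make the three pitfalls above concrete, here is a small sketch that runs each construct against short made-up subject strings (the inputs are hypothetical and chosen only to trigger each problem).

```php
<?php
// 1. [\t| ] is a character class: it matches a tab, a pipe OR a space.
$pipeMatches = preg_match('/a[\t| ]b/', 'a|b');   // the pipe is matched literally

// 2. [.+] matches a single literal "." or "+", not "one or more of anything".
$dotPlus    = preg_match('/^[.+]$/', 'hello');    // no match
$literalDot = preg_match('/^[.+]$/', '.');        // matches

// 3. Greedy (.+) backtracks only as far as needed, so it grabs text up to the
//    LAST place where the rest of the pattern still fits.
preg_match('/(.+)[\t ]+(IN|<|=|>|!)[\t ]+(.+)/', "Name = 'a IN b'", $greedy);
// $greedy[1] is "Name = 'a" and the operator found is IN

// Restricting the column to \w+ stops at the first non-word character.
preg_match('/(\w+)[\t ]+(IN|<|=|>|!)[\t ]+(.+)/', "Name = 'a IN b'", $strict);
// $strict[1] is "Name" and the operator found is =
```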

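Now for an explanation of my own regex (you can also copy this version into your code; it works just the same, plus you get the explanation as comments in your code):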

preg_match_all('/
    (\w+)            # match an identifier and capture in $1
    [\t ]+           # one or more tabs or spaces
    (IN|<|>|=|!)     # the operator (capture in $2)
    [\t ]+           # one or more tabs or spaces
    (                # start of capturing group $3 (the value)
        (            # start of subpattern for single-valued literals (capturing group $4)
            \'       # literal quote
            [^\']*   # arbitrarily many non-quote characters, to avoid going past the end of the string
            \'       # literal quote
        |            # OR
            "[^"]*"  # equivalent for double-quotes
        |            # OR
            \d+      # a number
        )            # end of subpattern for single-valued literals
    |                # OR (arrays follow)
        \[           # literal [
        [\t ]*       # zero or more tabs or spaces
        (?4)         # reuse subpattern no. 4 (any single-valued literal)
        (?:          # start non-capturing subpattern for further array elements
            [\t ]*   # zero or more tabs or spaces
            ,        # a literal comma
            [\t ]*   # zero or more tabs or spaces
            (?4)     # reuse subpattern no. 4 (any single-valued literal)
        )*           # end of additional array element; repeat zero or more times
        [\t ]*       # zero or more tabs or spaces
        \]           # literal ]
    )                # end of capturing group $3
    /x',             // the x modifier makes PCRE ignore this whitespace and the # comments
    $string,
    $matches);
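If you also need the individual elements of an IN list as PHP values (which is what the second loop in the question was after), one option is a second preg_match_all over the raw text in $values, using only the single-literal subpattern. A minimal sketch, assuming PHP 8; parseValue() and literalToScalar() are hypothetical helper names, not part of any library.

```php
<?php
// Hypothetical helper: converts one matched literal into a PHP value.
function literalToScalar(string $lit)
{
    // Quoted literal: drop the surrounding quote characters.
    if ($lit[0] === "'" || $lit[0] === '"') {
        return substr($lit, 1, -1);
    }
    // The pattern only allows \d+ otherwise, so this is an integer.
    return (int) $lit;
}

// Hypothetical helper: converts the raw value text (capturing group 3)
// into a scalar or an array of scalars.
function parseValue(string $raw)
{
    if ($raw[0] !== '[') {
        return literalToScalar($raw);   // single-valued literal
    }
    // Array: collect every literal between the brackets, in order.
    // Commas inside quoted strings are safe because the quotes are matched first.
    preg_match_all('/\'[^\']*\'|"[^"]*"|\d+/', $raw, $parts);
    return array_map('literalToScalar', $parts[0]);
}

$inValue = parseValue('[123, \'hello\', "wassup"]');
$scalar  = parseValue('"yay!"');
var_dump($inValue, $scalar);
```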

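This makes use of PCRE's recursion feature: with (?n) you can reuse a subpattern (or, with (?R), the entire regex), where n is simply the number you would use for a backreference.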
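The difference between a backreference \n and a subroutine call (?n) is easy to miss, so here is a short sketch contrasting them on made-up inputs. The last pattern shows recursion used for nesting; it calls group 1 rather than (?R) so that the ^ and $ anchors are not re-applied inside the recursion.

```php
<?php
// \1 is a backreference: it must match the exact TEXT that group 1 captured.
$backref = preg_match('/^(\d+)-\1$/', '12-345');      // no match: "345" is not "12"

// (?1) is a subroutine call: it runs group 1's PATTERN again, so any number works.
$subroutine = preg_match('/^(\d+)-(?1)$/', '12-345'); // matches

// Recursing into group 1 lets a pattern match arbitrarily nested brackets.
$balanced = preg_match('/^(\[(?:[^\[\]]|(?1))*\])$/', '[a[b]c]'); // matches
```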

There are three main things I can think of that could be improved in this expression:

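It does not allow floating-point numbers.
It does not allow escaped quotes (if your value is 'don\'t do this', I would only capture 'don\'). This can be solved with a negative lookbehind.
It does not allow empty arrays as values.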

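I included none of these (empty arrays, for instance, can easily be allowed by wrapping all the array elements in a subpattern and making it optional with ?), because I don't know whether they apply to your problem, and I think this regex is already complicated enough to show here.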
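For reference, here is one way all three gaps could be closed. It is a sketch, not a tested drop-in: instead of the negative lookbehind mentioned above it uses the common (?:[^'\\]|\\.)* form (any character except a quote or backslash, or a backslash followed by any character), and the number syntax \d+(?:\.\d+)? is an assumption about what your floats look like. A nowdoc is used so the backslashes need no extra PHP escaping.

```php
<?php
// Sketch: adds simple decimals, backslash-escaped quotes and empty arrays.
$pattern = <<<'RE'
/
(\w+) [\t ]+ (IN|<|>|=|!) [\t ]+
(                                  # $3: the whole value
  (                                # $4: one single-valued literal
    '(?:[^'\\]|\\.)*'              # single-quoted, escaped quotes allowed inside
  | "(?:[^"\\]|\\.)*"              # double-quoted, escaped quotes allowed inside
  | \d+(?:\.\d+)?                  # integer or simple decimal (assumed format)
  )
| \[ [\t ]*
    (?: (?4) (?: [\t ]* , [\t ]* (?4) )* [\t ]* )?   # elements are optional, so [] matches
  \]
)
/x
RE;

$input = "Price > 9.99;Name = 'don\\'t do this';Tags IN []";
$n = preg_match_all($pattern, $input, $m);
print_r($m[3]);
```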

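In general, regular expressions are not powerful enough to do proper language analysis anyway. Writing a parser is usually the better option.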
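Following that advice, a minimal recursive-descent parser for the same syntax could look like the sketch below. ConstraintParser is a hypothetical name and the code assumes PHP 8. Unlike the regex, it treats whitespace around operators as optional, and it also accepts escaped quotes and empty arrays. Each grammar rule gets one method, and preg_match with the /A modifier is used only to match single tokens at the current offset.

```php
<?php
// Hypothetical minimal recursive-descent parser; assumes PHP 8.
final class ConstraintParser
{
    // Quoted strings may contain backslash escapes; numbers are unsigned integers.
    private const LITERAL = '/\'(?:[^\'\\\\]|\\\\.)*\'|"(?:[^"\\\\]|\\\\.)*"|\d+/A';

    private int $pos = 0;

    public function __construct(private string $src) {}

    /** Parses "stmt;stmt;..." into a list of [column, operator, value]. */
    public function parse(): array
    {
        $out = [$this->statement()];
        while ($this->eat(';')) {
            $out[] = $this->statement();
        }
        $this->skipWs();
        if ($this->pos !== strlen($this->src)) {
            throw new RuntimeException("Unexpected input at offset {$this->pos}");
        }
        return $out;
    }

    // statement := identifier operator value
    private function statement(): array
    {
        $column = $this->expect('/\w+/A');
        $operator = $this->expect('/IN|<|>|=|!/A');
        return [$column, $operator, $this->value()];
    }

    // value := literal | "[" [ literal { "," literal } ] "]"
    private function value()
    {
        if (!$this->eat('[')) {
            return $this->literal();
        }
        $items = [];
        if (!$this->eat(']')) {
            do {
                $items[] = $this->literal();
            } while ($this->eat(','));
            if (!$this->eat(']')) {
                throw new RuntimeException("Expected ] at offset {$this->pos}");
            }
        }
        return $items;
    }

    private function literal()
    {
        $token = $this->expect(self::LITERAL);
        if ($token[0] !== "'" && $token[0] !== '"') {
            return (int) $token;
        }
        // Drop the surrounding quotes, then turn every \x into x.
        return preg_replace('/\\\\(.)/s', '$1', substr($token, 1, -1));
    }

    // Skips tabs and spaces, the same whitespace the regex allows.
    private function skipWs(): void
    {
        while (in_array($this->src[$this->pos] ?? '', [' ', "\t"], true)) {
            $this->pos++;
        }
    }

    // Consumes $char (after optional whitespace) if it comes next.
    private function eat(string $char): bool
    {
        $this->skipWs();
        if (($this->src[$this->pos] ?? '') === $char) {
            $this->pos++;
            return true;
        }
        return false;
    }

    // Matches $re anchored (/A) at the current offset, or throws.
    private function expect(string $re): string
    {
        $this->skipWs();
        if (preg_match($re, $this->src, $m, 0, $this->pos) !== 1) {
            throw new RuntimeException("Syntax error at offset {$this->pos}");
        }
        $this->pos += strlen($m[0]);
        return $m[0];
    }
}

$result = (new ConstraintParser('SomeColumn IN [123, \'hello\', "wassup"];MyInt > 123'))->parse();
print_r($result);
```

The advantage over a single large regex is that every rule fails with a position you can report, and adding a new value type touches one method.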

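Since you said your regex-ing is terrible: regular expressions can look like a lot of black magic because of their unusual syntax, but they are not hard to understand once you take a little time to get your head around their basic concepts. I can recommend this tutorial. It really walks you through everything!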


2012-11-13 22:11:15
