Regex skip in C++

This is my string:

/* 
    Block1 { 

    anythinghere 
    } 
*/ 

// Block2 { } 
# Block3 { } 

Block4 { 

    anything here 
}

I use this regex to get the name and the contents of each block.

regex e(R"~((\w+)\s+\{([^}]+)\})~", std::regex::optimize);

But this regex also picks up everything inside the comments. PHP (PCRE) has a "skip" idiom that you can use to skip over all comments:

What_I_want_to_avoid(*SKIP)(*FAIL)|What_I_want_to_match

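But this is C++, and I can't use that skip method. What should I do to skip all the comments and get only Block4 with a C++ regex?

My regex detects Block1, Block2, Block3 and Block4, but I want to skip Block1, Block2 and Block3 (they are inside comments) and get only Block4. How should I edit my regex so that it matches Block4 only, that is, everything outside the comments?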

Source

2016-02-24 BasicYard

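It looks like you are trying to use a regex to accomplish something that should be done by a parser. That said, from your question it is not entirely clear what you actually want to match. –

What do you mean by "skip all descriptions"? Are you trying to match comments? –

Yes, I'm trying not to match comments – BasicYard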

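Since you asked for it, here is the long regex.

This will not handle nested blocks such as block{ block{ } }.
It will only match block{ block{ }, stopping at the first closing brace.

Since you specified that you are using C++11 as the engine, I didn't use
recursion. If you were to use
PCRE or Perl, or even Boost.Regex, this could easily be changed to use recursion. Let me know if you want to see that.

As it stands it is flawed, but it works for your sample.
Another thing it won't do is parse preprocessor directives ("#...") because
I have forgotten the rules for those (I thought I had done it recently, but I can't find a record of it).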

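To use it, sit in a while (regex_search()) loop and check whether
capture group 1 matched, e.g. if (m[1].matched). That will be your block.
The remaining matches are comments, quoted strings, or other non-comment text
unrelated to blocks. They still have to be matched in order to advance the match position.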

The regex is long and redundant because there are no function calls (recursion) in the C++11 ECMAScript grammar. Like I said, use boost::regex or something similar.

Benchmark

Sample:

/* 
    Block1 { 

    anythinghere 
    } 
*/ 

// Block2 { } 

Block4 { 

    // CommentedBlock{ asdfasdf } 
    anyth"}"ing here 
} 

Block5 { 

    /* CommentedBlock{ asdfasdf } 
    anyth}"ing here 
    */ 
}

Results:

Regex1: (?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(\w+\s*\{(?:(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?!\})[\S\s][^}/"'\\]*))*\})|[\S\s](?:(?!\w+\s*\{(?:(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?!\})[\S\s][^}/"'\\]*))*\})[^/"'\\])*) 
Options: <none> 
Completed iterations: 50/50  (x 1000) 
Matches found per iteration: 8 
Elapsed Time: 1.95 s, 1947.26 ms, 1947261 µs

Regex explained:

# Raw:  (?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(\w+\s*\{(?:(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?!\})[\S\s][^}/"'\\]*))*\})|[\S\s](?:(?!\w+\s*\{(?:(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?!\})[\S\s][^}/"'\\]*))*\})[^/"'\\])*) 
    # Stringed: "(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/|//(?:[^\\\\]|\\\\\\n?)*?\\n)|(?:\"[^\"\\\\]*(?:\\\\[\\S\\s][^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\[\\S\\s][^'\\\\]*)*'|(\\w+\\s*\\{(?:(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/|//(?:[^\\\\]|\\\\\\n?)*?\\n)|(?:\"[^\"\\\\]*(?:\\\\[\\S\\s][^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\[\\S\\s][^'\\\\]*)*'|(?!\\})[\\S\\s][^}/\"'\\\\]*))*\\})|[\\S\\s](?:(?!\\w+\\s*\\{(?:(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/|//(?:[^\\\\]|\\\\\\n?)*?\\n)|(?:\"[^\"\\\\]*(?:\\\\[\\S\\s][^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\[\\S\\s][^'\\\\]*)*'|(?!\\})[\\S\\s][^}/\"'\\\\]*))*\\})[^/\"'\\\\])*)"  


    (?:        # Comments 
     /\*        # Start /* .. */ comment 
     [^*]* \*+ 
     (?: [^/*] [^*]* \*+)* 
     /        # End /* .. */ comment 
     | 
     //        # Start // comment 
     (?: [^\\] | \\ \n?)*?   # Possible line-continuation 
     \n        # End // comment 
    ) 
|         # OR, 

    (?:        # Non - comments 
     " 
     [^"\\]*       # Double quoted text 
     (?: \\ [\S\s] [^"\\]*)* 
     " 
     | ' 
     [^'\\]*       # Single quoted text 
     (?: \\ [\S\s] [^'\\]*)* 
     ' 
     | 
     (        # (1 start), BLOCK 
       \w+ \s* \{    
       #################### 
       (?:        # ------------------------ 
        (?:        # Comments inside a block 
         /\*        
         [^*]* \*+ 
         (?: [^/*] [^*]* \*+)* 
         /        
        | 
         //        
         (?: [^\\] | \\ \n?)*? 
         \n        
        ) 
       | 
        (?:        # Non - comments inside a block 
         " 
         [^"\\]*       
         (?: \\ [\S\s] [^"\\]*)* 
         " 
        | ' 
         [^'\\]*       
         (?: \\ [\S\s] [^'\\]*)* 
         ' 
        | 
         (?! \}) 
         [\S\s]       
         [^}/"'\\]*      
        ) 
      )*        # ------------------------ 
       #####################   
       \}        
     )        # (1 end), BLOCK 

     |         # OR, 

     [\S\s]       # Any other char 
     (?:        # ------------------------- 
       (?!        # ASSERT: Here, cannot be a BLOCK{ } 
        \w+ \s* \{      
        (?:        # ============================== 
         (?:        # Comments inside a block 
          /\*        
          [^*]* \*+ 
          (?: [^/*] [^*]* \*+)* 
          /        
          | 
          //        
          (?: [^\\] | \\ \n?)*? 
          \n        
         ) 
        | 
         (?:        # Non - comments inside a block 
          " 
          [^"\\]*       
          (?: \\ [\S\s] [^"\\]*)* 
          " 
          | 
          ' 
          [^'\\]*       
          (?: \\ [\S\s] [^'\\]*)* 
          ' 
          | 
          (?! \}) 
          [\S\s]       
          [^}/"'\\]*      
         ) 
        )*        # ============================== 
        \}        
      )        # ASSERT End 

       [^/"'\\]       # Char which doesn't start a comment, string, escape, 
               # or line continuation (escape + newline) 
     )*        # ------------------------- 
    )        # Done Non - comments

Source

2016-02-25 21:06:31 sln

TL;DR: Regular expressions cannot be used to parse full blown computer languages. What you want to do cannot be done with a regular expression. You need to develop a mini C++ parser to filter out the comments. The answer to this related question might point you in the right direction.

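Regular expressions can be used to process regular languages, but computer languages (such as C++, PHP, Java, C#, HTML, etc.) have a more complex syntax that includes a property called "middle recursion". Middle recursion covers complications such as an arbitrary number of matching brackets, opening and closing quotes, and comments that can themselves contain symbols.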

If you want to understand this in more detail, read the answers to this question about the difference between regular expressions and context free grammars. If you are really curious, enroll in a Formal Language Theory course.

Source

2016-02-24 18:59:31
