2015-10-04 153 views

Incorrect capture group in POSIX regex search

In a C project, I wrote a function that returns the first capture group of a regex search.

What I expect to get is best illustrated by the output of this online parser (note the capture group output in the right-hand panel).

The function I wrote, together with its test code, is as follows:

#include <stdio.h> 
#include <stdlib.h> 
#include <regex.h> 
#include <string.h> 
#include <assert.h> 

typedef int bool; 
#define true 1 
#define false 0 

/* 
* Obtains the first group that matches the regex pattern in input string 
* The output pointer is made to point to: 
* - in case the regexp compilation succeeded 
*  - the result in case there was a match found 
*  - or NULL in case there was no match 
* - in case the regexp compilation failed 
*  - the error from the compilation process 
* 
* If there was an error while compiling the input reg_exp, then this function 
* returns false, if not, it returns true. 
* 
* NOTE: The user is responsible for free-ing the memory for *output 
*/ 
bool get_first_match(const char* search_str, const char* reg_exp, char** output) 
{ 
    int res, len; 
    regex_t preg; 
    regmatch_t pmatch; 

    // Compile the input regexp 
    if((res = regcomp(&preg, reg_exp, REG_EXTENDED)) != 0) 
    { 
     char* error = (char*)malloc(1024*sizeof(char)); 
     regerror(res, &preg, error, 1024); 
     output = &error; 
     return false; 
    } 

    res = regexec(&preg, search_str, 1, &pmatch, REG_NOTBOL); 
    if(res == REG_NOMATCH) 
    { 
     return true; 
    } 

    len = pmatch.rm_eo - pmatch.rm_so; 
    char* result = (char*)malloc((len + 1) * sizeof(char)); 
    memcpy(result, search_str + pmatch.rm_so, len); 
    result[len] = 0; // null-terminate the result 
    *output = result; 
    regfree(&preg); 
    return true; 
} 

int main() 
{ 
    const char* search_str = "param1=blah&param2=blahblah&param3=blahetc&map=/usr/bin/blah.map"; 
    const char* regexp = "map=([^\\&]*)(&|$)"; 
    char* output; 
    bool status = get_first_match(search_str, regexp, &output); 
    if(status){ 
     if(output) 
      printf("Found match: %s\n", output); 
     else 
      printf("No match found."); 
    } 
    else{ 
     printf("Regex error: %s\n", output); 
    } 
    free(output); 

    return 0; 
} 

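However, the output I get from the C code includes the map= part of the string, even though I explicitly excluded it from my first capture group.

What can I do to get the capture group without the map= part? And why does the online parser give a different result from my C program?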

Answer


What is happening here is that you have the pattern:

const char* regexp = "map=([^\\&]*)(&|$)"; 

for which the match results (let's call the array result) are populated as follows:

result = { 
    "map=/usr/bin/blah.map", 
    "/usr/bin/blah.map", 
    "" 
} 
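To see these slots directly, here is a minimal sketch that asks regexec for every slot the pattern can fill and prints each one. The helper name dump_groups and the fixed limit MAX_SLOTS are assumptions made for illustration; they are not part of the original code.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the original post): prints every slot that
 * regexec() fills for `pattern` when matched against `subject`.
 * Returns the number of slots that took part in the match, or -1 if the
 * pattern does not compile or does not match.
 */
int dump_groups(const char *subject, const char *pattern)
{
    enum { MAX_SLOTS = 8 };            /* arbitrary upper bound for this sketch */
    regex_t preg;
    regmatch_t pmatch[MAX_SLOTS];
    int used = 0;

    assert(subject != NULL && pattern != NULL);
    if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&preg, subject, MAX_SLOTS, pmatch, 0) != 0) {
        regfree(&preg);
        return -1;
    }

    /* Slot 0 is the whole match; slots 1..re_nsub are the capture groups. */
    for (size_t i = 0; i < MAX_SLOTS && i <= preg.re_nsub; i++) {
        if (pmatch[i].rm_so == -1)
            continue;                  /* this group did not take part */
        printf("pmatch[%zu] = \"%.*s\"\n", i,
               (int)(pmatch[i].rm_eo - pmatch[i].rm_so),
               subject + pmatch[i].rm_so);
        used++;
    }
    regfree(&preg);
    return used;
}
```

Calling dump_groups(search_str, regexp) with the values from the question should list the whole match in slot 0 and the two capture groups after it.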

Now, because you call regexec as follows:

res = regexec(&preg, search_str, 1, &pmatch, REG_NOTBOL); 
// Notice the argument 1 here ---^ 

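The argument 1 means that at most one result is stored in the pmatch array, so you get result[0] from above. Since you want the first capture group (not the whole matched string), you have to:

  1. Define pmatch as an array of size at least 2.
  2. Pass 2 as the nmatch argument in the above call to regexec.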
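The two steps above can be generalized so that the array size does not have to be hard-coded: after a successful regcomp, the re_nsub field of regex_t holds the number of parenthesized groups, so re_nsub + 1 slots are enough for the whole match plus every group. The helper copy_group below is a hypothetical name used only for this sketch.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from the original post): returns a malloc'ed copy
 * of capture group `group` (0 = whole match), or NULL if the pattern does not
 * compile, does not match, has no such group, or memory runs out.
 * The caller frees the returned string.
 */
char *copy_group(const char *subject, const char *pattern, size_t group)
{
    regex_t preg;
    regmatch_t *pmatch;
    char *copy = NULL;

    assert(subject != NULL && pattern != NULL);
    if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return NULL;

    /* re_nsub counts the parenthesized groups; slot 0 holds the whole match. */
    size_t nmatch = preg.re_nsub + 1;
    pmatch = malloc(nmatch * sizeof *pmatch);

    if (pmatch != NULL && group < nmatch &&
        regexec(&preg, subject, nmatch, pmatch, 0) == 0 &&
        pmatch[group].rm_so != -1) {
        size_t len = (size_t)(pmatch[group].rm_eo - pmatch[group].rm_so);
        copy = malloc(len + 1);
        if (copy != NULL) {
            memcpy(copy, subject + pmatch[group].rm_so, len);
            copy[len] = '\0';          /* null-terminate the copy */
        }
    }
    free(pmatch);
    regfree(&preg);
    return copy;
}
```

With the question's inputs, copy_group(search_str, regexp, 1) would return the path without the map= prefix, while group 0 would return the whole match.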

After making these changes:

bool get_first_match(const char* search_str, const char* reg_exp, char** output) 
{ 
    int res, len; 
    regex_t preg; 
    regmatch_t pmatch[3]; 
    // SNIP 
    // SNIP 
    res = regexec(&preg, search_str, 2, pmatch, REG_NOTBOL); // pass the array itself, not &pmatch
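    if(res == REG_NOMATCH)
    {
        *output = NULL; // no match: report NULL, as the header comment promises
        regfree(&preg); // release the compiled pattern on this path too
        return true;
    }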
    // Notice changes in the lines below 
    // I am using pmatch[1] since that is equivalent to our 
    // result[1] from above 
    len = pmatch[1].rm_eo - pmatch[1].rm_so; 
    char* result = (char*) malloc((len + 1) * sizeof(char)); 
    memcpy(result, search_str + pmatch[1].rm_so, len); 
    result[len] = 0; // null-terminate the result 
    *output = result; 
    regfree(&preg); 
    return true; 
} 

and the program works as expected.
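For reference, here is a sketch of the whole function with the snipped parts filled in. Besides the capture-group fix, it addresses three issues in the question's code that the answer does not discuss: `output = &error` only changes the local copy of the pointer, so the error text never reaches the caller; *output is left uninitialized when nothing matches; and regfree is skipped on the no-match path. Using <stdbool.h> instead of the question's own bool typedef is a choice made for this sketch.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Contract as in the question: returns false and stores the regcomp error
 * text in *output if the pattern does not compile; otherwise returns true and
 * stores the first capture group (or NULL when nothing matched) in *output.
 * The caller frees *output.
 */
bool get_first_match(const char *search_str, const char *reg_exp, char **output)
{
    regex_t preg;
    regmatch_t pmatch[2];   /* [0] = whole match, [1] = first capture group */
    int res;

    assert(search_str != NULL && reg_exp != NULL && output != NULL);
    *output = NULL;         /* defined value on every path */

    if ((res = regcomp(&preg, reg_exp, REG_EXTENDED)) != 0) {
        char *error = malloc(1024);
        if (error != NULL)
            regerror(res, &preg, error, 1024);
        *output = error;    /* write through the pointer so the caller sees it */
        return false;
    }

    res = regexec(&preg, search_str, 2, pmatch, REG_NOTBOL);
    if (res == 0 && pmatch[1].rm_so != -1) {
        size_t len = (size_t)(pmatch[1].rm_eo - pmatch[1].rm_so);
        char *result = malloc(len + 1);
        if (result != NULL) {
            memcpy(result, search_str + pmatch[1].rm_so, len);
            result[len] = '\0';   /* null-terminate the result */
        }
        *output = result;
    }
    regfree(&preg);         /* released on match and no-match alike */
    return true;
}
```

The main function from the question can call this version unchanged, since the signature and the meaning of the return value are the same.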


Thanks for the careful and detailed explanation! It really helped. – balajeerc