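2017-11-25

Flex/Bison - my regex does not match two or more instances of X, e.g. XXY-1

I am using flex and bison to create a parser for a fictional programming language. There will be valid and invalid variable names: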

XXXX XY-1 // valid 
XXXXX Z // valid 
XXX Y // valid 
XXX 5Aet // invalid 
XXXX XXAB-Y // invalid 

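The leading Xs just specify the size of the variable. The variable 5Aet is invalid because it starts with a digit. I have managed to match that case with: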

[\_\-0-9][a-zA-Z][a-zA-Z0-9\-\_]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME; 

The variable XXAB-Y is invalid because a variable name cannot start with two or more X characters.
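To make the two naming rules concrete, here is a minimal C sketch of a checker, independent of flex. The function name is_valid_variable_name is hypothetical, and the allowed character set (uppercase letters, digits and '-') is an assumption taken from the [A-Z][A-Z0-9\-]* pattern in the lexer, not a confirmed language specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if `name` follows the rules described above.
 * Assumed character set: 'A'-'Z', '0'-'9' and '-'. */
bool is_valid_variable_name(const char *name)
{
    assert(name != NULL);

    /* Must start with an uppercase letter, so "5Aet" is rejected. */
    if (name[0] < 'A' || name[0] > 'Z')
        return false;

    /* Must not start with two or more X characters, so "XXAB-Y" is rejected. */
    if (name[0] == 'X' && name[1] == 'X')
        return false;

    /* Every remaining character must be in the assumed character set. */
    for (size_t i = 1; name[i] != '\0'; i++) {
        char c = name[i];
        bool ok = (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '-';
        if (!ok)
            return false;
    }
    return true;
}
```

Under these assumptions, XY-1, Z and Y are accepted, while 5Aet and XXAB-Y are rejected, matching the examples listed at the top of the question.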

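I have tried to match this with a regular expression, but without success. I have tried various combinations of the expressions below, but nothing works: the variable keeps being matched as valid.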

[X]{2,}[A-Z0-9\-]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME; 

[X]{2,0}[\_\-0-9][a-zA-Z][a-zA-Z0-9\-\_]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME; 

lexer.l snippet

[\t ]+ // ignore whitespaces 

\n // Ignore new line 

[\"][^"]*[\"] yylval.string = strdup(yytext); return TERM_STR; 

";" return TERM_SEPARATOR; 

"." return TERM_FULLSTOP; 

[0-9]+ yylval.integer = atoi(yytext); return TERM_INT; 

XX[A-Z0-9-]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME; 

[\_\-0-9]+[a-zA-Z][a-zA-Z0-9\-\_]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME; 

[A-Z][A-Z0-9\-]* yylval.string = strdup(yytext); return TERM_VARIABLE_NAME; 

[X]+ yylval.integer = yyleng; return TERM_SIZE; 

. return TERM_INVALID_TOKEN; 

parser.y snippet

program: 
    /* empty */ | 
    begin middle_declarations body grammar_s end { 
     printf("\nParsing complete\n"); 
     exit(0); 
    }; 

begin: 
    TERM_BEGINING TERM_FULLSTOP; 

body: 
    TERM_BODY TERM_FULLSTOP; 

end: 
    TERM_END TERM_FULLSTOP; 

middle_declarations: 
    /* empty */ | 
    //Left recursive to allow for many declarations 
    middle_declarations declaration TERM_FULLSTOP; 

declaration: 
    TERM_SIZE TERM_VARIABLE_NAME { 
     createVar($1, $2); 
    } 
    | 
    TERM_SIZE TERM_INVALID_VARIABLE_NAME { 
     printInvalidVarName($2); 
    }; 

grammar_s: 
    /* empty */ | 
    grammar_s grammar TERM_FULLSTOP; 

grammar: 
    add | move | print | input; 

add: 
    TERM_ADD TERM_INT TERM_TO TERM_VARIABLE_NAME { 
     addIntToVar($2, $4); 
    } 
    | 
    TERM_ADD TERM_VARIABLE_NAME TERM_TO TERM_VARIABLE_NAME { 
     addVarToVar($2, $4); 
    } 

    ; 

move: 
    TERM_MOVE TERM_VARIABLE_NAME TERM_TO TERM_VARIABLE_NAME { 
     moveVarToVar($2, $4); 
    } 
    | 
    TERM_MOVE TERM_INT TERM_TO TERM_VARIABLE_NAME { 
     moveIntToVar($2, $4); 
    } 

    ; 

print: 
    /* empty */ | 
    TERM_PRINT rest_of_print { 
     printf("\n"); 
    }; 

rest_of_print: 
    /* empty */ | 
    rest_of_print other_print; 

other_print: 

    TERM_VARIABLE_NAME { 
     printVarValue($1); 
    } 
    | 
    TERM_SEPARATOR { 
     // do nothing 
    } 
    | 
    TERM_STR { 
     printf("%s", $1); 
    } 

    ; 

input: 
    // Fullstop declares grammar 
    TERM_INPUT other_input; 

other_input: 

    /* empty */ | 
    // Input var1 
    TERM_VARIABLE_NAME { 
     inputValues($1); 
    } 
    | 
    // Can be input var1; var2;...varN 
    other_input TERM_SEPARATOR TERM_VARIABLE_NAME { 
     inputValues($2); 
    } 
    ; 

Debug output:

Starting parse 
Entering state 0 
Reading a token: Next token is token TERM_BEGINING (1.1:) 
Shifting token TERM_BEGINING (1.1:) 
Entering state 1 
Reading a token: Next token is token TERM_FULLSTOP (1.1:) 
Shifting token TERM_FULLSTOP (1.1:) 
Entering state 4 
Reducing stack by rule 3 (line 123): 
    $1 = token TERM_BEGINING (1.1:) 
    $2 = token TERM_FULLSTOP (1.1:) 
-> $$ = nterm begin (1.1:) 
Stack now 0 
Entering state 3 
Reducing stack by rule 6 (line 131): 
-> $$ = nterm middle_declarations (1.1:) 
Stack now 0 3 
Entering state 6 
Reading a token: Next token is token TERM_SIZE (1.1:) 
Shifting token TERM_SIZE (1.1:) 
Entering state 8 
Reading a token: Next token is token TERM_VARIABLE_NAME (1.1:) 
Shifting token TERM_VARIABLE_NAME (1.1:) 
Entering state 13 
Reducing stack by rule 8 (line 137): 
    $1 = token TERM_SIZE (1.1:) 
    $2 = token TERM_VARIABLE_NAME (1.1:) 
-> $$ = nterm declaration (1.1:) 
Stack now 0 3 6 
Entering state 10 
Reading a token: Next token is token TERM_FULLSTOP (1.1:) 
Shifting token TERM_FULLSTOP (1.1:) 
Entering state 15 
Reducing stack by rule 7 (line 134): 
    $1 = nterm middle_declarations (1.1:) 
    $2 = nterm declaration (1.1:) 
    $3 = token TERM_FULLSTOP (1.1:) 
-> $$ = nterm middle_declarations (1.1:) 
Stack now 0 3 
Entering state 6 
Reading a token: Next token is token TERM_SIZE (1.1:) 
Shifting token TERM_SIZE (1.1:) 
Entering state 8 
Reading a token: Next token is token TERM_VARIABLE_NAME (1.1:) 
Shifting token TERM_VARIABLE_NAME (1.1:) 
Entering state 13 
Reducing stack by rule 8 (line 137): 
    $1 = token TERM_SIZE (1.1:) 
    $2 = token TERM_VARIABLE_NAME (1.1:) 
-> $$ = nterm declaration (1.1:) 
Stack now 0 3 6 
Entering state 10 
Reading a token: Next token is token TERM_FULLSTOP (1.1:) 
Shifting token TERM_FULLSTOP (1.1:) 
Entering state 15 
Reducing stack by rule 7 (line 134): 
    $1 = nterm middle_declarations (1.1:) 
    $2 = nterm declaration (1.1:) 
    $3 = token TERM_FULLSTOP (1.1:) 
-> $$ = nterm middle_declarations (1.1:) 
Stack now 0 3 
Entering state 6 
Reading a token: Next token is token TERM_SIZE (1.1:) 
Shifting token TERM_SIZE (1.1:) 
Entering state 8 
Reading a token: Next token is token TERM_VARIABLE_NAME (1.1:) 
Shifting token TERM_VARIABLE_NAME (1.1:) 
Entering state 13 
Reducing stack by rule 8 (line 137): 
    $1 = token TERM_SIZE (1.1:) 
    $2 = token TERM_VARIABLE_NAME (1.1:) 
-> $$ = nterm declaration (1.1:) 
Stack now 0 3 6 
Entering state 10 
Reading a token: Next token is token TERM_FULLSTOP (1.1:) 
Shifting token TERM_FULLSTOP (1.1:) 
Entering state 15 
Reducing stack by rule 7 (line 134): 
    $1 = nterm middle_declarations (1.1:) 
    $2 = nterm declaration (1.1:) 
    $3 = token TERM_FULLSTOP (1.1:) 
-> $$ = nterm middle_declarations (1.1:) 
Stack now 0 3 
Entering state 6 
Reading a token: Next token is token TERM_BODY (1.1:) 
Shifting token TERM_BODY (1.1:) 
Entering state 7 
Reading a token: Next token is token TERM_FULLSTOP (1.1:) 
Shifting token TERM_FULLSTOP (1.1:) 
Entering state 11 
Reducing stack by rule 4 (line 126): 
    $1 = token TERM_BODY (1.1:) 
    $2 = token TERM_FULLSTOP (1.1:) 
-> $$ = nterm body (1.1:) 
Stack now 0 3 6 
Entering state 9 
Reducing stack by rule 10 (line 145): 
-> $$ = nterm grammar_s (1.1:) 
Stack now 0 3 6 9 
Entering state 14 
Reading a token: Next token is token TERM_PRINT (1.1:) 
Shifting token TERM_PRINT (1.1:) 
Entering state 20 
Reducing stack by rule 22 (line 180): 
-> $$ = nterm rest_of_print (1.1:) 
Stack now 0 3 6 9 14 20 
Entering state 34 
Reading a token: Next token is token TERM_STR (1.1:) 
Shifting token TERM_STR (1.1:) 
Entering state 41 
Reducing stack by rule 26 (line 194): 
    $1 = token TERM_STR (1.1:) 
-> $$ = nterm other_print (1.1:) 
Stack now 0 3 6 9 14 20 34 
Entering state 44 
Reducing stack by rule 23 (line 182): 
    $1 = nterm rest_of_print (1.1:) 
    $2 = nterm other_print (1.1:) 
-> $$ = nterm rest_of_print (1.1:) 
Stack now 0 3 6 9 14 20 
Entering state 34 
Reading a token: Next token is token TERM_FULLSTOP (1.1:) 
Reducing stack by rule 21 (line 176): 
    $1 = token TERM_PRINT (1.1:) 
    $2 = nterm rest_of_print (1.1:) 
"hEllo" 
-> $$ = nterm print (1.1:) 
Stack now 0 3 6 9 14 
Entering state 25 
Reducing stack by rule 14 (line 150): 
    $1 = nterm print (1.1:) 
-> $$ = nterm grammar (1.1:) 
Stack now 0 3 6 9 14 
Entering state 22 
Next token is token TERM_FULLSTOP (1.1:) 
Shifting token TERM_FULLSTOP (1.1:) 
Entering state 35 
Reducing stack by rule 11 (line 147): 
    $1 = nterm grammar_s (1.1:) 
    $2 = nterm grammar (1.1:) 
    $3 = token TERM_FULLSTOP (1.1:) 
-> $$ = nterm grammar_s (1.1:) 
Stack now 0 3 6 9 
Entering state 14 
Reading a token: Next token is token TERM_END (1.1:) 
Shifting token TERM_END (1.1:) 
Entering state 16 
Reading a token: Next token is token TERM_FULLSTOP (1.1:) 
Shifting token TERM_FULLSTOP (1.1:) 
Entering state 27 
Reducing stack by rule 5 (line 129): 
    $1 = token TERM_END (1.1:) 
    $2 = token TERM_FULLSTOP (1.1:) 
-> $$ = nterm end (1.1:) 
Stack now 0 3 6 9 14 
Entering state 21 
Reducing stack by rule 2 (line 113): 
    $1 = nterm begin (1.1:) 
    $2 = nterm middle_declarations (1.1:) 
    $3 = nterm body (1.1:) 
    $4 = nterm grammar_s (1.1:) 
    $5 = nterm end (1.1:) 

Sample input:

BeGiNInG. 

X XXAB-. 
XX XXX7. 
XX XXXY. 

BoDY. 

print "hEllo". 

EnD. 
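
Please show all the rules, in order. – rici

@rici hello again :). I have edited the question to add this. – cod3min3

'[X]{2,0}' is invalid, but '[X]{2,}' works as expected for me. Didn't you get the error "bad iteration values" from flex for that file? – rici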

Answers

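[X]{2,}[A-Z0-9\-]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME; 

should work just fine, and it works fine for me. However, it can be simplified to

XX[A-Z0-9-]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME; 

because any further X characters will be matched by the [A-Z0-9-] character class. (Note that it is not necessary to write \- inside a character class; a plain - will do, as long as it is either the first or the last thing in the character class.)

This pattern (like yours) will also match just XX, but the [X]+ pattern will win because it comes earlier in the flex input file: when two patterns match the same length, flex uses the rule listed first.

{2,0} is not a valid interval expression, because 0 is less than 2. To specify "2 or more X's", write X{2,} (or [X]{2,} if you prefer; "X"{2,} also works). This should have produced an error message from flex, with the result that no lexical scanner was generated. (However, an old one may still be lying around, which could cause confusion.)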
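The rule-selection behaviour described above (longest match wins, and the earliest rule wins on ties) can be modelled outside flex. The C sketch below is only a simulation with hand-written matchers for three of the patterns; it is not flex's implementation, and all names in it (pick_rule, SIZE_FIRST, SIZE_LAST, the TOK_ values) are hypothetical. SIZE_LAST mirrors the order of the three rules in the lexer.l snippet in the question.

```c
#include <assert.h>
#include <stddef.h>

/* Token kinds, loosely named after the tokens in the question. */
enum token { TOK_NONE, TOK_SIZE, TOK_INVALID_NAME, TOK_VARIABLE_NAME };

/* Characters accepted by the class [A-Z0-9-]. */
static int is_name_char(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '-';
}

/* [X]+ : length of the run of X at the start of s (0 if none). */
static size_t match_size(const char *s)
{
    size_t n = 0;
    while (s[n] == 'X') n++;
    return n;
}

/* XX[A-Z0-9-]* : 0 unless s starts with two X characters. */
static size_t match_invalid_name(const char *s)
{
    size_t n = 2;
    if (s[0] != 'X' || s[1] != 'X') return 0;
    while (is_name_char(s[n])) n++;
    return n;
}

/* [A-Z][A-Z0-9-]* : 0 unless s starts with an uppercase letter. */
static size_t match_variable_name(const char *s)
{
    size_t n = 1;
    if (s[0] < 'A' || s[0] > 'Z') return 0;
    while (is_name_char(s[n])) n++;
    return n;
}

struct rule {
    size_t (*match)(const char *);
    enum token tok;
};

/* [X]+ listed before the two name rules. */
const struct rule SIZE_FIRST[3] = {
    { match_size, TOK_SIZE },
    { match_invalid_name, TOK_INVALID_NAME },
    { match_variable_name, TOK_VARIABLE_NAME },
};

/* Same rules in the order of the question's lexer.l snippet: [X]+ last. */
const struct rule SIZE_LAST[3] = {
    { match_invalid_name, TOK_INVALID_NAME },
    { match_variable_name, TOK_VARIABLE_NAME },
    { match_size, TOK_SIZE },
};

/* Longest match wins; the strict '>' keeps the earlier rule on ties. */
enum token pick_rule(const struct rule rules[3], const char *input)
{
    enum token best = TOK_NONE;
    size_t best_len = 0;

    assert(input != NULL);
    for (size_t i = 0; i < 3; i++) {
        size_t n = rules[i].match(input);
        if (n > best_len) {
            best_len = n;
            best = rules[i].tok;
        }
    }
    return best;
}
```

In this model, pick_rule(SIZE_FIRST, "XX") gives TOK_SIZE and pick_rule(SIZE_FIRST, "XXAB-") gives TOK_INVALID_NAME. With SIZE_LAST, "XX" gives TOK_INVALID_NAME and a lone "X" gives TOK_VARIABLE_NAME, so TOK_SIZE is never selected: one of the name rules always matches at least as much text as [X]+ and is listed before it.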

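I tried 'XX[A-Z0-9-]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME;' but it still does not work, and I don't know why it isn't being picked up. I have updated the question, which now includes debug output and sample input. I can also provide the full code if necessary. – cod3min3

@cod3min3: I suspect you did not actually regenerate the scanner. Flex would certainly produce an error for 'X{2,0}'; moreover, with your most recent edit (moving '[X]+' to the end of the file) the scanner would never return a TERM_SIZE token. – rici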
