2015-05-24 96 views
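Removing ambiguity in ANTLR4 lexing

I have been trying to convert a grammar I found online to ANTLR4 format. The original grammar is here: https://github.com/cv/asp-parser/blob/master/vbscript.bnf

Short version: I believe the problem I am currently hitting is caused by ambiguity in the lexing phase.

For example, the rule I copied for float literals is: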

float_literal : DIGIT* '.' DIGIT+ ('e' PLUS_OR_MINUS? DIGIT+)? 
       | DIGIT+ 'e' PLUS_OR_MINUS? DIGIT+; 

And the file goes on with my definition of a letter:

LETTER: 'a'..'z'; 

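It seems that because I use 'e' in the float literal, that character can no longer be recognized as a letter? In my research I came across the idea of having a token for each letter, so letter would become: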

letter: A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z; 

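However, the file also contains longer strings, such as '.and', in which I would then have to replace every instance of 'e' with E. So would this approach require replacing something like that with DOT A N D? That doesn't seem right.

Am I doing something fundamentally wrong, or is there something I can do to avoid this ambiguity?

Thanks, Craig

The full grammar follows.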

grammar vbscript; 
/*===== Character Sets =====*/ 

SPACES: ' ' -> skip; 

DIGIT: '0'..'9'; 
SEMI_COLON: ':'; 
NEW_LINE_CHARACTER: [\r\n]+; 
WHITESPACE_CHARACTER: [ \t]; 
LETTER: 'a'..'z'; 
QUOTE: '"'; 
HASH: '#'; 
SQUARE_BRACE: '[' | ']'; 
PLUS_OR_MINUS: [+-]; 
ANYTHING_ELSE: ~('"' | '#'); 
ws: WHITESPACE_CHARACTER; 
id_tail: (DIGIT | LETTER | '_'); 
string_character: ANYTHING_ELSE | DIGIT | WHITESPACE_CHARACTER | SEMI_COLON | LETTER | PLUS_OR_MINUS | SQUARE_BRACE; 
id_name_char: ANYTHING_ELSE | DIGIT | WHITESPACE_CHARACTER | SEMI_COLON | LETTER | PLUS_OR_MINUS; 


/*===== terminals =====*/ 
whitespace: ws+ | '_' ws* new_line?; 

comment_line : '' | 'rem'; 

string_literal : '"' (string_character | '""')* '"'; 

float_literal : DIGIT* '.' DIGIT+ ('e' PLUS_OR_MINUS? DIGIT+)? 
       | DIGIT+ 'e' PLUS_OR_MINUS? DIGIT+; 
id    : LETTER id_tail* 
       | '[' id_name_char* ']'; 
iddot   : LETTER id_tail* '.' 
       | '[' id_name_char* ']' '.' 
       | 'and.' 
       | 'byref.' 
       | 'byval.' 
       | 'call.' 
       | 'case.' 
       | 'class.' 
       | 'const.' 
       | 'default.' 
       | 'dim.' 
       | 'do.' 
       | 'each.' 
       | 'else.' 
       | 'elseif.' 
       | 'empty.' 
       | 'end.' 
       | 'eqv.' 
       | 'erase.' 
       | 'error.' 
       | 'exit.' 
       | 'explicit.' 
       | 'false.' 
       | 'for.' 
       | 'function.' 
       | 'get.' 
       | 'goto.' 
       | 'if.' 
       | 'imp.' 
       | 'in.' 
       | 'is.' 
       | 'let.' 
       | 'loop.' 
       | 'mod.' 
       | 'new.' 
       | 'next.' 
       | 'not.' 
       | 'nothing.' 
       | 'null.' 
       | 'on.' 
       | 'option.' 
       | 'or.' 
       | 'preserve.' 
       | 'private.' 
       | 'property.' 
       | 'public.' 
       | 'redim.' 
       | 'rem.' 
       | 'resume.' 
       | 'select.' 
       | 'set.' 
       | 'step.' 
       | 'sub.' 
       | 'then.' 
       | 'to.' 
       | 'true.' 
       | 'until.' 
       | 'wend.' 
       | 'while.' 
       | 'with.' 
       | 'xor.'; 

dot_id   : '.' LETTER id_tail* 
       | '.' '[' id_name_char* ']' 
       | '.and' 
       | '.byref' 
       | '.byval' 
       | '.call' 
       | '.case' 
       | '.class' 
       | '.const' 
       | '.default' 
       | '.dim' 
       | '.do' 
       | '.each' 
       | '.else' 
       | '.elseif' 
       | '.empty' 
       | '.end' 
       | '.eqv' 
       | '.erase' 
       | '.error' 
       | '.exit' 
       | '.explicit' 
       | '.false' 
       | '.for' 
       | '.function' 
       | '.get' 
       | '.goto' 
       | '.if' 
       | '.imp' 
       | '.in' 
       | '.is' 
       | '.let' 
       | '.loop' 
       | '.mod' 
       | '.new' 
       | '.next' 
       | '.not' 
       | '.nothing' 
       | '.null' 
       | '.on' 
       | '.option' 
       | '.or' 
       | '.preserve' 
       | '.private' 
       | '.property' 
       | '.public' 
       | '.redim' 
       | '.rem' 
       | '.resume' 
       | '.select' 
       | '.set' 
       | '.step' 
       | '.sub' 
       | '.then' 
       | '.to' 
       | '.true' 
       | '.until' 
       | '.wend' 
       | '.while' 
       | '.with' 
       | '.xor'; 

dot_iddot  : '.' LETTER id_tail* '.' 
       | '.' '[' id_name_char* ']' '.' 
       | '.and.' 
       | '.byref.' 
       | '.byval.' 
       | '.call.' 
       | '.case.' 
       | '.class.' 
       | '.const.' 
       | '.default.' 
       | '.dim.' 
       | '.do.' 
       | '.each.' 
       | '.else.' 
       | '.elseif.' 
       | '.empty.' 
       | '.end.' 
       | '.eqv.' 
       | '.erase.' 
       | '.error.' 
       | '.exit.' 
       | '.explicit.' 
       | '.false.' 
       | '.for.' 
       | '.function.' 
       | '.get.' 
       | '.goto.' 
       | '.if.' 
       | '.imp.' 
       | '.in.' 
       | '.is.' 
       | '.let.' 
       | '.loop.' 
       | '.mod.' 
       | '.new.' 
       | '.next.' 
       | '.not.' 
       | '.nothing.' 
       | '.null.' 
       | '.on.' 
       | '.option.' 
       | '.or.' 
       | '.preserve.' 
       | '.private.' 
       | '.property.' 
       | '.public.' 
       | '.redim.' 
       | '.rem.' 
       | '.resume.' 
       | '.select.' 
       | '.set.' 
       | '.step.' 
       | '.sub.' 
       | '.then.' 
       | '.to.' 
       | '.true.' 
       | '.until.' 
       | '.wend.' 
       | '.while.' 
       | '.with.' 
       | '.xor.'; 

/*===== rules =====*/ 
new_line: (SEMI_COLON | NEW_LINE_CHARACTER)+; 
program: new_line? global_stmt_list; 

/*===== rules: declarations =====*/ 
class_decl: 'class' extended_id new_line member_decl_list 'end' 'class' new_line; 
member_decl_list: member_decl*; 
member_decl: field_decl | var_decl | const_decl | sub_decl | function_decl | property_decl; 
field_decl: 
    'private' field_name other_vars_opt new_line 
| 'public' field_name other_vars_opt new_line; 
field_name: field_id '(' array_rank_list ')' | field_id; 
field_id: id | 'default' | 'erase' | 'error' | 'explicit' | 'step'; 
var_decl: 'dim' var_name other_vars_opt new_line; 
var_name: extended_id '(' array_rank_list ')' | extended_id; 
other_vars_opt: (',' var_name other_vars_opt)?; 
array_rank_list: (int_literal ',' array_rank_list | int_literal)?; 
const_decl: access_modifier_opt 'const' const_list new_line; 
const_list: extended_id '=' const_expr_def ',' const_list | extended_id '=' const_expr_def; 
const_expr_def: '(' const_expr_def ')' 
| '-' const_expr_def 
| '+' const_expr_def 
| const_expr; 
sub_decl: 
    method_access_opt 'sub' extended_id method_arg_list new_line method_stmt_list 'end' 'sub' new_line 
| method_access_opt 'sub' extended_id method_arg_list inline_stmt 'end' 'sub' new_line; 
function_decl: 
    method_access_opt 'function' extended_id method_arg_list new_line method_stmt_list 'end' 'function' new_line 
| method_access_opt 'function' extended_id method_arg_list inline_stmt 'end' 'function' new_line; 
method_access_opt: 'public' 'default' | access_modifier_opt; 
access_modifier_opt: ('public' | 'private')?; 
method_arg_list: ('(' arg_list? ')')?; 
arg_list: arg (',' arg_list)?; 
arg: arg_modifier_opt extended_id ('(' ')')?; 
arg_modifier_opt: ('byval' | 'byref')?; 
property_decl: method_access_opt 'property' property_access_type extended_id method_arg_list new_line method_stmt_list 'end' 'property' new_line; 
property_access_type: 'get' | 'let' | 'set'; 

/*===== rules: statements =====*/ 
global_stmt: option_explicit | class_decl | field_decl | const_decl | sub_decl | function_decl | block_stmt; 
method_stmt: const_decl | block_stmt; 
block_stmt: 
    var_decl 
| redim_stmt 
| if_stmt 
| with_stmt 
| select_stmt 
| loop_stmt 
| for_stmt 
| inline_stmt new_line; 
inline_stmt: 
    assign_stmt 
| call_stmt 
| sub_call_stmt 
| error_stmt 
| exit_stmt 
| 'erase' extended_id; 
global_stmt_list: global_stmt_list global_stmt | global_stmt; 
method_stmt_list: method_stmt*; 
block_stmt_list: block_stmt*; 
option_explicit: 'option' 'explicit' new_line; 
error_stmt: 'on' 'error' 'resume' 'next' | 'on' 'error' 'goto' int_literal; 
exit_stmt: 'exit' 'do' | 'exit' 'for' | 'exit' 'function' | 'exit' 'property' | 'exit' 'sub'; 
assign_stmt: 
     left_expr '=' expr 
| 'set' left_expr '=' expr 
| 'set' left_expr '=' 'new' left_expr; 
sub_call_stmt:    qualified_id sub_safe_expr? comma_expr_list 
         | qualified_id sub_safe_expr? 
         | qualified_id '(' expr ')' comma_expr_list 
         | qualified_id '(' expr ')' 
         | qualified_id '(' ')' 
         | qualified_id index_or_params_list '.' left_expr_tail sub_safe_expr? comma_expr_list 
         | qualified_id index_or_params_list_dot left_expr_tail sub_safe_expr? comma_expr_list 
         | qualified_id index_or_params_list '.' left_expr_tail sub_safe_expr? 
         | qualified_id index_or_params_list_dot left_expr_tail sub_safe_expr?; 


call_stmt: 'call' left_expr; 

left_expr: qualified_id index_or_params_list '.' left_expr_tail 
         | qualified_id index_or_params_list_dot left_expr_tail 
         | qualified_id index_or_params_list 
         | qualified_id 
         | safe_keyword_id; 

left_expr_tail: qualified_id_tail index_or_params_list '.' left_expr_tail 
         | qualified_id_tail index_or_params_list_dot left_expr_tail 
         | qualified_id_tail index_or_params_list 
         | qualified_id_tail; 

qualified_id: iddot qualified_id_tail 
         | dot_iddot qualified_id_tail 
         | id 
         | dot_id; 

qualified_id_tail: iddot qualified_id_tail 
         | id 
         | keyword_id; 

keyword_id: safe_keyword_id 
         | 'and' 
         | 'byref' 
         | 'byval' 
         | 'call' 
         | 'case' 
         | 'class' 
         | 'const' 
         | 'dim' 
         | 'do' 
         | 'each' 
         | 'else' 
         | 'elseif' 
         | 'empty' 
         | 'end' 
         | 'eqv' 
         | 'exit' 
         | 'false' 
         | 'for' 
         | 'function' 
         | 'get' 
         | 'goto' 
         | 'if' 
         | 'imp' 
         | 'in' 
         | 'is' 
         | 'let' 
         | 'loop' 
         | 'mod' 
         | 'new' 
         | 'next' 
         | 'not' 
         | 'nothing' 
         | 'null' 
         | 'on' 
         | 'option' 
         | 'or' 
         | 'preserve' 
         | 'private' 
         | 'public' 
         | 'redim' 
         | 'resume' 
         | 'select' 
         | 'set' 
         | 'sub' 
         | 'then' 
         | 'to' 
         | 'true' 
         | 'until' 
         | 'wend' 
         | 'while' 
         | 'with' 
         | 'xor'; 

safe_keyword_id: 'default' 
         | 'erase' 
         | 'error' 
         | 'explicit' 
         | 'property' 
         | 'step'; 

extended_id: safe_keyword_id 
         | id; 

index_or_params_list: index_or_params index_or_params_list 
         | index_or_params; 

index_or_params: '(' expr comma_expr_list ')' 
         | '(' comma_expr_list ')' 
         | '(' expr ')' 
         | '(' ')'; 

index_or_params_list_dot: index_or_params index_or_params_list_dot 
         | index_or_params_dot; 

index_or_params_dot: '(' expr comma_expr_list ').' 
         | '(' comma_expr_list ').' 
         | '(' expr ').' 
         | '(' ').'; 

comma_expr_list: ',' expr comma_expr_list 
         | ',' comma_expr_list 
         | ',' expr 
         | ','; 

/* redim statement */ 

redim_stmt: 'redim' redim_decl_list new_line 
         | 'redim' 'preserve' redim_decl_list new_line; 

redim_decl_list: redim_decl ',' redim_decl_list 
         | redim_decl; 

redim_decl: extended_id '(' expr_list ')'; 

/* if statement */ 

if_stmt: 'if' expr 'then' new_line block_stmt_list else_stmt_list 'end' 'if' new_line 
         | 'if' expr 'then' inline_stmt else_opt end_if_opt new_line; 

else_stmt_list: ('elseif' expr 'then' new_line block_stmt_list else_stmt_list 
         | 'elseif' expr 'then' inline_stmt new_line else_stmt_list 
         | 'else' inline_stmt new_line 
         | 'else' new_line block_stmt_list)?; 

else_opt: ('else' inline_stmt)?; 
end_if_opt : ('end' 'if')?; 

/* with statement */ 

with_stmt: 'with' expr new_line block_stmt_list 'end' 'with' new_line; 

/* loop statement */ 

loop_stmt: 'do' loop_type expr new_line block_stmt_list 'loop' new_line 
         | 'do' new_line block_stmt_list 'loop' loop_type expr new_line 
         | 'do' new_line block_stmt_list 'loop' new_line 
         | 'while' expr new_line block_stmt_list 'wend' new_line; 

loop_type: 'while' | 'until'; 

/* for statement */ 

for_stmt: 'for' extended_id '=' expr 'to' expr step_opt new_line block_stmt_list 'next' new_line 
         | 'for' 'each' extended_id 'in' expr new_line block_stmt_list 'next' new_line; 

step_opt: ('step' expr)?; 

/* select statement */ 

select_stmt: 'select' 'case' expr new_line cast_stmt_list 'end' 'select' new_line; 

cast_stmt_list: ('case' expr_list nl_opt block_stmt_list cast_stmt_list 
         | 'case' 'else' nl_opt block_stmt_list)?; 

nl_opt: new_line?; 

expr_list: expr ',' expr_list | expr; 

/*===== rules: expressions =====*/ 

sub_safe_expr: sub_safe_imp_expr; 

sub_safe_imp_expr: sub_safe_imp_expr 'imp' eqv_expr | sub_safe_eqv_expr; 

sub_safe_eqv_expr: sub_safe_eqv_expr 'eqv' xor_expr 
         | sub_safe_xor_expr; 

sub_safe_xor_expr: sub_safe_xor_expr 'xor' or_expr 
         | sub_safe_or_expr; 

sub_safe_or_expr: sub_safe_or_expr 'or' and_expr 
         | sub_safe_and_expr; 

sub_safe_and_expr  : sub_safe_and_expr 'and' not_expr 
         | sub_safe_not_expr; 

sub_safe_not_expr  : 'not' not_expr 
         | sub_safe_compare_expr; 



sub_safe_compare_expr : sub_safe_compare_expr 'is' concat_expr 
         | sub_safe_compare_expr 'is' 'not' concat_expr 
         | sub_safe_compare_expr '>=' concat_expr 
         | sub_safe_compare_expr '=>' concat_expr 
         | sub_safe_compare_expr '<=' concat_expr 
         | sub_safe_compare_expr '=<' concat_expr 
         | sub_safe_compare_expr '>' concat_expr 
         | sub_safe_compare_expr '<' concat_expr 
         | sub_safe_compare_expr '<>' concat_expr 
         | sub_safe_compare_expr '=' concat_expr 
         | sub_safe_concat_expr; 

sub_safe_concat_expr : sub_safe_concat_expr '&' add_expr 
         | sub_safe_add_expr; 

sub_safe_add_expr  : sub_safe_add_expr '+' mod_expr 
         | sub_safe_add_expr '-' mod_expr 
         | sub_safe_mod_expr; 

sub_safe_mod_expr  : sub_safe_mod_expr 'mod' int_div_expr 
         | sub_safe_int_div_expr; 

sub_safe_int_div_expr : sub_safe_int_div_expr '\\' mult_expr 
         | sub_safe_mult_expr; 

sub_safe_mult_expr  : sub_safe_mult_expr '*' unary_expr 
         | sub_safe_mult_expr '/' unary_expr 
         | sub_safe_unary_expr; 

sub_safe_unary_expr  : '-' unary_expr 
         | '+' unary_expr 
         | sub_safe_exp_expr; 

sub_safe_exp_expr  : sub_safe_value '^' exp_expr 
         | sub_safe_value; 

sub_safe_value   : const_expr 
         | left_expr 
         | '(' expr ')'; 

expr     : imp_expr; 

imp_expr    : imp_expr 'imp' eqv_expr 
         | eqv_expr; 

eqv_expr    : eqv_expr 'eqv' xor_expr 
         | xor_expr; 

xor_expr    : xor_expr 'xor' or_expr 
         | or_expr; 

or_expr    : or_expr 'or' and_expr 
         | and_expr; 

and_expr    : and_expr 'and' not_expr 
         | not_expr; 

not_expr    : 'not' not_expr 
         | compare_expr; 

compare_expr   : compare_expr 'is' concat_expr 
         | compare_expr 'is' 'not' concat_expr 
         | compare_expr '>=' concat_expr 
         | compare_expr '=>' concat_expr 
         | compare_expr '<=' concat_expr 
         | compare_expr '=<' concat_expr 
         | compare_expr '>' concat_expr 
         | compare_expr '<' concat_expr 
         | compare_expr '<>' concat_expr 
         | compare_expr '=' concat_expr 
         | concat_expr; 

concat_expr   : concat_expr '&' add_expr 
         | add_expr; 

add_expr    : add_expr '+' mod_expr 
         | add_expr '-' mod_expr 
         | mod_expr; 

mod_expr    : mod_expr 'mod' int_div_expr 
         | int_div_expr; 

int_div_expr   : int_div_expr '\\' mult_expr 
         | mult_expr; 

mult_expr    : mult_expr '*' unary_expr 
         | mult_expr '/' unary_expr 
         | unary_expr; 

unary_expr   : '-' unary_expr 
         | '+' unary_expr 
         | exp_expr; 

exp_expr    : value '^' exp_expr 
         | value; 

value    : const_expr 
         | left_expr 
         | '(' expr ')'; 

const_expr   : bool_literal 
         | int_literal 
         | float_literal 
         | string_literal 
         | nothing; 

bool_literal   : 'true' 
         | 'false'; 

int_literal   : DIGIT+; 

nothing    : 'nothing' 
         | 'null' 
         | 'empty'; 

Answer

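Your grammar defines the "literals" in the parser section. Note that ANTLR treats every rule whose name starts with a lowercase letter as a parser rule (rules whose names start with an uppercase letter are lexer rules).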
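To illustrate why the 'e' stops being a LETTER, here is a minimal sketch. The grammar name and the `letters` rule are made up for the example; only `float_literal`, `DIGIT` and `LETTER` mirror the question. In a combined grammar, a string literal used inside a parser rule becomes an implicit token that is placed before the explicit lexer rules, so for the single character "e" it ties with LETTER on length and wins on order.

```antlr
grammar ImplicitTokenDemo;

// Parser rule (lowercase name): the literal 'e' makes ANTLR create an
// implicit token for "e", defined ahead of the explicit lexer rules below.
float_literal : DIGIT+ 'e' DIGIT+ ;

// Hypothetical parser rule that expects LETTER tokens.
letters : LETTER+ ;

// Lexer rules (uppercase names).
DIGIT  : [0-9] ;
LETTER : [a-z] ;   // never produced for the input "e": the implicit token wins the tie
```

With this grammar, the input "abc" yields three LETTER tokens, but "e" yields the implicit 'e' token instead, so `letters` cannot match it.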

A small part of your problem can be solved like this:

FLOAT_LITERAL 
    : DIGIT* '.' DIGIT+ ('e' PLUS_OR_MINUS? DIGIT+)? 
    | DIGIT+ 'e' PLUS_OR_MINUS? DIGIT+; 
LETTER 
    : [a-z]; 

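The ANTLR lexer prefers the rule that produces the longest match (if two rules match the same length, it prefers the one defined first). These two rules are completely unrelated, so their order of definition does not matter (although it is more readable to define the more complex rules above the basic ones).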
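The two preferences can be seen in a small lexer-only sketch. The rule names IF, ID and WS are chosen for illustration and do not come from the question's grammar.

```antlr
lexer grammar MatchOrderDemo;

// Tie-break: for the input "if", both IF and ID match two characters,
// so the rule defined first (IF) wins.
IF : 'if' ;

// Longest match: for the input "iffy", ID matches four characters and
// IF only two, so ID wins even though IF is defined first.
ID : [a-z]+ ;

// Spaces, tabs and line breaks are dropped.
WS : [ \t\r\n]+ -> skip ;
```

This is also why keywords are usually listed above the general identifier rule: swapping IF and ID would make "if" come out as an ID.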

You can extend the second definition to include uppercase characters:

LETTER 
    : [a-zA-Z]; 

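To solve the overall problem, you will need to rewrite your grammar completely. Most of the rules in the terminals section should be lexer rules. However, the terminals section looks overly elaborate, so some of those rules may be workarounds for parser rules that don't exist.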
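As a rough starting point, the literals and identifiers from the terminals section could be moved into the lexer along these lines. This is only a sketch under assumptions: the rule names are invented, bracketed identifiers and keywords are left out, and the string rule assumes a string literal cannot span lines.

```antlr
lexer grammar VbscriptLexerSketch;

// Float and integer literals as whole tokens, built from fragments.
FLOAT_LITERAL
    : DIGIT* '.' DIGIT+ EXPONENT?
    | DIGIT+ EXPONENT
    ;
INT_LITERAL : DIGIT+ ;

// A doubled quote inside a string stands for one quote character.
STRING_LITERAL : '"' ( ~["\r\n] | '""' )* '"' ;

// Identifiers: a letter followed by letters, digits or underscores.
ID : LETTER (LETTER | DIGIT | '_')* ;

DOT : '.' ;
WS  : [ \t]+ -> skip ;

// Fragments are only building blocks; they never become tokens themselves,
// so the 'e' of an exponent cannot clash with the letters of an ID.
fragment EXPONENT : [eE] [+-]? DIGIT+ ;
fragment DIGIT    : [0-9] ;
fragment LETTER   : [a-zA-Z] ;
```

Keyword rules such as `'and'` or `'end'` would have to be defined above ID so that they win the tie on equal-length matches, and rules like iddot and dot_id could then become parser rules over ID, DOT and the keyword tokens.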

That makes sense :) I wasn't sure whether lexer rules could be composed of other lexer rules unless those were fragments. Thanks. – user1694806