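Python3 - Creating a scanner for a compiler and getting errors when testing

I am trying to create a scanner for a compiler that reads a simple language. I created a test file named program, which contains: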

z := 2; 
if z < 3 then 
    z := 1 
end

To run the program, I use the terminal and run this command line:

python3 scanner.py program tokens

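I want the output to go into a text file named tokens, but nothing shows up when I do this. The program runs but does not do anything. I tried putting <> around program, but then I got a ValueError: need more than 1 value to unpack.

My code is as follows: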

import re 
import sys 

class Scanner:
    '''The interface comprises the methods lookahead and consume.
    Other methods should not be called from outside of this class.'''

    def __init__(self, input_file):
        '''Reads the whole input_file to input_string, which remains constant.
        current_char_index counts how many characters of input_string have
        been consumed.
        current_token holds the most recently found token and the
        corresponding part of input_string.'''

        # source code of the program to be compiled
        self.input_string = input_file.read()

        # index where the unprocessed part of input_string starts
        self.current_char_index = 0

        # a pair (most recently read token, matched substring of input_string)
        self.current_token = self.get_token()

    def skip_white_space(self):
        '''Consumes all characters in input_string up to the next
        non-white-space character.'''
        if (self.current_char_index >= len(self.input_string) - 1):
            return

        while self.input_string[self.current_char_index].isspace():
            self.current_char_index += 1

    def get_token(self):
        '''Returns the next token and the part of input_string it matched.
        The returned token is None if there is no next token.
        The characters up to the end of the token are consumed.'''
        self.skip_white_space()
        # find the longest prefix of input_string that matches a token
        token, longest = None, ''
        for (t, r) in Token.token_regexp:
            match = re.match(r, self.input_string[self.current_char_index:])
            if match and match.end() > len(longest):
                token, longest = t, match.group()
        # consume the token by moving the index to the end of the matched part
        self.current_char_index += len(longest)
        return (token, longest)

    def lookahead(self):
        '''Returns the next token without consuming it.
        Returns None if there is no next token.'''
        return self.current_token[0]

    def consume(self, *tokens):
        '''Returns the next token and consumes it, if it is in tokens.
        Raises an exception otherwise.
        If the token is a number or an identifier, its value is returned
        instead of the token.'''
        current = self.current_token

        if (len(self.input_string[self.current_char_index:]) == 0):
            # catches the end-of-file errors so lookahead returns None
            self.current_token = (None, '')
        else:
            # otherwise we consume the token
            self.current_token = self.get_token()

        if current[0] in tokens:  # tokens could be a single token or a group of tokens
            if current[0] is Token.ID or current[0] is Token.NUM:
                return current[1]  # return the value of the ID or NUM
            else:
                return current[0]  # return the token
        else:  # current_token is not in tokens
            raise Exception('non-token detected')

class Token:
    # The following enumerates all tokens.
    DO = 'DO'
    ELSE = 'ELSE'
    READ = 'READ'
    WRITE = 'WRITE'
    END = 'END'
    IF = 'IF'
    THEN = 'THEN'
    WHILE = 'WHILE'
    SEM = 'SEM'
    BEC = 'BEC'
    LESS = 'LESS'
    EQ = 'EQ'
    GRTR = 'GRTR'
    LEQ = 'LEQ'
    NEQ = 'NEQ'
    GEQ = 'GEQ'
    ADD = 'ADD'
    SUB = 'SUB'
    MUL = 'MUL'
    DIV = 'DIV'
    LPAR = 'LPAR'
    RPAR = 'RPAR'
    NUM = 'NUM'
    ID = 'ID'

    # The following list gives the regular expression to match a token.
    # The order in the list matters for mimicking Flex behaviour.
    # Longer matches are preferred over shorter ones.
    # For same-length matches, the first in the list is preferred.
    token_regexp = [
        (DO, 'do'),
        (ELSE, 'else'),
        (READ, 'read'),
        (WRITE, 'write'),
        (END, 'end'),
        (IF, 'if'),
        (THEN, 'then'),
        (WHILE, 'while'),
        (SEM, ';'),
        (BEC, ':='),
        (LESS, '<'),
        (EQ, '='),
        (NEQ, '!='),
        (GRTR, '>'),
        (LEQ, '<='),
        (GEQ, '>='),
        (ADD, '[+]'),   # + is special in regular expressions
        (SUB, '-'),
        (MUL, '[*]'),
        (DIV, '[/]'),
        (LPAR, '[(]'),  # ( is special in regular expressions
        (RPAR, '[)]'),  # ) is special in regular expressions
        (ID, '[a-z]+'),
        (NUM, '[0-9]+'),
    ]

def indent(s, level): 
    return ' '*level + s + '\n' 

# Initialise scanner. 

scanner = Scanner(sys.stdin) 

# Show all tokens in the input. 

token = scanner.lookahead() 
test = '' 

while token != None:
    if token in [Token.NUM, Token.ID]:
        token, value = scanner.consume(token)
        print(token, value)
    else:
        print(scanner.consume(token))
    token = scanner.lookahead()

Sorry if this is poorly explained. Any help with what is going wrong would be great. Thank you.
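A note on the ValueError mentioned above, as a minimal sketch rather than a definitive diagnosis: consume returns a single string for ID and NUM tokens, while the test loop writes `token, value = scanner.consume(token)`, which tries to unpack that string into two names. The helper `consume_like_original` below is a hypothetical stand-in for Scanner.consume, written only to reproduce that behaviour in isolation.

```python
def consume_like_original(current):
    """Hypothetical stand-in for Scanner.consume.

    Like the method in the question, it returns only the matched text
    for ID/NUM tokens and the token name for everything else.
    """
    token, text = current
    if token in ('ID', 'NUM'):
        return text
    return token

result = consume_like_original(('ID', 'z'))  # a plain string, not a pair

try:
    token, value = result  # unpacking a 1-character string into two names
    unpack_error = None
except ValueError as exc:
    unpack_error = exc

print(type(result).__name__, repr(result))
print('unpacking failed:', unpack_error is not None)
```

Note that a two-character identifier such as ab would unpack without any error into 'a' and 'b', so the failure depends on the length of the matched text.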

Source

2015-05-09 RCR1994

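I simply added an if and an elif statement to _consume_ to print NUM and ID. For example, _if current[0] is Token.ID_, then _return "ID" + current[1]_. – RCR1994

Solution 1a

I figured out why it was not printing the tokens to the file. I needed to change my test code to this: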

while token != None:
    print(scanner.consume(token))
    token = scanner.lookahead()

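The only problem now is that when the token is an ID or a NUM, I cannot tell which one it is: it prints only the identifier or the number without saying what it is. Right now it prints this: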

z
BEC
2
SEM
IF
z
LESS
3
THEN
z
BEC
1
END

and I need it to print this:

ID z
BEC
NUM 2
SEM
IF
ID z
LESS
NUM 3
THEN
ID z
BEC
NUM 1
END

I am thinking of adding an if statement that says: if it is a NUM, print NUM followed by the token, and likewise if it is an ID.

Solution 1b

I simply added an if and an elif statement to consume to print NUM and ID. For example, if current[0] is Token.ID, then return "ID" + current[1].
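The if/elif idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `consume_result` is a hypothetical free function standing in for the return logic inside consume, and it inserts a space between the label and the value to match the desired output shown earlier (the example in the answer concatenates without one).

```python
def consume_result(current):
    """Return what consume would hand back for a (token, text) pair.

    Hypothetical helper: labels ID and NUM values, passes other tokens through.
    """
    token, text = current
    if token == 'ID':
        return 'ID ' + text    # the space separator is an assumption
    elif token == 'NUM':
        return 'NUM ' + text
    else:
        return token

# The first statement of the sample program, z := 2; as (token, text) pairs.
pairs = [('ID', 'z'), ('BEC', ':='), ('NUM', '2'), ('SEM', ';')]
lines = [consume_result(p) for p in pairs]
print('\n'.join(lines))
```

Because consume then always returns one string, the test loop from Solution 1a can print the result directly without unpacking.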

Source

2015-05-09 04:00:31 RCR1994

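I haven't changed anything except skip_white_space and consume, and I'm having difficulty getting it to run...

def skip_white_space(self):
    '''Consumes all characters in input_string up to the next
    non-white-space character.'''
    while self.input_string[self.current_char_index] == '\s':
        self.current_char_index += 1

def consume(self, *tokens):
    '''Returns the next token and consumes it, if it is in tokens.
    Raises an exception otherwise.
    If the token is a number or an identifier, not just the token
    but a pair of the token and its value is returned.'''
    current = self.current_token

    if current[0] in tokens:
        if current[0] in Token.ID:
            return 'ID' + current[1]
        elif current[0] in Token.NUM:
            return 'NUM' + current[1]
        else:
            return current[0]
    else:
        raise Exception('Error in compiling non-token(not apart of token list)')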

...I'm especially having trouble getting python3 scanner.py <program> tokens to work. Any guidance would help me a lot, thanks.
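One thing that may be contributing here: in a Python string literal, '\s' is not a whitespace class (it has that meaning only inside a regular expression). It evaluates to a two-character string, a backslash followed by s, so comparing it with a single character is never true and the loop skips nothing. A minimal sketch using str.isspace with a bounds check, written as a free function for illustration (the function signature is an assumption, not the original method):

```python
def skip_white_space(input_string, index):
    """Return the index of the next non-white-space character at or after index.

    Stops at len(input_string) instead of reading past the end.
    """
    while index < len(input_string) and input_string[index].isspace():
        index += 1
    return index

backslash_s = '\\s'  # the same string the literal '\s' evaluates to
print(len(backslash_s))                # 2
print(' ' == backslash_s)              # False
print(skip_white_space('  \n z', 0))   # 4
print(skip_white_space('end \n', 3))   # 5
```

The bounds check also covers input that ends in several whitespace characters, where indexing without it would run past the end of the string.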

Source

2015-05-10 01:11:11 David

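Sorry I couldn't get back to you sooner. I realize the assignment is over, but I'll try to help if I can. Are you running 'python3 scanner.py <program> tokens' in a terminal or in an IDE? Also, did you create a file named program and add the sample code to it? If you don't have a file named program, it will not output anything, or it may give you an error message. – RCR1994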
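For context on the command itself: the script reads from sys.stdin, so it depends on the shell connecting the file program to standard input with `<` and sending standard output to tokens with `>`. Without the redirection, program and tokens are just unused arguments and sys.stdin.read() waits for keyboard input, which would look like the script running but doing nothing. A rough sketch of one alternative, assuming a hypothetical helper `open_input` that falls back to stdin when no file name is given:

```python
import io
import sys

def open_input(argv, stdin=sys.stdin):
    """Return a readable stream: the file named by argv[1] if given, else stdin.

    Hypothetical helper, not part of the original scanner.
    """
    if len(argv) > 1:
        return open(argv[1])
    return stdin

# With no file argument, the text comes from whatever is connected to stdin.
# An in-memory stream stands in for the redirected file here.
fake_stdin = io.StringIO('z := 2;')
source = open_input(['scanner.py'], stdin=fake_stdin).read()
print(source)
```

With a helper like this, the scanner could be initialised as Scanner(open_input(sys.argv)) and would accept either a file name argument or shell redirection.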
