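I'm implementing an interpreter for the scripting language of an obsolete text editor, and I'm having trouble getting a lexer to work properly. What is the most efficient way to parse this scripting language?
Here is an example of the problematic part of the language: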
T
L /LOCATE ME/
C /LOCATE ME/CHANGED ME/ * *
C ;CHANGED ME;CHANGED ME AGAIN; 1 *
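The `/` character seems to quote strings, and it also acts as the delimiter in the sed-style syntax of the `C` (`CHANGE`) command, although that command accepts any character as its delimiter.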
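I have implemented roughly half of the most commonly used commands so far, using nothing more than `parse_tokens(line.split())`. It is quick and dirty, but it has worked surprisingly well.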
To avoid writing my own lexer, I tried `shlex`.
It works well, except in the `CHANGE` case:
import shlex
def shlex_test(cmd_str):
    lex = shlex.shlex(cmd_str)
    lex.quotes = '/'
    return list(lex)
print(shlex_test('L /spaced string/'))
# OK! gives: ['L', '/spaced string/']
print(shlex_test('C /spaced string/another string/ * *'))
# gives : ['C', '/spaced string/', 'another', 'string/', '*', '*']
# desired : any format that doesn't split on a space between /'s
print(shlex_test('C ;a b;b a;'))
# gives : ['C', ';', 'a', 'b', ';', 'b', 'a', ';']
# desired : same format as CHANGE command above
Does anyone know a simple way to accomplish this (with `shlex` or otherwise)?
Edit:
If it helps, here is the `CHANGE` command syntax as given in the help file:
'''
C [/stg1/stg2/ [n|n m]]
The CHANGE command replaces the m-th occurrence of "stg1" with "stg2"
for the next n lines. The default value for m and n is 1.'''
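For reference, the syntax above can be tokenized without a general-purpose lexer by treating the first non-space character after `C` as the delimiter. The following is only a minimal sketch under a few assumptions: `parse_change` is a hypothetical helper name, the counts are kept as strings because the meaning of `*` is not documented in the help text, and a bare `C` is assumed to be valid and returns `None` for both strings.

```python
def parse_change(line):
    """Split 'C <d>stg1<d>stg2<d> [n [m]]' into (stg1, stg2, n, m).

    <d> is whatever character follows the command letter, so spaces
    inside stg1 and stg2 survive. n and m stay as strings ('1', '*', ...).
    """
    parts = line.split(None, 1)
    if not parts or parts[0].upper() != 'C':
        raise ValueError('not a CHANGE command: %r' % line)
    if len(parts) == 1:
        # Bare "C": assumed valid because the whole argument is optional.
        return None, None, '1', '1'

    rest = parts[1]
    delim = rest[0]
    # Split on the delimiter at most twice: stg1, stg2, trailing counts.
    fields = rest[1:].split(delim, 2)
    if len(fields) != 3:
        raise ValueError('unterminated CHANGE strings: %r' % line)
    stg1, stg2, tail = fields

    counts = tail.split()
    if len(counts) > 2:
        raise ValueError('too many counts: %r' % line)
    # The help file gives 1 as the default for both n and m.
    n = counts[0] if len(counts) >= 1 else '1'
    m = counts[1] if len(counts) == 2 else '1'
    return stg1, stg2, n, m


print(parse_change('C /spaced string/another string/ * *'))
print(parse_change('C ;a b;b a;'))
```

Because the split is driven by the delimiter rather than by whitespace, the same function handles both the `/` and the `;` forms shown earlier.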
The same difficulty arises when tokenizing the `X` and `Y` commands:
'''
X [/command/[command/[...]]n]
Y [/command/[command/[...]]n]
The X and Y commands allow the execution of several commands contained
in one command. To define an X or Y "command string", enter X (or Y)
followed by a space, then individual commands, each separated by a
delimiter (e.g. a period "."). An unlimited number of commands may be
placed in the X or Y command string. Once the command string has been
defined, entering X (or Y) followed optionally by a count n will execute
the defined command string n times. If n is not specified, it will
default to 1.'''
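The `X` and `Y` description can be handled with the same delimiter-first idea. This is a rough sketch, not a faithful implementation of the editor: `parse_xy` is a hypothetical name, and it assumes that defining a command string and running it are two separate input lines (as the prose describes), with a purely numeric or empty argument meaning "run". The syntax line could also be read as allowing a count directly after the last delimiter, which this sketch does not cover.

```python
def parse_xy(line):
    """Classify an X/Y line as a definition or an execution request.

    Returns ('run', name, count) or ('define', name, [command, ...]).
    """
    parts = line.split(None, 1)
    if not parts or parts[0].upper() not in ('X', 'Y'):
        raise ValueError('not an X or Y command: %r' % line)
    name = parts[0].upper()
    rest = parts[1].strip() if len(parts) > 1 else ''

    # Assumption: nothing, or only digits, means "execute n times".
    if rest == '' or rest.isdecimal():
        return 'run', name, int(rest or 1)

    # Otherwise the first character is the delimiter between commands.
    delim = rest[0]
    commands = [c for c in rest[1:].split(delim) if c.strip()]
    return 'define', name, commands


print(parse_xy('X .L /LOCATE ME/.C ;a b;b a;.'))
print(parse_xy('X 3'))
```

Each returned command is still a raw string, so it would be passed back through the normal per-command parsing. Note that the outer delimiter must differ from any delimiter used by the nested commands, otherwise the split lands in the wrong places.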
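Do you have access to the language definition? If so, a quote of the relevant parts would probably be useful to all of us. – Marcin 2012-07-19 17:03:28
@Marcin I added some relevant information from the help file, which is all the documentation I have. – 2012-07-19 17:24:28
I don't know about `shlex`, but I think regular expressions [(re)](http://docs.python.org/library/re.html) would also be useful here. – machaku 2012-07-19 17:29:32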
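To illustrate the `re` suggestion, a named backreference can capture whichever delimiter the line uses and then require the same character between and after the two strings. This is only a sketch: `CHANGE_RE` and `match_change` are hypothetical names, and the pattern assumes the delimiter is a single non-space character that does not occur inside either string.

```python
import re

CHANGE_RE = re.compile(r'''
    ^\s*[Cc]\s+               # command letter, then whitespace
    (?P<delim>\S)             # any non-space character is the delimiter
    (?P<stg1>.*?)(?P=delim)   # text up to the next delimiter
    (?P<stg2>.*?)(?P=delim)   # replacement text up to the closing delimiter
    \s*(?P<counts>.*?)\s*$    # optional trailing counts, e.g. "1 *"
''', re.VERBOSE)


def match_change(line):
    """Return (stg1, stg2, [counts...]) or None if the line does not match."""
    m = CHANGE_RE.match(line)
    if m is None:
        return None
    return m.group('stg1'), m.group('stg2'), m.group('counts').split()


print(match_change('C /spaced string/another string/ * *'))
print(match_change('C ;a b;b a;'))
```

The `(?P=delim)` backreference is what lets one pattern cover `/`, `;`, or any other delimiter without listing them in advance.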