2011-02-03 232 views
5

Parsing a string

So my problem is this: I have a file that looks like this:

[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1 

which should of course translate to

' This is an example file!' 

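I'm looking for a way to parse the original content into the final content, so that a [BACKSPACE] deletes the last character (spaces included) and multiple backspaces delete multiple characters. [SHIFT] isn't that important to me. Thanks for all the help!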

+0

Are the backspace key and [SHIFT] the only tokens you need to worry about? – inspectorG4dget 2011-02-03 03:12:51

Answers

1

Here's one way, but it feels a bit hackish. There's probably a better way.

def process_backspaces(text, token='[BACKSPACE]'):
    """Delete the character before each occurrence of "token" in a string."""
    output = ''
    # The trailing space is a sentinel: every piece, including the last one,
    # then loses exactly one character.
    for item in (text + ' ').split(token):
        output += item
        output = output[:-1]
    return output

def process_shifts(text, token='[SHIFT]'):
    """Replace the character after each occurrence of "token" with its
    uppercase equivalent. (Doesn't turn "1" into "!" or "2" into "@", however!)"""
    first, *rest = text.split(token)
    # Text before the first token is left untouched; item[:1] avoids an
    # IndexError when a token is followed by nothing.
    return first + ''.join(item[:1].upper() + item[1:] for item in rest)

test_string = '[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1'
print(process_backspaces(process_shifts(test_string)))
0

It looks like you could use a regular expression that searches for (something) followed by a backspace and replaces it with nothing...

re.sub(r'.?\[BACKSPACE\]', '', YourString.replace('[SHIFT]', ''))

Not sure what you mean by "multiple backspaces delete multiple characters".

+1

-1 How would this work with "blah[BACKSPACE][BACKSPACE][BACKSPACE]arf"? – payne 2011-02-03 03:12:45

+0

It should return 'barf' – 2011-02-03 03:18:48

+0

But it needs to delete a space before the backspace as well as '[BACKSPACE]' itself – 2011-02-03 03:20:07
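
One way to cover the consecutive-backspace case raised in these comments is to stop relying on a single substitution and instead walk through the pieces, using a list as a stack. This is only a minimal sketch, assuming the tokens are spelled exactly `[BACKSPACE]` and `[SHIFT]`; the function name `strip_backspaces` is made up for illustration.

```python
import re

def strip_backspaces(text):
    """Hypothetical helper: drop [SHIFT] tokens and apply every [BACKSPACE]."""
    out = []  # characters kept so far, used as a stack
    # The capturing group makes re.split keep the tokens in the result list.
    for piece in re.split(r'(\[BACKSPACE\])', text.replace('[SHIFT]', '')):
        if piece == '[BACKSPACE]':
            if out:  # a backspace with nothing before it deletes nothing
                out.pop()
        else:
            out.extend(piece)
    return ''.join(out)

print(strip_backspaces('blah[BACKSPACE][BACKSPACE][BACKSPACE]arf'))  # barf
```

Because each token pops at most one character, any number of backspaces in a row behaves the way repeated key presses would.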

1

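If you don't care about the shifts, just strip them, load

(defun apply-bspace ()
  "Delete each \"[BACKSPACE]\" token after point, plus the character before it."
  (interactive)
  ;; The third argument t makes `search-forward' return nil instead of
  ;; signalling an error once no more tokens are found.
  (while (search-forward "[BACKSPACE]" nil t)
    ;; 11 characters of token + 1 preceding character.
    (backward-delete-char 12)))

and hit M-x apply-bspace while viewing the file. It's Elisp, not Python, but it fits your initial requirement of "something I can download for free to a PC".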

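Edit: Shift is more complicated if you also want to apply it to numbers (so that [SHIFT]2 => @, [SHIFT]3 => #, etc.). The naive way, which works on letters, is

(defun apply-shift ()
  "Delete each \"[SHIFT]\" token after point and upcase the character after it."
  (interactive)
  (while (search-forward "[SHIFT]" nil t)
    (backward-delete-char 7)
    ;; Clamp to `point-max' in case the token was the last thing in the buffer.
    (upcase-region (point) (min (point-max) (1+ (point))))))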
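
For comparison, the digit and punctuation case could be covered with a lookup table. The following is a minimal Python sketch (not Elisp), assuming a US keyboard layout; the names `UNSHIFTED`, `SHIFTED`, `SHIFT_TABLE` and `shifted` are made up for illustration.

```python
# Assumption: US keyboard layout; other layouts need a different table.
UNSHIFTED = "`1234567890-=[]\\;',./"
SHIFTED = '~!@#$%^&*()_+{}|:"<>?'
SHIFT_TABLE = str.maketrans(UNSHIFTED, SHIFTED)

def shifted(char):
    """Hypothetical helper: return what Shift plus this key would type."""
    # upper() handles letters; translate() handles digits and punctuation.
    return char.upper().translate(SHIFT_TABLE)

print(shifted('2'), shifted('3'), shifted('a'))  # @ # A
```

Characters that have no entry in the table, such as a space, come back unchanged.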
0

You need to read the input, extract the tokens, recognize them, and give them a meaning.

This is how I would do it:

# -*- coding: utf-8 -*-

import re

# Shifted values for digits; extend this to match your keyboard layout.
upper_value = {
    '1': '!', '2': '"',
}

# A token is either a bracketed tag such as [SHIFT] or a single character.
tokenizer = re.compile(r'(\[.*?\]|.)')
origin = "[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1"
result = ""

shift = False

for token in tokenizer.findall(origin):
    if not token.startswith("["):
        if shift:
            shift = False
            # Fall back to upper() for anything not in the table.
            token = upper_value.get(token, token.upper())

        result = result + token
    else:
        if token == "[SHIFT]":
            shift = True
        elif token == "[BACKSPACE]":
            result = result[0:-1]

print(result)

It's neither the fastest nor the most elegant solution, but I think it's a good start.

Hope it helps :-)

1

This does exactly what you want:

def shift(s):
    LOWER = '`1234567890-=[];\'\\,./'
    UPPER = '~!@#$%^&*()_+{}:"|<>?'

    if s.isalpha():
        return s.upper()
    # Characters with no shifted form (e.g. a space) are returned unchanged.
    index = LOWER.find(s)
    return UPPER[index] if index >= 0 else s

def parse(text):
    pieces = text.split("[BACKSPACE]")
    answer = ''
    for i, piece in enumerate(pieces):
        answer += piece
        # Every piece except the last was followed by a backspace.
        if i < len(pieces) - 1:
            answer = answer[:-1]

    # Text before the first [SHIFT] is not shifted.
    first, *rest = answer.split("[SHIFT]")
    return first + ''.join(shift(s[0]) + s[1:] for s in rest if s)

>>> print(parse("[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1"))
This is an example file!