2013-04-16 139 views

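Python next substring search

I am transmitting a message several times, with each copy wrapped in a preamble and a postamble. I want to extract the message that sits between a valid preamble and a valid postamble. My current code is: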

print(msgfile[msgfile.find(preamble) + len(preamble):msgfile.find(postamble, msgfile.find(preamble))]) 

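The problem is that if a postamble is corrupted, this prints everything between the first valid preamble and the next valid postamble. An example of a received text file would be: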

garbagePREAMBLEmessagePOSTcMBLEgarbage 
garbagePRdAMBLEmessagePOSTAMBLEgarbage 
garbagePREAMBLEmessagePOSTAMBLEgarbage 

For this input, the code prints:

messagePOSTcMBLEgarbage 
garbagePRdAMBLEmessage

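What I actually want it to print is the message from the third line, because that is the only one with both a valid preamble and a valid postamble. So I think what I need is a way to find, and index from, the next instance of a substring. Is there an easy way to do this?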

Edit: I do not expect my data to arrive in nice discrete lines. I only formatted it that way so it would be easier to read.

Answers

0

Process it line by line:

>>> preamble, postamble = "PREAMBLE", "POSTAMBLE"
>>> test = "garbagePREAMBLEmessagePOSTcMBLEgarbage\n"
>>> test += "garbagePRdAMBLEmessagePOSTAMBLEgarbage\n"
>>> test += "garbagePREAMBLEmessagePOSTAMBLEgarbage\n"
>>> for line in test.splitlines():
...     # keep only lines that contain both markers intact
...     if preamble in line and postamble in line:
...         print(line[line.find(preamble) + len(preamble):line.find(postamble)])
...
message
0
import re

lines = ["garbagePREAMBLEmessagePOSTcMBLEgarbage",
         "garbagePRdAMBLEmessagePOSTAMBLEgarbage",
         "garbagePREAMBLEmessagePOSTAMBLEgarbage"]

# Option 1: use a regex
my_regex = re.compile("garbagePREAMBLE(.*?)POSTAMBLEgarbage")

# print the text captured between the preamble and the postamble
for line in lines:
    found = my_regex.match(line)
    # if there is a match, print it
    if found:
        print(found.group(1))

# Option 2: use string slicing
def validate(pre, post, messages):
    for line in messages:
        # slicing would misbehave on a line shorter than both markers combined
        if len(line) < len(pre) + len(post):
            print("error: line is too short")
            continue

        # see if the line fits the pattern
        if line[:len(pre)] == pre and line[-len(post):] == post:
            # print the message
            print(line[len(pre):-len(post)])

validate("garbagePREAMBLE", "POSTAMBLEgarbage", lines)
0

Are all the messages on single lines? Then you can use a regex to pick out the lines that have a valid preamble and postamble:

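import re

pat = re.compile('PREAMBLE(.+)POSTAMBLE')

# yourfilename is a placeholder for the path of the received file
with open(yourfilename) as input_file:
    # search each line once and keep only the lines that matched
    matches = (pat.search(line) for line in input_file)
    messages = [m.group(1) for m in matches if m]

print(messages)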
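
A note on the pattern above: `.+` is greedy, so if a single line (or an unsplit buffer) holds more than one frame, the capture runs from the first preamble to the last postamble. The sketch below contrasts it with a non-greedy `.+?` used through `findall`. The sample string is made up for illustration, and `re.DOTALL` is added on the assumption that a message may contain newlines.

```python
import re

# Hypothetical buffer holding two intact frames back to back.
text = "xxPREAMBLEonePOSTAMBLExxPREAMBLEtwoPOSTAMBLExx"

greedy = re.compile("PREAMBLE(.+)POSTAMBLE")
lazy = re.compile("PREAMBLE(.+?)POSTAMBLE", re.DOTALL)

# Greedy: a single capture from the first preamble to the last postamble.
greedy_result = greedy.search(text).group(1)

# Non-greedy: one capture per preamble/postamble pair.
lazy_result = lazy.findall(text)

print(greedy_result)
print(lazy_result)
```

The non-greedy form on its own does not solve the corrupted-marker case from the question: a valid preamble followed by a damaged postamble is still paired with the next intact postamble.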
+0

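This works well for nice discrete lines, but I do not expect the data to be formatted at all. I only laid it out that way to make it easier to look at. – tdfoster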

+0

What structure do your messages have? A maximum length, a restricted character set, anything? –
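
If an upper bound on the message length is known (one of the constraints asked about above), the data does not need to be split into lines at all. The following is only a sketch under that assumption: the helper name `extract_messages` and the `max_len` parameter are hypothetical and do not come from the thread. It relies on the optional start argument of `str.find` to resume the search from each successive preamble, and it rejects a candidate that is too long or that contains another preamble before its postamble.

```python
def extract_messages(data, preamble, postamble, max_len):
    """Collect payloads framed by an intact preamble/postamble pair.

    max_len is an assumed upper bound on the payload length; it is what
    lets the function discard a pairing that spans a corrupted marker.
    """
    messages = []
    start = data.find(preamble)
    while start != -1:
        body_start = start + len(preamble)
        # search for the postamble only after the current preamble
        end = data.find(postamble, body_start)
        if end == -1:
            break  # no intact postamble remains
        next_pre = data.find(preamble, body_start)
        # a preamble inside the candidate means its own postamble was lost
        clean = next_pre == -1 or next_pre > end
        if clean and end - body_start <= max_len:
            messages.append(data[body_start:end])
        # resume from the next intact preamble (or stop if there is none)
        start = next_pre
    return messages


sample = ("garbagePREAMBLEmessagePOSTcMBLEgarbage\n"
          "garbagePRdAMBLEmessagePOSTAMBLEgarbage\n"
          "garbagePREAMBLEmessagePOSTAMBLEgarbage\n")
print(extract_messages(sample, "PREAMBLE", "POSTAMBLE", max_len=20))
```

On the sample from the question, the pairing of the first line's preamble with the second line's postamble is 46 characters long, so a limit of 20 discards it and only the third line's message is returned. Without some such constraint, that pairing cannot be told apart from a genuine long message.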
