2013-03-14 165 views
1

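Loop within a loop in Python

I have a log file containing hundreds of thousands of lines.

I am looping through these lines to find any line containing some specific text, for example: !!event!!
Then, once an !!event!! line is found, I need to keep looping through the lines after it until I find the next 3 lines that each contain their own specific text ('flag1', 'flag2', and 'flag3').
Once I find the third line ('flag3'), I want to carry on looping to the next !!event!! line and repeat the process until there are no more events.

Does anyone have suggestions on how I could structure my code to accomplish this?

For example: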

f = open('samplefile.log', 'r')
for line in f:
    if '!!event!!' in line:
        L0 = line.split()
        # then get the lines after L0 containing 'flag1', 'flag2', and 'flag3'
        # (a sample log file is shown below)
        # I am not sure how to accomplish this
        # (I am thinking of a loop within the current loop).
        # I know the following is incorrect, but the
        # intended result would be able to yield something like this:
        if "flag1" in line:
            L1 = line.split()
        if "flag2" in line:
            L2 = line.split()
        if "flag3" in line:
            L3 = line.split()
print('Event and flag times:', L0[0], L1[0], L2[0], L3[0])

samplefile.log

8:41:05 asdfa 32423 
8:41:06 dasd 23423 
8:41:07 dfsd 342342 
8:41:08 !!event!! 23423 
8:41:09 asdfs 2342 
8:41:10 asdfas flag1 
8:41:11 asda 42342 
8:41:12 sdfs flag2 
8:41:13 sdafsd 2342 
8:41:14 asda 3443 
8:41:15 sdfs 2323 
8:41:16 sdafsd flag3 
8:41:17 asda 2342 
8:41:18 sdfs 3443 
8:41:19 sdafsd 2342 
8:41:20 asda 3443 
8:41:21 sdfs 4544 
8:41:22 !!event!! 5645 
8:41:23 sdfs flag1 
8:41:24 sadfs flag2 
8:41:25 dsadf 32423 
8:41:26 sdfa 23423 
8:41:27 sdfsa flag3 
8:41:28 sadfa 23423 
8:41:29 sdfas 2342 
8:41:30 dfsdf 2342 

For this sample, the code should print:

Event and flag times: 8:41:08 8:41:10 8:41:12 8:41:16 
Event and flag times: 8:41:22 8:41:23 8:41:24 8:41:27 
+1

Suggestion: feed the lines to an FSM (finite state machine) with states such as find_event, find_flag1, and so on. – Ber 2013-03-14 15:28:43
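
A minimal sketch of the FSM idea from the comment above, written for Python 3. The names `STATES` and `scan_events` are made up for illustration, and it assumes the flags always appear in the order flag1, flag2, flag3 and that the timestamp is the first field on each line:

```python
# Each state is the substring we are currently waiting for, in order.
# (STATES and scan_events are hypothetical names, not from the original post.)
STATES = ['!!event!!', 'flag1', 'flag2', 'flag3']

def scan_events(lines):
    """Return one list of timestamps per completed event."""
    events = []
    state = 0        # index into STATES: what we are waiting for
    current = []
    for line in lines:
        if STATES[state] in line:
            current.append(line.split()[0])   # timestamp is the first field
            state += 1
            if state == len(STATES):          # flag3 seen: event complete
                events.append(current)
                current = []
                state = 0                     # wait for the next event
    return events

sample = [
    "8:41:08 !!event!! 23423",
    "8:41:09 asdfs 2342",
    "8:41:10 asdfas flag1",
    "8:41:12 sdfs flag2",
    "8:41:16 sdafsd flag3",
]
for times in scan_events(sample):
    print('Event and flag times:', ' '.join(times))
```

Because only the current state's marker is tested, a flag that shows up out of order is simply ignored rather than recorded.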

+0

You should use a regular expression for this. If you show me some sample input and what you want to do with it, I can show you how. – 2013-03-14 15:32:22
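
For what it is worth, a regular expression can pick out the marker lines and their timestamps, but grouping them into events still needs a loop like the ones in the answers. A rough Python 3 sketch, where the pattern and the name `marker_times` are assumptions based on the sample log format (timestamp first, marker somewhere later on the line):

```python
import re

# Timestamp at the start of the line, then any text, then one of the markers.
# (MARKER and marker_times are hypothetical names for illustration.)
MARKER = re.compile(r'^(\S+)\s.*?(!!event!!|flag[123])')

def marker_times(lines):
    """Yield (marker, timestamp) for every line that contains a marker."""
    for line in lines:
        m = MARKER.match(line)
        if m:
            yield m.group(2), m.group(1)

sample = [
    "8:41:08 !!event!! 23423",
    "8:41:09 asdfs 2342",
    "8:41:10 asdfas flag1",
]
for marker, time in marker_times(sample):
    print(marker, time)
```

For plain substring markers like these, the `in` operator is simpler; a regex mainly pays off if the markers or timestamps need stricter matching.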

Answers

3

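Sure, you can keep consuming the same file in an inner loop and break out of it when you hit flag3; the outer loop then resumes from the line after it: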

with open('samplefile.log') as f:
    for line in f:
        if '!!event!!' in line:
            L0 = line.split()
            # keep consuming the same file iterator in the inner loop
            for line in f:
                if "flag1" in line:
                    L1 = line.split()
                elif "flag2" in line:
                    L2 = line.split()
                elif "flag3" in line:
                    L3 = line.split()
                    break    # continue outer loop
            print('Event and flag times:', L0[0], L1[0], L2[0], L3[0])

# Event and flag times: 8:41:08 8:41:10 8:41:12 8:41:16 
# Event and flag times: 8:41:22 8:41:23 8:41:24 8:41:27 
+0

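Thank you, and thanks to everyone who has replied so far! This was the first reply I saw, and it is very simple and works. I will still look at the other replies to see whether there is more to learn on this topic. – teachamantofish 2013-03-14 16:26:33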

0

Here you go:

with open("in6.txt") as f:
    flag = False    # True while we are inside an event, collecting flags
    d = []          # one list of timestamps per completed event
    data = []
    for line in f:
        if flag:
            if "flag1" in line or "flag2" in line:
                data.append(line.split()[0])
            elif "flag3" in line:
                data.append(line.split()[0])
                flag = False
                d.append(data)
            continue
        if "!!event!!" in line:
            flag = True
            data = [line.split()[0]]

for l in d:
    print("Event and flag times:", l[0], l[1], l[2], l[3])

Output

>>> 
Event and flag times: 8:41:08 8:41:10 8:41:12 8:41:16 
Event and flag times: 8:41:22 8:41:23 8:41:24 8:41:27 
+0

You never test for the actual 'flag' text in the lines. Assuming the flags are the next 3 lines is incorrect. – 2013-03-14 15:38:00

+0

@MartijnPieters Thanks, updated... – ATOzTOA 2013-03-14 15:43:10

0

Keep a flag to track what you are looking for:

with open('samplefile.log') as f:
    events = []
    current_event = []
    for line in f:
        if not current_event:
            # not inside an event yet: only an event line is of interest
            if '!!event!!' in line:
                current_event.append(line.split()[0])
        elif 'flag1' in line or 'flag2' in line or 'flag3' in line:
            current_event.append(line.split()[0])
            if 'flag3' in line:  # could also be `if len(current_event) == 4:`
                events.append(current_event)
                current_event = []

for event in events:
    print('Event and flag times:', ' '.join(event))

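Here I use current_event as the flag: adding the time of the !!event!! line makes it non-empty, and from then on we look for flag lines.

I collect the times for each event into the events list, but you could also print the event data as soon as the flag3 line is found.

Output: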

Event and flag times: 8:41:08 8:41:10 8:41:12 8:41:16 
Event and flag times: 8:41:22 8:41:23 8:41:24 8:41:27 
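
With hundreds of thousands of lines, it may be preferable not to hold every event in a list. The following Python 3 sketch reworks the same current_event idea as a generator that hands back each event as soon as its flag3 line is seen; the name `iter_events` is made up for illustration, and the timestamp is assumed to be the first field:

```python
def iter_events(lines):
    """Yield the list of timestamps for each event once flag3 is seen.

    (iter_events is a hypothetical name, not part of the answer above.)
    """
    current = []
    for line in lines:
        if not current:
            # not inside an event: only an event line starts one
            if '!!event!!' in line:
                current = [line.split()[0]]
        elif any(flag in line for flag in ('flag1', 'flag2', 'flag3')):
            current.append(line.split()[0])
            if 'flag3' in line:
                yield current      # event complete: hand it to the caller
                current = []

sample = [
    "8:41:22 !!event!! 5645",
    "8:41:23 sdfs flag1",
    "8:41:24 sadfs flag2",
    "8:41:25 dsadf 32423",
    "8:41:27 sdfsa flag3",
]
for event in iter_events(sample):
    print('Event and flag times:', ' '.join(event))
```

An open file object can be passed in place of `sample`, since the function only iterates over its argument line by line.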
0

Just loop over every line; when you find !!event!!, start looking for the flags, and once all the flags have been found, carry on...

Something like:

def get_time(line):
    # the timestamp is the first whitespace-separated field
    return line.split()[0]

data = []
look_for_flags = False
with open('samplefile.log') as f:
    for line in f:
        if '!!event!!' in line:
            look_for_flags = True
            data.append([get_time(line)])
        elif look_for_flags:
            if 'flag1' in line or 'flag2' in line or 'flag3' in line:
                data[-1].append(get_time(line))
                if 'flag3' in line:
                    look_for_flags = False  # all flags found; wait for the next event
print(data)
0

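The clearest way to do this is with a generator function, which saves you from keeping explicit state variables yourself. Whenever you need to build a state machine (as you do here), think of a generator.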

def find_target_lines(file_handle):
    # the first send() supplies the initial target
    target = yield
    for line in file_handle:
        if target in line:
            # hand back the matching line and receive the next target
            target = yield line

targets = ['!!event!!', 'flag1', 'flag2', 'flag3']

with open('samplefile.log', 'r') as f:
    while True:
        found = list()
        finder = find_target_lines(f)
        next(finder)    # advance to the first yield so send() can be used
        try:
            for target in targets:
                line = finder.send(target)
                if line:
                    found.append(line)
            print(found)
        except StopIteration:
            break    # file exhausted
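
Note that the loop above prints the raw matched lines. To get the output format requested in the question, the first field of each line can be pulled out before printing. A small Python 3 sketch, where `format_event` is a hypothetical helper and `found` is assumed to hold the four matched lines in order:

```python
def format_event(found):
    """Build the requested output line from the matched log lines.

    (format_event is a hypothetical helper, not part of the answer above.)
    """
    times = [line.split()[0] for line in found]   # timestamp is the first field
    return 'Event and flag times: ' + ' '.join(times)

found = [
    "8:41:08 !!event!! 23423 \n",
    "8:41:10 asdfas flag1 \n",
    "8:41:12 sdfs flag2 \n",
    "8:41:16 sdafsd flag3 \n",
]
print(format_event(found))
```

In the loop above, `print(found)` would then become `print(format_event(found))`.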