2015-09-30 52 views
1

Pythonic script to ignore timestamps in log files

There are two log files: log A and log B.

log A 

2015-07-12 08:50:33,904 [Collection-3] INFO app - Executing Scheduled job: System: choppa1 

2015-07-12 09:56:45,060 [Collection-3] INFO app - Executing Scheduled job: System: choppa1 

2015-07-12 10:00:00,001 [Analytics_Worker-1] INFO app - Trigger for job AnBuildAuthorizationJob was fired. 

2015-07-12 11:00:00,007 [Analytics_Worker-1] INFO app - Starting the AnBuildAuthorizationJob job. 



log B 

2014-07-12 09:50:33,904 [Collection-3] INFO app - Executing Scheduled job: System: choppa1 

2014-07-12 09:56:45,060 [Collection-3] INFO app - Executing Scheduled job: System: choppa1 

2014-07-12 10:00:00,001 [Analytics_Worker-1] INFO app - Trigger for job AnBuildAuthorizationJob was fired. 

2014-07-12 10:00:00,007 [Analytics_Worker-1] INFO app - Starting the AnBuildAuthorizationJob job. 

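The two log files have the same content, but the timestamps differ. I need to compare the two files while ignoring the timestamps: each line of one file is compared with the corresponding line of the other, and no difference should be reported when only the timestamps differ. I wrote the following Python script for this: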

#!/usr/bin/python 
import re 
import difflib 

program = open("log1.txt", "r") 
program_contents = program.readlines() 
program.close() 

new_contents = [] 

pat = re.compile("^[^0-9]") 

for line in program_contents: 
    if re.search(pat, line): 
        new_contents.append(line) 

program = open("log2.txt", "r") 
program_contents1 = program.readlines() 
program.close() 

new_contents1 = [] 

pat = re.compile("^[^0-9]") 

for line in program_contents1: 
    if re.search(pat, line): 
        new_contents1.append(line) 

diff=difflib.ndiff(new_contents,new_contents1) 
print(''.join(diff)) 

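Is there a more efficient way to write this script? Also, the script above only works when the timestamp is at the start of the line. I want a Python script that also works when the timestamp is somewhere in the middle of the line. Can anyone help me with how to do this?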

+0

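"I want a Python script that also works when the timestamp is somewhere in the middle of the line." That is quite a leap in requirements. What can we assume in that case? Do the timestamps have a fixed format? Is there anything else in the text that might look like a timestamp? – nhahtdh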

Answers

0
I would change pat = re.compile("^[^0-9]") 

      to pat = re.compile(r"\d{4}-\d{2}-\d{2}") 

It is also better to open the files with

    with open(filename) as f: 

That way Python closes the file for you automatically, and no explicit close() call is needed.
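
The two suggestions above can be combined into a minimal sketch. It assumes every timestamp has the `YYYY-MM-DD HH:MM:SS,mmm` form seen in the sample logs, and the names `TIMESTAMP`, `strip_timestamps`, `diff_ignoring_timestamps` and `compare_files` are illustrative only. Note that it removes the timestamp with `re.sub` instead of filtering lines with `re.search`, which is a change from the script in the question; because the pattern is not anchored, the timestamp may appear anywhere in the line.

```python
import difflib
import re

# Assumed format: 2015-07-12 08:50:33,904 (date, time, milliseconds).
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def strip_timestamps(lines):
    """Return the lines with every timestamp removed, wherever it occurs."""
    return [TIMESTAMP.sub("", line) for line in lines]


def diff_ignoring_timestamps(lines_a, lines_b):
    """Return unified-diff lines for two logs, ignoring timestamps."""
    return list(difflib.unified_diff(strip_timestamps(lines_a),
                                     strip_timestamps(lines_b),
                                     fromfile="log1.txt",
                                     tofile="log2.txt"))


def read_lines(filename):
    # The with statement closes the file even if reading raises an error.
    with open(filename) as f:
        return f.readlines()


def compare_files(path_a, path_b):
    """Diff two log files by path, ignoring timestamps."""
    return diff_ignoring_timestamps(read_lines(path_a), read_lines(path_b))
```

Calling `compare_files("log1.txt", "log2.txt")` returns an empty list when the files differ only in their timestamps; otherwise `print("".join(result))` shows the differences. If other text in the logs can look like a timestamp, the pattern would need to be tightened.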

+0

Thank you very much... :) It works! :) – shruthi

0

Here is a small script that ignores the timestamp at the start of each line.

program = open("log1.txt", "r") 
program_contents = program.readlines() 
program.close() 

program = open("log2.txt", "r") 
program_contents1 = program.readlines() 
program.close() 

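# zip() pairs the lines and stops at the shorter file, avoiding an IndexError.
for line, line1 in zip(program_contents, program_contents1): 
    if line == '\n': 
        continue 
    # The timestamp "YYYY-MM-DD HH:MM:SS,mmm" is 23 characters long.
    if line[23:] == line1[23:]: 
        print("Matches") 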
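
As a variation on this fixed-width approach, the sketch below returns the line numbers that differ instead of printing "Matches", and also flags lines that exist in only one file. It assumes the timestamp is always the first 23 characters of a line (`YYYY-MM-DD HH:MM:SS,mmm`, as in the sample logs); `TIMESTAMP_LEN` and `mismatched_lines` are hypothetical names.

```python
from itertools import zip_longest

# Assumption: the timestamp is a fixed-width prefix of each log line.
TIMESTAMP_LEN = 23  # len("2015-07-12 08:50:33,904")


def mismatched_lines(lines_a, lines_b, prefix_len=TIMESTAMP_LEN):
    """Return 1-based numbers of lines that differ after the prefix.

    Lines shorter than the prefix (such as blank lines) compare as equal,
    because slicing past the end of a string gives an empty string.
    """
    mismatches = []
    pairs = zip_longest(lines_a, lines_b)  # pads the shorter file with None
    for number, (a, b) in enumerate(pairs, start=1):
        if a is None or b is None:
            # One file has run out of lines.
            mismatches.append(number)
        elif a[prefix_len:] != b[prefix_len:]:
            mismatches.append(number)
    return mismatches
```

An empty result means the two logs agree everywhere once the leading timestamps are skipped. This only works for timestamps at the start of the line; for timestamps elsewhere, a regular expression is the more suitable tool.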