2013-10-15

Python: parsing a log file with regular expressions

I am parsing an email delivery log file (to get the SMTP reply for a message ID). It looks like this:

Nov 12 17:26:57 zeus postfix/smtpd[23992]: E859950021DB1: client=pegasus.os[172.20.19.62] 
Nov 12 17:26:57 zeus postfix/cleanup[23995]: E859950021DB1: message-id=a92de331-9242-4d2a-8f0e-9418eb7c
Nov 12 17:26:58 zeus postfix/qmgr[22359]: E859950021DB1: from=<[email protected]>, size=114324, nrcpt=1 (queue active) 
Nov 12 17:26:58 zeus postfix/smtp[24007]: certificate verification failed for mx.elutopia.it[62.149.128.160]:25: untrusted issuer /C=US/O=RTFM, Inc./OU=Widgets Division/CN=Test CA20010517 
Nov 12 17:26:58 zeus postfix/smtp[24007]: E859950021DB1: to=<[email protected]>, relay=mx.elutopia.it[62.149.128.160]:25, delay=0.89, delays=0.09/0/0.3/0.5, dsn=2.0.0, status=sent (250 2.0.0 d3Sx1m03q0ps1bK013Sxg4 mail accepted for delivery) 
Nov 12 17:26:58 zeus postfix/qmgr[22359]: E859950021DB1: removed 
Nov 12 17:27:00 zeus postfix/smtpd[23980]: connect from pegasus.os[172.20.19.62] 
Nov 12 17:27:00 zeus postfix/smtpd[23980]: setting up TLS connection from pegasus.os[172.20.19.62] 
Nov 12 17:27:00 zeus postfix/smtpd[23980]: Anonymous TLS connection established from pegasus.os[172.20.19.62]: TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits) 
Nov 12 17:27:00 zeus postfix/smtpd[23992]: disconnect from pegasus.os[172.20.19.62] 
Nov 12 17:27:00 zeus postfix/smtpd[23980]: 2C04150101DB2: client=pegasus.os[172.20.19.62] 
Nov 12 17:27:00 zeus postfix/cleanup[23994]: 2C04150101DB2: message-id=21e2f9d3-154a-3683-85d3-a7c52d429386 
Nov 12 17:27:00 zeus postfix/qmgr[22359]: 2C04150101DB2: from=<[email protected]>, size=53237, nrcpt=1 (queue active) 
Nov 12 17:27:00 zeus postfix/smtp[24006]: ABE7C50001D62: to=<[email protected]>, relay=relay3.telnew.it[195.36.1.102]:25, delay=4.9, delays=0.1/0/4/0.76, dsn=2.0.0, status=sent (250 2.0.0 r9EFQt0J009467 Message accepted for delivery) 
Nov 12 17:27:00 zeus postfix/qmgr[22359]: ABE7C50001D62: removed 
Nov 12 17:27:00 zeus postfix/smtp[23998]: 2C04150101DB2: to=<[email protected]>, relay=liberomx2.elgravo.ch[212.52.84.93]:25, delay=0.72, delays=0.07/0/0.3/0.35, dsn=2.0.0, status=sent (250 ok: Message 2040264602 accepted) 
Nov 12 17:27:00 zeus postfix/qmgr[22359]: 2C04150101DB2: removed 

Currently I get a message ID (a UUID) from the database (e.g. a92de331-9242-4d2a-8f0e-9418eb7c0123) and then run my code over the log file:

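import re

# Escape the interpolated IDs so they are matched literally, not as regex syntax.
log_id = re.search(r']: (.+?): message-id=' + re.escape(message_id), text).group(1)
sent_status = re.search(r']: ' + re.escape(log_id) + r'.*dsn=(.....)', text).group(1)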

With the message ID I find the log ID (the Postfix queue ID), and with the log ID I can find the SMTP reply.
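The two-step lookup described above can be tried on a few sample lines. This is a minimal sketch, not the asker's exact code: `find_dsn` and `SAMPLE_LOG` are hypothetical names, the recipient address is a placeholder, and the DSN pattern assumes the `digit.number.number` shape seen in the excerpt. If a message has several recipients, only the first matching delivery line is returned.

```python
import re

# Three lines in the same shape as the log excerpt above.
# The recipient address is a placeholder chosen for this example.
SAMPLE_LOG = """\
Nov 12 17:27:00 zeus postfix/cleanup[23994]: 2C04150101DB2: message-id=21e2f9d3-154a-3683-85d3-a7c52d429386
Nov 12 17:27:00 zeus postfix/smtp[23998]: 2C04150101DB2: to=<user@example.com>, relay=liberomx2.elgravo.ch[212.52.84.93]:25, delay=0.72, delays=0.07/0/0.3/0.35, dsn=2.0.0, status=sent (250 ok: Message 2040264602 accepted)
Nov 12 17:27:00 zeus postfix/qmgr[22359]: 2C04150101DB2: removed
"""


def find_dsn(text, message_id):
    """Return (log_id, dsn) for message_id, or None if either line is missing."""
    # Step 1: message ID -> log ID (the token just before ": message-id=").
    id_match = re.search(r"\]: (\w+): message-id=" + re.escape(message_id), text)
    if id_match is None:
        return None
    log_id = id_match.group(1)

    # Step 2: log ID -> DSN code on a delivery line carrying that log ID.
    # "." does not match a newline, so the match stays within one log line.
    dsn_match = re.search(
        r"\]: " + re.escape(log_id) + r": .*dsn=(\d\.\d+\.\d+)", text
    )
    if dsn_match is None:
        return None
    return log_id, dsn_match.group(1)


print(find_dsn(SAMPLE_LOG, "21e2f9d3-154a-3683-85d3-a7c52d429386"))
# → ('2C04150101DB2', '2.0.0')
```

Returning `None` instead of calling `.group(1)` directly avoids an `AttributeError` when a message ID has not reached the log yet.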

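This works fine, but a better approach would be for the program to go through the log file itself, pick up the message IDs and reply codes, and update the database. I'm not sure how to do that, though. The script has to run every 2 minutes and check the updated log file. So how can I make sure it remembers where it left off and doesn't pick up the same message ID twice? Thanks in advance.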


You could store the last message ID you read somewhere in the database. – Ashalynd

Answer


Use a dictionary to store the message IDs, and use a separate file to store the byte offset where you last left off in the log file.

msgIDs = {} 
# get where you left off in the logfile during the last read: 
try: 
    with open('logfile_placemarker.txt', 'r') as f: 
        lastRead = int(f.read())
except IOError: 
    print("Can't find/read place marker file! Starting at 0") 
    lastRead = 0 

with open('logfile.log', 'r') as f: 
    f.seek(lastRead) 
    for line in f: 
        # ...
        # Pick out msgIDs and response codes
        # ...
        if msgID in msgIDs:
            print("uh oh, found the same msg id twice!!")
        msgIDs[msgID] = responseCode
    lastRead = f.tell() 

# Do whatever you need to do with the msgIDs you found: 
updateDB(msgIDs) 
# Store lastRead (where you left off in the logfile) in a file if you need to so it persists in the next run 
with open('logfile_placemarker.txt', 'w') as f: 
    f.write(str(lastRead))
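
One way the elided parsing step might be filled in is sketched below, assuming the log lines look like the excerpt in the question. All names here (`load_state`, `scan_new_lines`, `save_state`, `run_once`, the JSON state file) are hypothetical, and `update_db` stands in for the `updateDB` call above. Besides the offset, the sketch also persists the queue IDs whose delivery line has not been seen yet, so a message logged across two runs can still be matched. If a message has several recipients, the last DSN seen wins.

```python
import json
import os
import re

# Line shapes taken from the log excerpt in the question (an assumption).
MSGID_RE = re.compile(r"\]: (\w+): message-id=(\S+)")
DSN_RE = re.compile(r"\]: (\w+): to=.*dsn=(\d\.\d+\.\d+)")
REMOVED_RE = re.compile(r"\]: (\w+): removed")


def load_state(state_path):
    """Load the saved offset and pending queue IDs, or start from scratch."""
    try:
        with open(state_path, "r") as f:
            return json.load(f)
    except (IOError, ValueError):
        return {"offset": 0, "pending": {}}


def save_state(state_path, state):
    with open(state_path, "w") as f:
        json.dump(state, f)


def scan_new_lines(log_path, state):
    """Parse lines appended since state['offset']; return {message_id: dsn}."""
    results = {}
    pending = state["pending"]  # queue ID -> message ID, awaiting delivery
    # Binary mode, so the stored offset is a real byte count.
    with open(log_path, "rb") as f:
        if os.path.getsize(log_path) < state["offset"]:
            state["offset"] = 0  # file shrank: assume it was rotated
        f.seek(state["offset"])
        while True:
            raw = f.readline()
            if not raw.endswith(b"\n"):
                break  # end of file, or a half-written line to retry next run
            state["offset"] = f.tell()
            line = raw.decode("utf-8", errors="replace")
            m = MSGID_RE.search(line)
            if m:
                pending[m.group(1)] = m.group(2)
                continue
            m = DSN_RE.search(line)
            if m:
                if m.group(1) in pending:
                    results[pending[m.group(1)]] = m.group(2)
                continue
            m = REMOVED_RE.search(line)
            if m:
                pending.pop(m.group(1), None)  # queue entry is finished
    return results


def run_once(log_path, state_path, update_db):
    """One scheduled run: read new lines, update the database, save state."""
    state = load_state(state_path)
    update_db(scan_new_lines(log_path, state))
    save_state(state_path, state)  # saved only after update_db returned
```

Because the state is written only after `update_db` returns, a failed database update means the same lines are read again on the next run rather than being lost.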