2014-08-27 43 views
0

Capturing pieces inside a .*? regex

I have a log like the one shown below:

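EVENT: "[INIT]WinEvtLog: Security: AUDIT_SUCCESS(528): Security: Administrator: AMAZON-D071A6F8: AMAZON-D071A6F8: Successful Logon: User Name: Administrator Domain: AMAZON-D071A6F8 Logon ID: (0x0,0x1054A66) Logon Type: 10 Logon Process: User32 Authentication Package: Negotiate Workstation Name: AMAZON-D071A6F8 Logon GUID: - Caller User Name: AMAZON-D071A6F8$ Caller Domain: WORKGROUP Caller Logon ID: (0x0,0x3E7) Caller Process ID: 968 Transited Services: - Source Network Address: 10.0.0.200 Source Port: 60054 [END]";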

I capture the log with this regular expression:

EVENT:\s\"\[INIT\](?P<log>.*?)\[END\]\"; 

I do it this way because I want to display the whole EVENT later.

There are also pieces inside (?P<log>) that I want to capture. For example,

Source\sPort:\s(?P<src_port>\d+) 
Source\sNetwork\sAddress:\s(?P<src_network_addr>\S+) 

among other fields inside the EVENT.

I don't know how to write a single regular expression that captures the whole EVENT as well as the individual pieces inside it.

Answers

2

Put the capture groups inside another capture group:

EVENT:\s\"\[INIT\](?P<log>.*?Source\sNetwork\sAddress:\s(?P<src_network_addr>\S+).*?Source\sPort:\s(?P<src_port>\d+).*?)\[END\]\" 


The regex above captures log, as well as the src_port and src_network_addr values that appear inside log.
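As a rough illustration of how the nested groups come back in Python, the pattern above can be run with re.search. This is only a sketch: the sample line is a shortened, hypothetical version of the log in the question, and the names PATTERN, sample and fields are made up for the example.

```python
import re

# Pattern from the answer above: src_network_addr and src_port are nested inside log.
PATTERN = re.compile(
    r'EVENT:\s\"\[INIT\](?P<log>.*?'
    r'Source\sNetwork\sAddress:\s(?P<src_network_addr>\S+).*?'
    r'Source\sPort:\s(?P<src_port>\d+).*?)\[END\]\"'
)

# Shortened, hypothetical sample line (not the full log from the question).
sample = ('EVENT: "[INIT]WinEvtLog: Security: AUDIT_SUCCESS(528): '
          'Source Network Address: 10.0.0.200 Source Port: 60054 [END]";')

match = PATTERN.search(sample)
# groupdict() returns the outer group and both inner groups by name.
fields = match.groupdict() if match else {}
print(fields)
```

Note that log here still includes the space before [END], because the pattern does not consume it outside the group.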

+0

This only works if the elements are in order and non-optional. – 2014-08-27 17:45:51

+0

It is based on the input posted. – 2014-08-27 17:46:34
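One possible way around the ordering restriction raised in the comment above is to capture log first and then search inside it for each field separately. The following is a minimal sketch of that two-pass idea, not something proposed in the thread; parse_event, EVENT_RE, FIELD_PATTERNS and both sample lines are hypothetical names and data introduced for illustration.

```python
import re

# First pass: the question's original pattern, capturing the whole entry.
EVENT_RE = re.compile(r'EVENT:\s"\[INIT\](?P<log>.*?)\[END\]";')

# Second pass: one independent pattern per field, so order does not matter.
FIELD_PATTERNS = {
    'src_port': re.compile(r'Source\sPort:\s(\d+)'),
    'src_network_addr': re.compile(r'Source\sNetwork\sAddress:\s(\S+)'),
}

def parse_event(line):
    """Return a dict with 'log' plus each field (None if absent), or None if no event."""
    event = EVENT_RE.search(line)
    if event is None:
        return None
    log = event.group('log')
    result = {'log': log}
    for name, pattern in FIELD_PATTERNS.items():
        found = pattern.search(log)
        result[name] = found.group(1) if found else None
    return result

# Hypothetical lines: fields in reverse order, and one field missing.
reordered = 'EVENT: "[INIT]Source Port: 60054 Source Network Address: 10.0.0.200 [END]";'
partial = 'EVENT: "[INIT]Random text Source Port: 60054 more text [END]";'
print(parse_event(reordered))
print(parse_event(partial))
```

The trade-off is several regex passes per line instead of one, in exchange for fields that may be missing or appear in any order.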

1

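The regular expression listed below matches any event log that starts with EVENT: "[INIT] and ends with [END]";. If any of the phrases of interest are present in the event log, they are captured as well.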

Note the use of nested capture groups: (?P<log>...(?P<src_port>...)...). The outer group captures the entire pattern, including anything captured by the inner groups.

Also note that any group that does not participate in the match is still present in the resulting dict, with the value None.

import re 
from pprint import pprint 


texts=[ 
    'EVENT: "[INIT]WinEvtLog: Security: AUDIT_SUCCESS(528): Security: Administrator: AMAZON-D071A6F8: AMAZON-D071A6F8: Successful Logon: User Name: Administrator Domain: AMAZON-D071A6F8 Logon ID: (0x0,0x1054A66) Logon Type: 10 Logon Process: User32 Authentication Package: Negotiate Workstation Name: AMAZON-D071A6F8 Logon GUID: - Caller User Name: AMAZON-D071A6F8$ Caller Domain: WORKGROUP Caller Logon ID: (0x0,0x3E7) Caller Process ID: 968 Transited Services: - Source Network Address: 10.0.0.200 Source Port: 60054 [END]";', 
    'EVENT: "[INIT]Random text with one match Source Port: 60054 And stuff at end [END]";', 
    'EVENT: "[INIT]Random text with no matches [END]";'] 


for text in texts:
    match = re.match(
        r'''
        EVENT:\s"\[INIT]                     # anchor at the beginning
        (?P<log>                             # record the entire entry
          (?:                                # consisting of:
            (?:Source\sNetwork\sAddress:\s   # src_network_address
               (?P<src_network_address>\S+))
            |                                # OR
            (?:Source\sPort:\s               # src_port
               (?P<src_port>\S+))
            |                                # OR
            .*?                              # anything else
          )*                                 # as many times as required
        )
        \s\[END]";$                          # anchor at the end
        ''',
        text,
        re.VERBOSE)  # passed as a flag: an inline (?x) after leading whitespace is rejected by Python 3.11+
    if match:
        pprint(match.groupdict())

Result:

{'log': 'WinEvtLog: Security: AUDIT_SUCCESS(528): Security: Administrator: AMAZON-D071A6F8: AMAZON-D071A6F8: Successful Logon: User Name: Administrator Domain: AMAZON-D071A6F8 Logon ID: (0x0,0x1054A66) Logon Type: 10 Logon Process: User32 Authentication Package: Negotiate Workstation Name: AMAZON-D071A6F8 Logon GUID: - Caller User Name: AMAZON-D071A6F8$ Caller Domain: WORKGROUP Caller Logon ID: (0x0,0x3E7) Caller Process ID: 968 Transited Services: - Source Network Address: 10.0.0.200 Source Port: 60054', 
'src_network_address': '10.0.0.200', 
'src_port': '60054'} 
{'log': 'Random text with one match Source Port: 60054 And stuff at end', 
'src_network_address': None, 
'src_port': '60054'} 
{'log': 'Random text with no matches', 
'src_network_address': None, 
'src_port': None}
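If the None entries are inconvenient downstream, Match.groupdict() accepts a default value, and the non-participating groups can also be filtered out. A small sketch, using a simplified alternation pattern and a made-up input string rather than the full regex above:

```python
import re

# Simplified pattern: only one alternative can participate in a given match.
pattern = re.compile(
    r'Source\sPort:\s(?P<src_port>\d+)'
    r'|Source\sNetwork\sAddress:\s(?P<src_network_addr>\S+)'
)

m = pattern.search('Random text Source Port: 60054 And stuff at end')

with_none = m.groupdict()               # non-participating group is None
with_default = m.groupdict(default='')  # replace None with a chosen default
only_matched = {k: v for k, v in m.groupdict().items() if v is not None}

print(with_none)
print(with_default)
print(only_matched)
```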