2015-11-11 61 views
0

Append if it doesn't exist. If it exists, increase the count

I'm quite new to Python (and to programming in general) and could really use your help.

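I'm trying to read through a firewall log file. I'm interested in all lines that contain Deny. When such a line is found, the script should extract the source IP, destination IP, destination port and protocol. I don't want to see every line, though, only the unique ones. So far so good. Everything works (although I'm sure it could be done more cleverly), but I would also like to add a counter so that I can see how many times a specific combination of s_ip, d_ip, d_port and protocol occurred, and I don't know how.

Example of the log file: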

Nov 9 00:36:10 firewall %ASA-4-106023: Deny tcp src outside:1.1.1.1/43882 dst outside:2.2.2.2/23 by access-group "outside-in" [0x0, 0x0] 
Nov 9 00:36:10 firewall %ASA-4-106023: Deny tcp src outside:1.1.1.1/38780 dst outside:2.2.2.2/23 by access-group "outside-in" [0x0, 0x0] 
Nov 9 00:36:11 firewall %ASA-4-106023: Deny tcp src outside:1.1.1.1/8273 dst outside:2.2.2.2/23 by access-group "outside-in" [0x0, 0x0] 
Nov 9 00:36:12 firewall %ASA-4-106023: Deny tcp src outside:1.1.1.1/23433 dst outside:2.2.2.22/23 by access-group "outside-in" [0x0, 0x0] 
Nov 9 00:36:12 firewall %ASA-4-106023: Deny tcp src outside:1.1.1.1/25175 dst outside:2.2.2.24/23 by access-group "outside-in" [0x0, 0x0] 
Nov 9 00:36:12 firewall %ASA-4-106023: Deny tcp src outside:1.1.1.1/15855 dst outside:2.2.2.26/23 by access-group "outside-in" [0x0, 0x0] 
Nov 9 00:36:12 firewall %ASA-4-106023: Deny tcp src outside:1.1.1.1/24574 dst outside:2.2.2.27/23 by access-group "outside-in" [0x0, 0x0] 
Nov 9 00:36:12 firewall %ASA-4-106023: Deny tcp src outside:1.1.1.1/21797 dst outside:2.2.2.29/23 by access-group "outside-in" [0x0, 0x0] 
Nov 9 00:36:12 firewall %ASA-4-106023: Deny udp src outside:3.3.3.3/12112 dst outside:2.2.2.99/53031 by access-group "outside-in" [0x0, 0x0] 
Nov 9 00:36:13 firewall %ASA-4-106023: Deny icmp src outside:4.4.4.4 dst services:2.2.2.211 (type 11, code 1) by access-group "outside-in" [0x0, 0x0] 
Nov 9 00:36:17 firewall %ASA-4-106023: Deny icmp src outside:4.4.4.4 dst services:2.2.2.10 (type 3, code 3) by access-group "outside-in" [0x0, 0x0] 

I am able to get the following result:

'icmp' 
'tcp', '1.1.1.1', '2.2.2.2', '23' 
'tcp', '1.1.1.1', '2.2.2.22', '23' 
'tcp', '1.1.1.1', '2.2.2.24', '23' 
'tcp', '1.1.1.1', '2.2.2.26', '23' 
'tcp', '1.1.1.1', '2.2.2.27', '23' 
'tcp', '1.1.1.1', '2.2.2.29', '23' 
'udp', '3.3.3.3', '2.2.2.99', '53031' 

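I haven't quite managed to get the ICMP output yet (ICMP lines have no /port, and my regex relies on that to pick up the IP addresses), and I will try to make the output a bit nicer (getting rid of the ' and ,). What I really want, though, is a hit count on each line, e.g. the first tcp line would have a count of 3, and so on.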

import re  #for regular expressions - to match ip's 
import sys  #for parsing command line opts 

# if file is specified on command line, parse, else ask for file 
if sys.argv[1:]: 
    print "File: %s" % (sys.argv[1]) 
    logfile = sys.argv[1] 
else: 
    logfile = raw_input("Please enter a file to parse, e.g /var/log/secure: ") 

match = [] 
seen = [] 

# find all Deny lines and append them in a list 
for lines in open(logfile) : 
    extract = re.findall('Deny.*"' ,lines) 
    for i in extract : 
        match.append(i)

# extract different keywords from Deny lines 
for lines in match : 
    prot = re.findall('Deny\s(.+?)\ssrc',lines) 
    ip_src = re.findall('src.*?:([0-9a-f].*?)/', lines) 
    ip_dst = re.findall('dst.*?:([0-9a-f].*?)/', lines) 
    #ip_sport = re.findall('src.*?[0-9a-f].*?/([0-9].*?)\s', lines)  # uncomment if you want source port also, and add ip_sport to summarized below 
    ip_dport = re.findall('dst.*?[0-9a-f].*?/([0-9].*?)\s', lines) 

    summarized = prot + ip_src + ip_dst + ip_dport 

    if summarized not in seen :    # only add unique entries 
        seen.append(summarized)


# sort 
seen.sort() 

for lines in seen : 
    print (", ".join(repr(e) for e in lines)) 

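On top of that, I tried throwing a 3 GB log file at it, and it has now been running for hours. Any good ideas for optimizing the code?

I realize I'm asking a lot of questions, and I really appreciate any help offered, but my main question is how to get the hit count.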

+2

SO is not a code review/teaching service. You should ask specific programming questions. Please limit yourself to one question per post. – memoselyk

+0

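On the other hand, [codereview.se] _is_ a code review/teaching service. You don't need to have a specific programming problem, just some working code that you would like advice on. –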

+0

Duly noted :o). Thanks for your answers. – joni

Answers

2

The Python standard library already has a Counter class.

You can change the seen variable to be a Counter:

from collections import Counter 

[...] 

seen = Counter() 

# extract different keywords from Deny lines 
for lines in match : 

    [...] 

    summarized = prot + ip_src + ip_dst + ip_dport 

    # NOTE: summarized must be hashable (a string or tuple); it is a list here, so convert it.
    seen.update([tuple(summarized)])

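At the end, the seen dictionary will have each unique summarized line as a key, and the number of times that line occurred as the value.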
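As a minimal sketch of how the counts could be collected and printed: the `records` list and the `format_counts` helper below are hypothetical stand-ins for the tuples extracted from the log, and the code is written for Python 3 rather than the Python 2 syntax used in the question.

```python
from collections import Counter

# Hypothetical, already-extracted (protocol, src_ip, dst_ip, dst_port) tuples.
records = [
    ("tcp", "1.1.1.1", "2.2.2.2", "23"),
    ("tcp", "1.1.1.1", "2.2.2.2", "23"),
    ("tcp", "1.1.1.1", "2.2.2.2", "23"),
    ("tcp", "1.1.1.1", "2.2.2.22", "23"),
    ("udp", "3.3.3.3", "2.2.2.99", "53031"),
]

seen = Counter()
for record in records:
    # Keys must be hashable, so a tuple works where a list would raise TypeError.
    seen[record] += 1


def format_counts(counter):
    """Return one 'count  field, field, ...' string per unique record, sorted by key."""
    return ["%d  %s" % (count, ", ".join(key))
            for key, count in sorted(counter.items())]


for line in format_counts(seen):
    print(line)
```

`seen.most_common()` would instead list the records ordered by descending count, which may be more useful when looking for the noisiest sources.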

Regarding optimization, it would be better (I think) to process each line as you encounter it, inside the for lines in open(logfile) loop.
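A single-pass version might look like the sketch below. The `DENY_RE` pattern and the `count_denies` function are assumptions based only on the sample log lines in the question, not a general ASA log parser; the code targets Python 3. The `/port` parts are optional so that ICMP lines also match, and a `Counter` lookup replaces the `summarized not in seen` list test, which scans the whole list for every line.

```python
import re
from collections import Counter

# Assumed pattern, derived from the sample lines only:
#   Deny <prot> src <iface>:<ip>[/<port>] dst <iface>:<ip>[/<port>]
DENY_RE = re.compile(
    r"Deny\s+(\S+)\s+src\s+[^:\s]+:([^/\s]+)(?:/\d+)?"
    r"\s+dst\s+[^:\s]+:([^/\s]+)(?:/(\d+))?"
)


def count_denies(lines):
    """Count unique (protocol, src_ip, dst_ip, dst_port) tuples in one pass."""
    seen = Counter()
    for line in lines:
        if "Deny" not in line:       # cheap substring test before the regex
            continue
        m = DENY_RE.search(line)
        if m:
            prot, src, dst, dport = m.groups()
            # ICMP lines have no port; "-" is used as a placeholder.
            seen[(prot, src, dst, dport or "-")] += 1
    return seen


sample = [
    'Nov 9 00:36:10 firewall %ASA-4-106023: Deny tcp src outside:1.1.1.1/43882 dst outside:2.2.2.2/23 by access-group "outside-in" [0x0, 0x0]',
    'Nov 9 00:36:10 firewall %ASA-4-106023: Deny tcp src outside:1.1.1.1/38780 dst outside:2.2.2.2/23 by access-group "outside-in" [0x0, 0x0]',
    'Nov 9 00:36:12 firewall %ASA-4-106023: Deny udp src outside:3.3.3.3/12112 dst outside:2.2.2.99/53031 by access-group "outside-in" [0x0, 0x0]',
    'Nov 9 00:36:13 firewall %ASA-4-106023: Deny icmp src outside:4.4.4.4 dst services:2.2.2.211 (type 11, code 1) by access-group "outside-in" [0x0, 0x0]',
]

counts = count_denies(sample)
for key, n in sorted(counts.items()):
    print(n, ", ".join(key))
```

For a real file, the function can be fed the file object directly, e.g. `with open(logfile) as f: counts = count_denies(f)`, so that only one line is held in memory at a time.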

+0

Thanks a lot - I've implemented the suggestion about the Counter, and it works like a charm :o) – joni

+0

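If this worked for you, please mark it as your accepted answer to show your appreciation. If you need more help, please post a separate question. I have rolled your question back from Rev 2 to Rev 1. –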

0

To avoid duplicate entries, you can use a set instead of a list. I would do:

seen = set()
for lines in open(logfile):
    extract = re.findall('Deny.*"', lines)
    for i in extract:
        prot = re.findall('Deny\s(.+?)\ssrc', i)
        ip_src = re.findall('src.*?:([0-9a-f].*?)/', i)
        ip_dst = re.findall('dst.*?:([0-9a-f].*?)/', i)
        #ip_sport = re.findall('src.*?[0-9a-f].*?/([0-9].*?)\s', i)
        ip_dport = re.findall('dst.*?[0-9a-f].*?/([0-9].*?)\s', i)
        # findall returns lists, which are unhashable, so join them into one tuple
        seen.add(tuple(prot + ip_src + ip_dst + ip_dport))  # add ip_sport here if you want

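This should be faster because it uses fewer loops. On the other hand, sets are unordered (there is a recipe that covers this, though: http://code.activestate.com/recipes/576694/). If you don't want to build that and you need ordering, you should convert the set to a list before printing.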
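Converting the set to an ordered list before printing could be sketched as follows. The records here are hypothetical placeholders for the tuples collected in `seen`, and the code is written for Python 3.

```python
# Hypothetical records standing in for the tuples collected from the log.
seen = set()
for record in [
    ("udp", "3.3.3.3", "2.2.2.99", "53031"),
    ("tcp", "1.1.1.1", "2.2.2.2", "23"),
    ("tcp", "1.1.1.1", "2.2.2.2", "23"),   # duplicate, ignored by the set
]:
    seen.add(record)

# sorted() returns a new list; the set itself stays unordered.
ordered = sorted(seen)
for record in ordered:
    print(", ".join(record))
```

Note that a set only removes duplicates; it does not record how often each entry occurred, so it does not by itself provide the hit count asked for in the question.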
