2012-03-11 34 views
1

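Python: finding the number of attacks per IP per day

Hey, I'm trying to find the number of failed-login attacks per IP address per day. I am reading the data from a syslog file.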

These are a few of the lines being read:

Jan 10 09:32:09 j4-be03 sshd[3876]: Failed password for root from 218.241.173.35 port 47084 ssh2 
Jan 10 09:32:19 j4-be03 sshd[3879]: Failed password for root from 218.241.173.35 port 47901 ssh2 
Feb 7 17:19:16 j4-be03 sshd[10736]: Failed password for root from 89.249.209.92 port 46139 ssh2 

This is my code:

desc_date = {}
count_date = 0
desc_ip = {}
count_ip = 0

for line in myfile:
    if 'Failed password for' in line:
        line_of_list = line.split()
        #working together
        date_port = ' '.join(line_of_list[0:2])
        date_list = date_port.split(':')
        date = date_list[0]
        if desc_date.has_key(date):
            count_date = desc_date[date]
            count_date = count_date +1
            desc_date[date] = count_date
            #zero out the temporary counter as a precaution
            count_date =0
        else:
            desc_date[date] = 1

        ip_port = line_of_list[-4]
        ip_list = ip_port.split(':')
        ip_address = ip_list[0]
        if desc_ip.has_key(ip_address):
            count_ip = desc_ip[ip_address]
            count_ip = count_ip +1
            desc_ip[ip_address] = count_ip
            #zero out the temporary counter as a precaution
            count_ip =0
        else:
            desc_ip[ip_address] = 1

        resulting = dict(desc_date.items() + desc_ip.items())
        for result in resulting:
            print result,' has', resulting[result] , ' attacks'

It currently gives me these results, which are wrong:

Feb 8 has 33 attacks 
218.241.173.35 has 15 attacks 
72.153.93.203 has 14 attacks 
213.251.192.26 has 13 attacks 
66.30.90.148 has 14 attacks 
Feb 7 has 15 attacks 
92.152.92.123 has 5 attacks 
Jan 10 has 28 attacks 
89.249.209.92 has 15 attacks 

It's the IP addresses that are wrong. I'm not sure where the code goes wrong; I hope someone can help.

+0

Why do you think the IP addresses are wrong? – 2012-03-11 23:13:56

+1

It would help us if you edited your post so that the code is indented correctly. – BobS 2012-03-11 23:18:09

+0

Because, for example, Jan 10 has 28 attacks, so I need the IP addresses listed for each day to account for that day's 28 attacks. – 2012-03-11 23:18:55

Answers

0

Caveat: untested code.

import re

attacks = {}

# count the attacks, grouped first by date and then by IP
for line in myfile:  # myfile is the opened log file
    if 'Failed password for' in line:
        # re.match/re.search take the pattern first, then the string
        date = re.match(r'(\w{3}\s+\d{1,2})\b', line).group(1)
        attacks_date = attacks.get(date, {})
        # the IP is not at the start of the line, so search rather than match
        ip = re.search(r'\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b', line).group(1)
        attacks_date[ip] = 1 + attacks_date.get(ip, 0)
        attacks[date] = attacks_date

# output results
for item in attacks.items():
    date, attacks_date = item
    print('%s has %d attacks' % (date, sum(attacks_date.values())))
    for attack_item in attacks_date.items():
        ip, n = attack_item
        print('%s has %d attacks' % (ip, n))
4

Try this solution; I tested it with the sample input from the question and it works correctly:

import re
from collections import defaultdict
pattern = re.compile(r'(\w{3}\s+\d{1,2}).+Failed password for .+? from (\S+)')

def attack_dict(myfile):
    attacks = defaultdict(lambda: defaultdict(int))
    for line in myfile:
        found = pattern.match(line)
        if found:
            date, ip = found.groups()
            attacks[date][ip] += 1
    return attacks

def report(myfile):
    for date, ips in attack_dict(myfile).iteritems():
        print '{0} has {1} attacks'.format(date, sum(ips.itervalues()))
        for ip, n in ips.iteritems():
            print '\t{0} has {1} attacks'.format(ip, n)

Run it like this:

report(myfile) # myfile is the opened file with the log 
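
For readers on Python 3, where `iteritems()`, `itervalues()` and the `print` statement no longer exist, the same approach can be sketched as follows. This is a port under stated assumptions rather than the answerer's own code: the parameter name `lines` and the `sample` list (the three log lines quoted in the question) are stand-ins for the opened log file.

```python
import re
from collections import defaultdict

# Same pattern as in the answer: capture the date ("Jan 10") and the source IP.
pattern = re.compile(r'(\w{3}\s+\d{1,2}).+Failed password for .+? from (\S+)')

def attack_dict(lines):
    """Return a nested mapping {date: {ip: count}} of failed-password lines."""
    attacks = defaultdict(lambda: defaultdict(int))
    for line in lines:
        found = pattern.match(line)
        if found:
            date, ip = found.groups()
            attacks[date][ip] += 1
    return attacks

def report(lines):
    """Print the total per date, followed by the per-IP counts for that date."""
    for date, ips in attack_dict(lines).items():
        print('{0} has {1} attacks'.format(date, sum(ips.values())))
        for ip, n in ips.items():
            print('\t{0} has {1} attacks'.format(ip, n))

# Assumption: the sample lines from the question stand in for the log file.
sample = [
    'Jan 10 09:32:09 j4-be03 sshd[3876]: Failed password for root from 218.241.173.35 port 47084 ssh2',
    'Jan 10 09:32:19 j4-be03 sshd[3879]: Failed password for root from 218.241.173.35 port 47901 ssh2',
    'Feb 7 17:19:16 j4-be03 sshd[10736]: Failed password for root from 89.249.209.92 port 46139 ssh2',
]
report(sample)
```

Because the counts are nested under the date, each IP's number is specific to that day, and the per-IP numbers under a date add up to that date's total.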
+2

You could use 'pattern.match' in this case. 'date, ip = found.groups()' might be more readable – jfs 2012-03-12 04:47:47

+0

@J.F. Sebastian Thanks for the suggestion, I have edited my answer accordingly – 2012-03-12 10:45:19

2

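I see two problems. 1) You are counting attacks by day and attacks by IP entirely separately; there is no association between the attacks from a given IP and the date of those attacks. 2) Iterating over the items in the dictionary, as you have done in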

resulting = dict(desc_date.items() + desc_ip.items()) 
for result in resulting: 
    print result,' has', resulting[result] , ' attacks' 

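will give you the cumulative attack counts in an essentially random order, freely mixing the attacks-by-IP entries with the attacks-by-date entries. The fact that you see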

Feb 8 has 33 attacks 

... followed by

218.241.173.35 has 15 attacks 
72.153.93.203 has 14 attacks 
213.251.192.26 has 13 attacks 
66.30.90.148 has 14 attacks 

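... does not mean that these attacks by IP occurred on Feb 8.

The 15 attacks from 218.241.173.35 are the total number of attacks from that IP over the entire period covered by the log file. It is a coincidence that the 218.241.173.35 line appears right after Feb 8 rather than before or after some other date.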
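
One way to tie each IP to the day it attacked is to count on a combined (date, IP) key instead of keeping two unrelated dictionaries. The sketch below is only an illustration: the names `LINE_RE` and `count_by_day_and_ip` are hypothetical, the regex assumes the log format shown in the question, and the sample lines from the question stand in for the real file. It is written for Python 3.

```python
import re
from collections import Counter

# Assumed line format: "Mon D hh:mm:ss host sshd[pid]: Failed password for USER from IP ..."
LINE_RE = re.compile(r'(\w{3})\s+(\d{1,2})\s.*Failed password for .+? from (\S+)')

def count_by_day_and_ip(lines):
    """Count failed logins per (date, ip) pair, so every count belongs to one day."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            month, day, ip = m.groups()
            counts[('{0} {1}'.format(month, day), ip)] += 1
    return counts

# Assumption: the sample lines from the question stand in for the log file.
sample = [
    'Jan 10 09:32:09 j4-be03 sshd[3876]: Failed password for root from 218.241.173.35 port 47084 ssh2',
    'Jan 10 09:32:19 j4-be03 sshd[3879]: Failed password for root from 218.241.173.35 port 47901 ssh2',
    'Feb 7 17:19:16 j4-be03 sshd[10736]: Failed password for root from 89.249.209.92 port 46139 ssh2',
]
counts = count_by_day_and_ip(sample)

# sorted() orders the date strings alphabetically, not chronologically.
for (date, ip), n in sorted(counts.items()):
    print('{0}: {1} has {2} attacks'.format(date, ip, n))
```

Sorting the keys makes the output order deterministic, which avoids the dates and IPs appearing interleaved in arbitrary dictionary order; a chronological order would need the date strings parsed first.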

+0

How would I implement this? I understand what you mean, but I'm not sure how to implement it – 2012-03-12 08:51:22

+0

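How I would proceed depends on the answer to the question I asked in the comments on your main post (about sort order). Sorry that I asked it in a different place; it seemed like a generally relevant question, but maybe you missed it because of that. – BobS 2012-03-14 03:18:26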