2014-09-22
-2

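Python dictionary sum

Hi everyone, I have a question about how to sum the values for identical IP addresses in a dictionary. My input file looks like this: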

IP   , Byte 
10.180.176.61,3669 
10.164.134.193,882 
10.164.132.209,4168 
10.120.81.141,4297 
10.180.176.61,100 

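My goal is to open the file and parse each IP address together with the number after the comma, so that I can sum all the bytes for each IP address. The result should look like this: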

IP 10.180.176.61 , 3769

My code looks like this:

#!/usr/bin/python 
# -*- coding: utf-8 -*- 

import re,sys, os 
from collections import defaultdict 

f  = open('splited/small_file_1000000.csv','r') 
o  = open('gotovo1.csv','w') 

list_of_dictionaries = {} 

for line in f: 
    if re.search(r'\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}.*',line):
        line_ip = re.findall(r'\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}',line)[0]
        line_by = re.findall(r'\,\d+',line)[0]
        line_b = re.sub(r'\,','',line_by)

        list_of_dictionaries['IP'] = line_ip
        list_of_dictionaries['VAL'] = int(line_b)


c = defaultdict(int) 
for d in list_of_dictionaries: 
    c[d['IP']] += d['VAL'] 

print c 

Any ideas would be great.

Answers

1

Use the csv module to read the file and collections.Counter to sum the totals for each IP address:

from collections import Counter 
import csv 


def read_csv(fn):
    with open(fn, 'r') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        next(reader)  # Skip header
        for row in reader:
            ip, bytes = row
            yield ip, int(bytes)


totals = Counter() 
for ip, bytes in read_csv('data.txt'): 
    totals[ip] += bytes 

print totals 

Output:

Counter({'10.120.81.141': 4297, '10.164.132.209': 4168, '10.180.176.61': 3769, '10.164.134.193': 882}) 
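
The code above is written for Python 2 (note the `print` statement). On Python 3 the same idea might look like the sketch below. It is not the answer author's code: the function name `sum_bytes_by_ip` is hypothetical, and the sample rows from the question are read from an in-memory `io.StringIO` instead of a file so that the example runs on its own.

```python
import csv
import io
from collections import Counter

# Sample rows from the question, kept in memory instead of a file (assumption).
SAMPLE = """IP   , Byte
10.180.176.61,3669
10.164.134.193,882
10.164.132.209,4168
10.120.81.141,4297
10.180.176.61,100
"""


def sum_bytes_by_ip(csvfile):
    """Return a Counter mapping each IP address to its total byte count."""
    reader = csv.reader(csvfile, delimiter=',')
    next(reader)  # skip the header row
    totals = Counter()
    for row in reader:
        if not row:  # ignore blank lines
            continue
        ip, count = row
        totals[ip.strip()] += int(count)
    return totals


totals = sum_bytes_by_ip(io.StringIO(SAMPLE))
print(totals['10.180.176.61'])  # 3769
```

To read a real file instead, pass the object returned by `open(path, newline='')` to the function.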
0

If your file looks like the example you provided, you don't need regular expressions to parse it. Just split each line on the comma:

list_of_dictionaries = {}
with open('splited/small_file_1000000.csv', 'r') as f:
    header = f.readline()  # skip the header line
    for line in f:
        ip, bytes = line.split(',')
        if ip in list_of_dictionaries:
            list_of_dictionaries[ip] += int(bytes.strip())
        else:
            list_of_dictionaries[ip] = int(bytes.strip())

Result:

{'10.180.176.61': 3769, '10.164.134.193': 882, '10.164.132.209': 4168, '10.120.81.141': 4297}
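
As for why the code in the question fails: `list_of_dictionaries` is a single dict whose 'IP' and 'VAL' keys are overwritten on every line, and iterating over a dict yields its keys (strings), so `d['IP']` raises a TypeError. If you would rather keep the defaultdict from the question, a minimal Python 3 sketch could look like this. `parse_line` is a hypothetical helper, and the lines are inlined here rather than read from the file so the example is self-contained.

```python
from collections import defaultdict

# Lines as they would come from the input file, inlined here (assumption).
LINES = [
    "IP   , Byte \n",
    "10.180.176.61,3669 \n",
    "10.164.134.193,882 \n",
    "10.164.132.209,4168 \n",
    "10.120.81.141,4297 \n",
    "10.180.176.61,100 \n",
]


def parse_line(line):
    """Split one 'ip,bytes' line; return None if the line does not parse."""
    ip, sep, count = line.partition(',')
    count = count.strip()
    if not sep or not count.isdigit():
        return None  # header or malformed line
    return ip.strip(), int(count)


totals = defaultdict(int)
for line in LINES:
    parsed = parse_line(line)
    if parsed is not None:
        ip, count = parsed
        totals[ip] += count  # a missing key starts at 0

print(dict(totals))
```

Accumulating directly into the defaultdict avoids building an intermediate list of per-line dicts.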