2016-11-29 93 views
1

How do I split a CSV file by a condition?

I have this CSV file:

89,Network activity,ip-dst,80.179.42.44,,1,20160929 
89,Payload delivery,md5,4ad2924ced722ab65ff978f83a40448e,,1,20160929 
89,Network activity,domain,alkamaihd.net,,1,20160929 
90,Payload delivery,md5,197c018922237828683783654d3c632a,,1,20160929 
90,Network activity,domain,dnsrecordsolver.tk,,1,20160929 
90,Network activity,ip-dst,178.33.94.47,,1,20160929 
90,Payload delivery,filename,Airline.xls,,1,20160929 
91,Payload delivery,md5,23a9bbf8d64ae893db17777bedccdc05,,1,20160929 
91,Payload delivery,md5,07e47f06c5ed05a062e674f8d11b01d8,,1,20160929 
91,Payload delivery,md5,bd75af219f417413a4e0fae8cd89febd,,1,20160929 
91,Payload delivery,md5,9f4023f2aefc8c4c261bfdd4bd911952,,1,20160929 
91,Network activity,domain,mailsinfo.net,,1,20160929 
91,Payload delivery,md5,1e4653631feebf507faeb9406664792f,,1,20160929 
92,Payload delivery,md5,6fa869f17b703a1282b8f386d0d87bd4,,1,20160929 
92,Payload delivery,md5,24befa319fd96dea587f82eb945f5d2a,,1,20160929 

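I need to split this CSV file into 4 CSV files, where the condition is the event number at the start of each line. So far I have created a set containing the event numbers {89, 90, 91, 92}, and I know I need a loop inside a loop that copies each line to its dedicated CSV file.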

+0

Take a look at this similar question: http://stackoverflow.com/questions/40789383/python-split-csv-file-according-第一列字符/ 40790237#40790237 – chthonicdaemon

Answers

0

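It would be best not to hard-code the event numbers in your code, so that it does not depend on the values in the data. I also prefer to use the csv module, which is optimized for reading and writing .csv files.

Here is one way to do it: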

import csv

prefix = 'events'  # prefix of the output csv file names
data = {}

# Python 3: open csv files in text mode with newline='' (on Python 2, use 'rb' and 'wb').
with open('conditions.csv', newline='') as conditions:
    reader = csv.reader(conditions)
    for row in reader:
        if row:  # skip blank lines
            data.setdefault(row[0], []).append(row)  # group rows by event number

for event in sorted(data):
    csv_filename = '{}_{}.csv'.format(prefix, event)
    print(csv_filename)
    with open(csv_filename, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerows(data[event])

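Update

The first implementation above reads the entire CSV file into memory, and then writes all the rows associated with each event value to a separate output file, one file at a time.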
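To make the grouping step concrete, here is a minimal sketch that runs the same `setdefault` pattern on a few rows taken from the question. It assumes Python 3 and reads from an in-memory `io.StringIO` buffer instead of `conditions.csv`; the names `sample` and `groups` are only illustrative.

```python
import csv
import io

# A few rows borrowed from the question's sample data.
sample = io.StringIO(
    "89,Network activity,ip-dst,80.179.42.44,,1,20160929\n"
    "90,Network activity,domain,dnsrecordsolver.tk,,1,20160929\n"
    "89,Network activity,domain,alkamaihd.net,,1,20160929\n"
)

groups = {}
for row in csv.reader(sample):
    # row[0] is the event number; setdefault creates the list the first time it is seen
    groups.setdefault(row[0], []).append(row)

for event in sorted(groups):
    print(event, len(groups[event]))  # event number and how many rows it collected
```

Every row stays in `groups` until the writing loop runs, which is why memory use grows with the size of the input file.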

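A more memory-efficient approach is to open multiple output files at the same time and write each row to the appropriate destination file as soon as it has been read. Doing this requires keeping track of which files have already been opened. The other thing the file-management code needs to do is make sure all the files are closed when processing is complete.

In the code below, all of this is accomplished by defining and using a Python context manager type that centralizes the handling of all the csv output files that might be generated, depending on how many different event values there are in the input file.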

import csv
import sys
PY3 = sys.version_info.major > 2

class MultiCSVOutputFileManager(object):
    """Context manager to open and close multiple csv files and csv writers.
    """
    def __enter__(self):
        self.files = {}
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        for file, csv_writer in self.files.values():
            print('closing file: {}'.format(file.name))
            file.close()
        self.files.clear()
        return None

    def get_csv_writer(self, filename):
        if filename not in self.files:  # new file?
            open_kwargs = dict(mode='w', newline='') if PY3 else dict(mode='wb')
            print('opening file: {}'.format(filename))
            file = open(filename, **open_kwargs)
            self.files[filename] = file, csv.writer(file)

        return self.files[filename][1]  # return associated csv.writer object

Here is how to use it:

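prefix = 'events'  # prefix for the name of each csv output file

# csv.reader needs a text-mode file on Python 3 and a binary-mode file on Python 2.
read_kwargs = dict(mode='r', newline='') if PY3 else dict(mode='rb')

with open('conditions.csv', **read_kwargs) as conditions:
    reader = csv.reader(conditions)
    with MultiCSVOutputFileManager() as file_manager:
        for row in reader:
            csv_filename = '{}_{}.csv'.format(prefix, row[0])  # row[0] is the event
            writer = file_manager.get_csv_writer(csv_filename)
            writer.writerow(row)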
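
The same bookkeeping (remember which files are open, close them all at the end) can also be sketched with the standard library's `contextlib.ExitStack` instead of a hand-written class. This is an alternative, not the approach used above; it assumes Python 3, and the function name `split_csv_by_first_column` and its `output_dir` parameter are made up for illustration.

```python
import csv
import os
from contextlib import ExitStack

def split_csv_by_first_column(input_path, output_dir, prefix='events'):
    """Write each row to <output_dir>/<prefix>_<first field>.csv.

    Returns a dict mapping each first-field value to its output path.
    """
    writers = {}  # event -> csv.writer for that event's output file
    paths = {}    # event -> path of that event's output file
    with ExitStack() as stack:
        infile = stack.enter_context(open(input_path, newline=''))
        for row in csv.reader(infile):
            if not row:
                continue  # skip blank lines
            event = row[0]
            if event not in writers:  # first time this event is seen
                path = os.path.join(output_dir, '{}_{}.csv'.format(prefix, event))
                outfile = stack.enter_context(open(path, 'w', newline=''))
                writers[event] = csv.writer(outfile)
                paths[event] = path
            writers[event].writerow(row)
    # Leaving the with-block closes the input file and every output file.
    return paths
```

A call such as `split_csv_by_first_column('conditions.csv', '.')` would write `events_89.csv`, `events_90.csv` and so on into the current directory. As with the class above, one file handle stays open per distinct event value, so an input with a very large number of distinct values could run into the operating system's limit on open files.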
+0

Great, thank you, haha! – shamirs888

2
data = {
    '89': [],
    '90': [],
    '91': [],
    '92': []
}

with open('yourfile.csv') as infile:
    for line in infile:
        prefix = line.split(',', 1)[0]  # the event number is the first field
        data[prefix].append(line)  # raises KeyError for any event not listed above

for prefix in data:
    with open('csv' + prefix + '.csv', 'w') as outfile:
        outfile.writelines(data[prefix])  # the lines already end with newlines

However, if you are open to a solution outside of Python, this can easily be done by running four commands:

grep '^89,' file.csv > 89.csv
grep '^90,' file.csv > 90.csv

and similarly for the other values.
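
For comparison, the grep filter can be approximated in Python. A bare prefix match such as `^89` would also match an event numbered 890, so this sketch compares the whole first field instead. The helper name `lines_for_event` is hypothetical, and the `890` row is invented purely to show the difference; it is not part of the question's data.

```python
def lines_for_event(lines, event):
    """Yield the lines whose first comma-separated field equals `event`."""
    for line in lines:
        if line.split(',', 1)[0] == event:
            yield line

sample = [
    "89,Network activity,domain,alkamaihd.net,,1,20160929\n",
    "890,Payload delivery,filename,example.xls,,1,20160929\n",  # made-up row
    "90,Network activity,ip-dst,178.33.94.47,,1,20160929\n",
]

matches = list(lines_for_event(sample, '89'))
print(len(matches))  # only the row whose first field is exactly '89' is kept
```

Passing an open file object as `lines` and writing the results with `writelines` would give roughly the same effect as one of the grep commands.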

+0

I see, but I am getting an error: "File "C:/Users/oshamir/untitled2.py", line 34, in data[prefix].append(line) KeyError: 'uu'" – shamirs888

0

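You can even create the output files dynamically, the first time a given first field is encountered, by keeping a mapping from that ID to its associated file: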

files = {}  # maps each ID to its open output file
with open('file.csv') as fd:
    for line in fd:
        if not line.strip():
            continue  # skip empty lines
        try:
            id_field = line.split(',', 1)[0]  # extract the first field
            if id_field not in files:  # not encountered yet: open a new result file
                files[id_field] = open(id_field + '.csv', 'w')
            files[id_field].write(line)  # write the line to the proper file
        except Exception as e:
            print('ERR', line, e)  # catch-all in case of problems...

for f in files.values():
    f.close()  # close every output file once the input has been processed