2016-04-02 72 views
1

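I have more than 200 CSV files that I want to split by the values in the clName column, keeping the header in every output file. I also want to save each split file as OriginalFileName-clName.txt.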

ID Plate Well  ctr  clID  clName 
21 5  C03  1  50012  COL 
21 5  C03  1  50012  COL 
21 5  C03  1  50012  COL 
21 5  C04  1  50012  IA 
21 5  C04  1  50012  IA 
21 5  C05  1  50012  ABC 


import csv 
from itertools import groupby 

for key, rows in groupby(csv.reader(open("file.csv")),
                         lambda row: row[7]):
    with open("%s.txt" % key, "w") as output:
        for row in rows:
            output.write(",".join(row) + "\n")

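The problem I have is that the column is not always called clName; it can be called clName, cll_n, or c_Name. Sometimes it is column 7, other times column 5 or 9.

All I know how to do is split a file by a column value, but that does not keep the header, and I have to check each file to find out whether the column is 5, 7, 9, and so on.

Is there a way to check the column names against a list of names and, when one of those names is found, split the file by that column's values?

Example data: https://drive.google.com/file/d/0Bzv1SNKM1p4uell3UVlQb0U3ZGM/view?usp=sharing

Thanks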


Do you mean you want to add the last column header to the end of the file? How do you determine whether column 5, 7, or 9 has the right name?


No, I need to keep the header in every file, and then save each file with the column value and the original file name, like originalfile-COL.txt.

Answers

2

Use csv.DictReader and csv.DictWriter instead. Here is an outline that should point you in the right direction.

import csv
import os
from itertools import groupby

# names the split column may have, in order of preference
special_col = ['clName', 'cll_n', 'c_Name']


def call_existing_code(rdr, c, base):
    # groupby only merges adjacent rows, so this assumes rows with the
    # same value in column c are contiguous, as in the sample data
    for key, rows in groupby(rdr, lambda r: r[c]):
        # one output file per value, named OriginalFileName-value.txt
        with open('%s-%s.txt' % (base, key), 'w', newline='') as out:
            wtr = csv.DictWriter(out, fieldnames=rdr.fieldnames)
            wtr.writeheader()  # repeat the header in every output file
            wtr.writerows(rows)


filename = 'myfile.csv'
with open(filename, 'r', newline='') as fh:
    rdr = csv.DictReader(fh)

    # now we need to figure out which column is used
    for c in special_col:
        if c in rdr.fieldnames:
            break  # found the column name
    else:
        raise IOError('No special column in file')

    # now run your existing logic, but group by the column found
    # above using lambda row: row[c] instead of row[7]
    call_existing_code(rdr, c, os.path.splitext(filename)[0])
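
Since the question mentions more than 200 files, here is a minimal sketch of how the same idea might be applied to a whole directory. It is not part of the original outline: the names CANDIDATES, split_file, and split_all are hypothetical, the files are assumed to be comma-delimited with a header row, and it keeps one writer per value instead of using groupby, so rows with the same value do not need to be adjacent.

```python
import csv
import glob
import os

# Assumed list of header names that may identify the split column.
CANDIDATES = ['clName', 'cll_n', 'c_Name']


def split_file(path, candidates=CANDIDATES):
    """Split one CSV into <base>-<value>.txt files, each with the header.

    Returns the sorted list of output paths that were written.
    """
    base = os.path.splitext(path)[0]
    handles = {}  # column value -> open output file
    writers = {}  # column value -> csv.DictWriter for that file
    try:
        with open(path, newline='') as fh:
            rdr = csv.DictReader(fh)
            fields = rdr.fieldnames or []
            # pick the first candidate name present in this file's header
            col = next((c for c in candidates if c in fields), None)
            if col is None:
                raise ValueError('no split column found in %s' % path)
            for row in rdr:
                key = row[col]
                if key not in writers:
                    # first time this value is seen: open its file
                    # and write the header before any data rows
                    out = open('%s-%s.txt' % (base, key), 'w', newline='')
                    handles[key] = out
                    writers[key] = csv.DictWriter(out, fieldnames=fields)
                    writers[key].writeheader()
                writers[key].writerow(row)
    finally:
        for out in handles.values():
            out.close()
    return sorted('%s-%s.txt' % (base, k) for k in handles)


def split_all(pattern):
    """Apply split_file to every file matching a glob pattern."""
    return {path: split_file(path) for path in sorted(glob.glob(pattern))}
```

Calling split_all('*.csv') in the data directory would then process every file in turn. A file with many distinct values keeps that many output files open at once, which is the trade-off for not requiring sorted input.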