2017-02-28 133 views
0

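Selecting specific columns from a CSV file

My code reads the first 28 columns of a text file and formats or removes some of the data. How can I select specific columns? I want columns 0 through 25 plus column 28. What is the best way to do this?

Thanks in advance!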

import csv 
import os 

my_file_name = os.path.abspath('NVG.txt') 
cleaned_file = "cleanNVG.csv" 
remove_words = ['INAC-EIM','-INAC','TO-INAC','TO_INAC','SHIP_TO-inac','SHIP_TOINAC'] 


with open(my_file_name, 'r', newline='') as infile, open(cleaned_file, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    cr = csv.reader(infile, delimiter='|')
    writer.writerow(next(cr)[:28])  # header row
    for line in (r[0:28] for r in cr):
        # skip rows in which any field contains one of the remove_words
        if not any(remove_word in element for element in line for remove_word in remove_words):
            line[11] = line[11][:5]
            writer.writerow(line)
# the with block closes both files, so no explicit close() calls are needed

Answers

3

Take a look at pandas:

import pandas as pd 

usecols = list(range(26)) + [28] 
# the input file is pipe-delimited, so pass sep='|'
data = pd.read_csv(my_file_name, sep='|', usecols=usecols)

You can also conveniently write the filtered data back to a new file:

with open(cleaned_file, 'w', newline='') as f:
    data.to_csv(f, index=False)  # index=False leaves out the row-index column
+0

'Pandas' makes data manipulation so simple and doable. +1 from me. –

1

Exclude column 26 and column 27 from the row:

for row in cr: 
    content = list(filter(lambda x: row.index(x) not in [25,26], row)) 
    # work with the selected columns content 
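
One caveat with the snippet above: row.index(x) returns the position of the first matching value, so a row with repeated values (empty fields, for example) may be filtered incorrectly. Below is a minimal sketch of the same filter built on enumerate(), which tracks each field's actual position. The helper name drop_columns and the sample row are made up for illustration.

```python
def drop_columns(row, excluded):
    """Return the fields of `row` whose 0-based position is not in `excluded`."""
    return [value for position, value in enumerate(row) if position not in excluded]


# Hypothetical row with a repeated value: positions 0 and 2 both hold 'x'.
row = ['x', 'a', 'x', 'b']

# index()-based filter: the 'x' at position 2 is looked up as position 0 and kept.
by_index = list(filter(lambda x: row.index(x) not in [2], row))

# enumerate()-based filter: position 2 is dropped as intended.
by_position = drop_columns(row, {2})

print(by_index)     # ['x', 'a', 'x', 'b']
print(by_position)  # ['x', 'a', 'b']
```

The enumerate() version also avoids rescanning the row for every field, which matters for wide rows.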
+0

If you have to call list anyway, why not use a list comprehension here: 'content = [x for x in cr if cr.index(x) not in [25,26]]' – Ohjeah

+0

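You probably meant to filter the row, not the reader. As written, you would exhaust the reader in the first iteration of the for loop. Looking up each element's position is also wasteful; why not use 'enumerate()'? –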

+0

@IljaEverilä Yes, 'row', fixed the typo. Thanks! – haifzhan
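
For completeness, here is a minimal sketch of how the question's csv-based code might select columns 0 to 25 plus column 28 without pandas, using plain list slicing. The function names pick and clean and the in-memory sample data are hypothetical; the sample stands in for NVG.txt, and the sketch assumes 0-based column numbering as in the question.

```python
import csv
import io

REMOVE_WORDS = ['INAC-EIM', '-INAC', 'TO-INAC', 'TO_INAC', 'SHIP_TO-inac', 'SHIP_TOINAC']


def pick(row):
    """Keep columns 0..25 plus column 28 (0-based positions)."""
    # row[28:29] is a one-element slice, so short rows do not raise IndexError.
    return row[:26] + row[28:29]


def clean(infile, outfile):
    """Copy pipe-delimited rows to CSV, skipping rows that contain a remove word."""
    reader = csv.reader(infile, delimiter='|')
    writer = csv.writer(outfile)
    writer.writerow(pick(next(reader)))  # header row
    for row in reader:
        line = pick(row)
        if any(word in field for field in line for word in REMOVE_WORDS):
            continue
        line[11] = line[11][:5]  # same truncation as in the question
        writer.writerow(line)


# Hypothetical in-memory input: a header, one clean row, one row to be skipped.
header = '|'.join('c{}'.format(i) for i in range(30))
good = '|'.join('value{}'.format(i) for i in range(30))
bad = '|'.join('TO-INAC' if i == 3 else 'value{}'.format(i) for i in range(30))
source = io.StringIO('\n'.join([header, good, bad]) + '\n')
target = io.StringIO()

clean(source, target)
rows = list(csv.reader(io.StringIO(target.getvalue())))
print(len(rows), len(rows[0]), rows[1][11], rows[1][-1])
```

With real files, the two StringIO objects would be replaced by the open() calls from the question, keeping newline='' on both.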