2014-12-07 19 views
-1

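Delete or keep specific columns in a CSV file

I have a simple script that either removes the last n columns from a CSV file or keeps only its first n columns: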

from sys import argv
import csv

if len(argv) == 4:
    script, inputFile, outputFile, n = argv
    n = int(n)  # negative n drops the last |n| columns; positive n keeps the first n
else:
    script, inputFile, outputFile = argv
    n = 1  # default: keep only the first column

with open(inputFile, "r") as fin:
    with open(outputFile, "w") as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            writer.writerow(row[:n])

Usage example (removes the last two columns): removeKeepColumns.py sample.txt out.txt -2
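As a minimal sketch of why a single slice covers both cases (the sample row below is made up purely for illustration):

```python
# Hypothetical sample row, used only to illustrate slice behaviour.
row = ["a", "b", "c", "d", "e"]

first_two = row[:2]           # positive n keeps the first n columns
without_last_two = row[:-2]   # negative n drops the last |n| columns
```

A slice can only describe a contiguous run of columns, which is why it does not extend to arbitrary sets of columns.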

How can I extend this so that it can keep or delete a specific set of columns, for example:

  • Delete columns 3, 4, 5
  • Keep only columns 1, 4, 6

I can split the comma-separated input argument into a list, but I don't know how to pass that list to writerow(row[]).

Link to the example I used to create my script:

+0

http://stackoverflow.com/questions/724856/picking-out-items-python-list-which-specific-indexes#724881 – Jasper 2014-12-07 12:33:40

+0

@Jasper I don't understand. Could you elaborate on your comment? – 2014-12-07 12:46:48

+0

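If I understand you correctly, you are trying to get a (non-contiguous) subsequence of columns from the CSV. The linked question shows how to do that. – Jasper 2014-12-07 12:52:32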

Answers

1

To elaborate on my comment (Picking out items from a python list which have specific indexes):

from sys import argv
import csv

if len(argv) != 4:
    raise SystemExit("usage: script inputFile outputFile cols")

script, inputFile, outputFile, cols_str = argv
cols = [int(i) for i in cols_str.split(",")]  # zero-based column indexes

with open(inputFile, "r") as fin:
    with open(outputFile, "w") as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            # keep only the requested columns, in the order given
            sublist = [row[x] for x in cols]
            writer.writerow(sublist)

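This should (untested) keep all the columns given in the comma-separated list passed as the third argument. To delete the given columns instead,

sublist = [value for x, value in enumerate(row) if x not in cols]

should do the trick.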
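Both variants can be folded into one helper. The following is only a sketch: `filter_columns` and its `keep` flag are hypothetical names, the indexes are assumed to be zero-based, and an in-memory string stands in for the input file.

```python
import csv
import io


def filter_columns(rows, cols, keep=True):
    """Yield each row restricted to cols (keep=True) or with cols removed (keep=False).

    cols holds zero-based column indexes; this helper is illustrative only.
    """
    wanted = set(cols)
    for row in rows:
        if keep:
            # Preserve the order in which the columns were requested.
            yield [row[i] for i in cols]
        else:
            # Keep every value whose index was not listed.
            yield [value for i, value in enumerate(row) if i not in wanted]


# In-memory CSV standing in for the input file.
source = io.StringIO("a,b,c,d\n1,2,3,4\n")
kept = list(filter_columns(csv.reader(source), [0, 2]))

source.seek(0)
dropped = list(filter_columns(csv.reader(source), [0, 2], keep=False))
```

If the column numbers on the command line are meant to be 1-based, as in "delete columns 3, 4, 5", subtract 1 from each before passing them in. In keep mode an index beyond the end of a row raises IndexError, so rows of uneven length would need extra handling.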

3

Although there is already an accepted answer, here is my solution:

>>> import pyexcel as pe 
>>> sheet = pe.get_sheet(file_name="your_file.csv") 
>>> sheet.column.select([1,4,5]) # the column indices to keep 
>>> sheet.save_as("your_filtered_file.csv") 
>>> exit() 

More details are available in the documentation on filtering.