2013-12-18

Reading a CSV file in Python

With the data and the code snippet below, I get the following error. Can you help me fix this? I am a Python beginner.

Data:

"Id","Title","Body","Tags" 
"Id1","Tit,le1","Body1","Ta,gs1" 
"Id","Title","Body","Ta,2gs" 

Code:

#!/usr/bin/python 
import csv,sys 
if len(sys.argv) <> 3:
    print >>sys.stderr, 'Wrong number of arguments. This tool will print first n records from a comma separated CSV file.'
    print >>sys.stderr, 'Usage:'
    print >>sys.stderr, '  python', sys.argv[0], '<file> <number-of-lines>'
    sys.exit(1)

fileName = sys.argv[1] 
n = int(sys.argv[2]) 

i = 0 
out = csv.writer(sys.stdout, delimiter=',', quotechar='"', quoting=csv.QUOTE_NONNUMERIC) 

ret = [] 


def read_csv(file_path, has_header = True):
    with open(file_path) as f:
        if has_header: f.readline()
        data = []
        for line in f:
            line = line.strip().split("\",\"")
            data.append([x for x in line])
    return data


ret = read_csv(fileName) 
target = [] 
train = [] 
target = [x[2] for x in ret] 
train = [x[1] for x in ret] 

Error:

target = [x[2] for x in ret] 
IndexError: list index out of range 

Doesn't the file you created have more than two lines?

Sorry, the data was wrong. I have now edited the question. Thanks @PauloBu – novieq

What is the point of [x for x in line]?

Answer

You are mixing file.readline() with using the file object as an iterator. Don't do that. Use next() instead.
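As a minimal sketch of that advice (written for Python 3, with an in-memory io.StringIO and made-up sample text standing in for a real file), next() consumes the header through the same iterator protocol that the for loop then continues to use:

```python
import io

# Hypothetical sample text standing in for the real file.
sample = io.StringIO("header\nrow1\nrow2\n")

# Skip the header with next(); the None default avoids a StopIteration
# if the input turns out to be empty.
header = next(sample, None)

# Continue with ordinary iteration over the remaining lines.
rows = [line.strip() for line in sample]

print(header)
print(rows)
```

Because both the header skip and the loop go through the iterator, no line is read by a second mechanism.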

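You should also use csv.reader() from the csv module to read your data; don't reinvent the wheel. The csv module handles quoted CSV values with delimiters embedded in the values much better in any case: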

import csv

def read_csv(file_path, has_header=True):
    # Python 2: the csv module expects the file opened in binary mode.
    with open(file_path, 'rb') as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row
        return list(reader)
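To illustrate the point about quoted values, here is a small Python 3 sketch that feeds the sample data from the question to csv.reader() through io.StringIO. The in-memory file is an assumption made to keep the example self-contained; with a real file in Python 3 you would open it in text mode with newline='' rather than 'rb'. The split_naive helper is hypothetical and only mirrors the split used in the question:

```python
import csv
import io

# The sample data from the question, without trailing spaces.
data = (
    '"Id","Title","Body","Tags"\n'
    '"Id1","Tit,le1","Body1","Ta,gs1"\n'
    '"Id","Title","Body","Ta,2gs"\n'
)

def split_naive(line):
    # Hypothetical helper: split on the literal '","' sequence,
    # as the code in the question does.
    return line.strip().split('","')

reader = csv.reader(io.StringIO(data))
next(reader, None)  # skip the header row
rows = list(reader)

naive = split_naive('"Id1","Tit,le1","Body1","Ta,gs1"')

print(rows[0])  # quotes removed, embedded commas kept
print(naive)    # first and last fields still carry a stray quote
```

The csv module strips the quoting and keeps the embedded commas inside each field, while the manual split leaves a leading quote on the first field and a trailing quote on the last.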

Last but not least, you can use zip() to transpose rows and columns:

ret = read_csv(fileName)
# 2nd column -> train, 3rd column -> target, matching x[1] and x[2] in the question
train, target = zip(*ret)[1:3] # just the 2nd and 3rd columns

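Here zip() truncates every row to the length of the shortest one, so a row that doesn't have enough columns no longer raises the IndexError you are seeing.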
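The transposition and the slicing can be seen in a short Python 3 sketch with made-up rows. In Python 3 zip() returns an iterator, so it is wrapped in list() before slicing:

```python
# Made-up rows; the second one is missing its fourth column.
rows = [
    ["Id1", "Title1", "Body1", "Tags1"],
    ["Id2", "Title2", "Body2"],
]

# zip(*rows) transposes rows into columns and stops at the shortest row,
# so only three columns come out.
columns = list(zip(*rows))

# A slice of width two can be unpacked into two names.
second, third = columns[1:3]

print(len(columns))
print(second)
print(third)
```

Note that the slice has to be exactly as wide as the number of names on the left. A slice such as [1:2] holds a single column, so unpacking it into two names raises a ValueError.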

If some rows are missing columns, use itertools.izip_longest() instead (itertools.zip_longest() in Python 3):

from itertools import izip_longest

ret = read_csv(fileName)
# izip_longest() returns an iterator, so build a list before slicing
train, target = list(izip_longest(*ret))[1:3] # just the 2nd and 3rd columns

By default, missing columns are replaced with None; if you need a different value, pass a fillvalue argument to izip_longest():

train, target = list(izip_longest(*ret, fillvalue=0))[1:3] # just the 2nd and 3rd columns
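For comparison, a Python 3 sketch of the same idea uses itertools.zip_longest(). The rows and the "n/a" fill value are made up for illustration:

```python
from itertools import zip_longest

# Made-up rows; the second one is missing its third column.
rows = [
    ["Id1", "Title1", "Body1"],
    ["Id2", "Title2"],
]

# zip_longest() returns an iterator, so build a list before slicing.
columns = list(zip_longest(*rows, fillvalue="n/a"))
second, third = columns[1:3]

print(second)
print(third)
```

Unlike zip(), no column is dropped here: the short row is padded with the fill value instead.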

Thanks @Martijn. I get the following error: target, train = zip(*ret)[1:2] ValueError: need more than 1 value to unpack – novieq

@novieq: In that case your CSV file is empty. There are no columns available, and 'zip()' returns an empty list.

I used 'ret = read_csv(fileName) print(ret[0][2]) print(ret[1][2]) target, train = zip(*ret)[1:2]' and I can see the output, so the CSV is being parsed correctly. Thanks in advance, @Martijn. – novieq