2014-06-18 16 views
0

Reading a CSV where one column holds a list/array rather than a single value

Below is myfile.csv:

1st  2nd  3rd  4th      5th 
2061100 10638650 -8000  25   [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] 
2061800 10639100 -8100  26   [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0] 
2061150 10638750 -8250  25   [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0] 
2061650 10639150 -8200  25   [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0] 
2061350 10638800 -8250  3   [5.0, 5.0, 5.0] 
2060950 10638700 -8000  1   [1.0] 
2061700 10639100 -8100  11   [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0] 
2061050 10638800 -8250  6   [3.0, 3.0, 3.0, 3.0, 3.0, 3.0] 
2061500 10639150 -8200  1   [4.0] 
2061250 10638850 -8150  16   [5.0, 5.0, 5.0, 5.0] 

My current code:

import numpy as np

mydata = np.genfromtxt('myfile.csv', delimiter=',')
arr = np.array(mydata)
col5 = arr[:, 4]

However, I want to read the 5th column as a list and then read all the elements of that list for further calculations. How can I do this?

+0

Check this out: http://stackoverflow.com/questions/20685567/convert-python-string-to-list – Korem

+1

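The 'myfile.csv' you show does not match the format you are reading it with (`delimiter=','`). If the actual file is comma-delimited in columns 1-4, then determining the actual boundaries of the 5th column will be a problem when using `numpy.genfromtxt` on its own. –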
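To illustrate the point in the comment above: if the real file were comma-delimited (an assumption; the sample shown looks whitespace-delimited), limiting the number of splits is one way to keep the bracketed list in one piece. The helper name `split_row` and the sample line are hypothetical.

```python
import ast

def split_row(line):
    """Split a comma-delimited row into four ints and one list of floats.

    Assumes (hypothetically) the layout: int,int,int,int,[f, f, ...]
    """
    # maxsplit=4 stops after the fourth comma, so the commas inside
    # the bracketed list are left alone.
    *head, tail = line.strip().split(',', 4)
    return [int(x) for x in head] + [ast.literal_eval(tail.strip())]

sample = "2061350,10638800,-8250,3,[5.0, 5.0, 5.0]"
print(split_row(sample))
```

This only works when exactly four scalar columns precede the list; a different layout would need a different split count.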

Answers

1

I think I would be tempted to just do this manually:

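fn = 'myfile.csv'  # path to the data file shown in the question

with open(fn) as f:
    header = next(f).strip()
    print(header)
    for row in f:
        row = row.rstrip()
        # Split at the first '[': integer columns on the left, list on the right
        lp, _, rp = row.partition('[')
        rp = rp.strip(']')
        lp_data = list(map(int, lp.split()))
        rp_data = list(map(float, rp.split(',')))
        print(lp_data + [rp_data])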

Prints:

1st  2nd  3rd  4th      5th 
[2061100, 10638650, -8000, 25, [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]] 
[2061800, 10639100, -8100, 26, [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]] 
[2061150, 10638750, -8250, 25, [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]] 
[2061650, 10639150, -8200, 25, [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]] 
[2061350, 10638800, -8250, 3, [5.0, 5.0, 5.0]] 
[2060950, 10638700, -8000, 1, [1.0]] 
[2061700, 10639100, -8100, 11, [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]] 
[2061050, 10638800, -8250, 6, [3.0, 3.0, 3.0, 3.0, 3.0, 3.0]] 
[2061500, 10639150, -8200, 1, [4.0]] 
[2061250, 10638850, -8150, 16, [5.0, 5.0, 5.0, 5.0]] 
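
For the "further calculations" the question asks about, the same partition logic can be wrapped in a function that returns the rows instead of printing them. This is only a sketch: `parse_rows` is a hypothetical name, and the in-memory sample stands in for the real file.

```python
import io
from statistics import mean

def parse_rows(lines):
    """Yield (ints, floats) for each data row; the first line is treated as a header."""
    it = iter(lines)
    next(it, None)  # skip the header line (no error if the input is empty)
    for row in it:
        row = row.strip()
        if not row:
            continue  # ignore blank lines
        # Everything before '[' is the integer columns, the rest is the list
        left, _, right = row.partition('[')
        ints = [int(x) for x in left.split()]
        floats = [float(x) for x in right.rstrip(']').split(',')]
        yield ints, floats

sample = io.StringIO(
    "1st 2nd 3rd 4th 5th\n"
    "2061350 10638800 -8250 3 [5.0, 5.0, 5.0]\n"
    "2060950 10638700 -8000 1 [1.0]\n"
)
for ints, floats in parse_rows(sample):
    # Example calculations on the 5th column: length, sum and mean
    print(ints, len(floats), sum(floats), mean(floats))
```

Passing an open file object instead of the `io.StringIO` sample should work the same way, since both yield lines when iterated.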
1

Pandas can read fixed-width files (as opposed to tab- or comma-delimited files) like yours:

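import pandas as pd
import ast

# Read only the character range that holds the 5th column, then
# turn each "[...]" string into a real Python list
df = pd.read_fwf('test.txt', colspecs=[(41, 100)])['5th']\
     .apply(ast.literal_eval)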

You get:

>>> df 

0   [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] 
1   [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0] 
2   [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0] 
3   [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0] 
4        [5.0, 5.0, 5.0] 
5          [1.0] 
6 [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0] 
7    [3.0, 3.0, 3.0, 3.0, 3.0, 3.0] 
8          [4.0] 
9      [5.0, 5.0, 5.0, 5.0] 
Name: 5th, dtype: object 
2

If the whitespace between the columns is all tabs:

import csv, ast, pprint

result = []
with open('in.txt') as in_file:
    reader = csv.reader(in_file, delimiter='\t')
    for line in reader:
        line[:4] = map(int, line[:4])        # first four columns are integers
        line[4] = ast.literal_eval(line[4])  # fifth column is a list literal
        result.append(line)

pprint.pprint(result)

>>> 
[[2061100, 10638650, -8000, 25, [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]], 
[2061800, 10639100, -8100, 26, [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]], 
[2061150, 10638750, -8250, 25, [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]], 
[2061650, 10639150, -8200, 25, [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]], 
[2061350, 10638800, -8250, 3, [5.0, 5.0, 5.0]], 
[2060950, 10638700, -8000, 1, [1.0]], 
[2061700, 10639100, -8100, 11, [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]], 
[2061050, 10638800, -8250, 6, [3.0, 3.0, 3.0, 3.0, 3.0, 3.0]], 
[2061500, 10639150, -8200, 1, [4.0]], 
[2061250, 10638850, -8150, 16, [5.0, 5.0, 5.0, 5.0]]] 
>>> 

A variation on this theme:

import ast, csv

with open('in.txt') as in_file:
    reader = csv.reader(in_file, delimiter='\t')
    result = [[ast.literal_eval(item) for item in line] for line in reader]
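
One thing to keep in mind with this variation: `ast.literal_eval` raises on anything that is not a Python literal, so a header cell such as `1st` would fail. A hedged sketch of a fallback, using a hypothetical helper `parse_cell` that returns the raw string when the cell is not a literal:

```python
import ast

def parse_cell(text):
    """Return the Python literal in text, or the stripped text itself if it is not one."""
    text = text.strip()
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Not a literal (for example a header label): keep it as a string
        return text

row = ['2061350', '10638800', '-8250', '3', '[5.0, 5.0, 5.0]']
print([parse_cell(cell) for cell in row])
print([parse_cell(cell) for cell in ['1st', '2nd']])
```

Whether silently keeping non-literal cells is desirable depends on the data; skipping the header row explicitly may be the cleaner choice when its presence is known.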