2015-07-21
1

Python: reading blocks of columns from a CSV file, following the file structure

A B C D 
1 x y z 
2 x y z 
3 x y z 
4 x y z 
5 i j k 
6 i j k 
7 .......etc. 

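I want to skip the headings and the first element of each row.

The really juicy data are the x, y, z, i, j, k values.

These are ADC values, and they need to be arranged into lists.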

my_list = [0] [x,x,x,x] 
      [1] [y,y,y,y] 
      [2] [z,z,z,z] 
      [3] [i,i,i,i] etc. 

I can easily iterate out a whole column, but the tricky part is iterating over only certain rows of each column.

What I've tried so far:

def readin(myfile):
    import csv

    with open(myfile, 'r') as f:  # Open Results File
        next(f)  # skip headings

        data = csv.reader(f, delimiter="\t")
        temp = []
        temp2 = []
        my_list = []

        for i in range(13):  # my_list will be 12 lists long
            print(i)
            for x in range(1, 4):
                for row in data:
                    temp.append(row[x])
        return my_list

I just get one column iterated out. I don't know how to easily slice the columns (the x's on their own, the i's, etc.).

What is your expected output?

@omri_saadon "my_list" (edited) – cc6g11

@omri_saadon ... ignore the 1-7 etc. in the file, so elements [1:3] of each row – cc6g11

Answers

2

Transpose the data and slice:

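import csv

with open(myfile) as f:  # myfile: path to the tab-delimited results file
    data = csv.reader(f, delimiter="\t")
    trans = zip(*data)  # on Python 2, use itertools.izip to keep this lazy
    A = next(trans)  # skip first col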
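
The slicing step is not spelled out above, so here is a minimal sketch of one way it could go, assuming Python 3, a tab-delimited file, and blocks of exactly four rows. The function name `columns_in_blocks`, the `block_size` parameter, and the in-memory `sample` are illustrative assumptions, not part of the original answer.

```python
import csv
import io

# Hypothetical sample standing in for the tab-delimited results file.
sample = (
    "A\tB\tC\tD\n"
    "1\tx\ty\tz\n2\tx\ty\tz\n3\tx\ty\tz\n4\tx\ty\tz\n"
    "5\ti\tj\tk\n6\ti\tj\tk\n7\ti\tj\tk\n8\ti\tj\tk\n"
)

def columns_in_blocks(f, block_size=4):
    """Transpose the rows, then cut every data column into runs of block_size."""
    data = csv.reader(f, delimiter="\t")
    next(data)            # skip the heading row
    trans = zip(*data)    # one tuple per column
    next(trans)           # skip the index column
    columns = [list(col) for col in trans]
    n_rows = len(columns[0]) if columns else 0

    my_list = []
    for start in range(0, n_rows, block_size):
        for col in columns:
            my_list.append(col[start:start + block_size])
    return my_list

my_list = columns_in_blocks(io.StringIO(sample))
```

Note that transposing this way reads the whole file into memory before slicing, which is fine for small result files but may not suit very large ones.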
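
This is good, but how do I ignore the first row in the transposed data? – cc6g11

@cc6g11, transpose with itertools.izip (plain zip on Python 3) and call next on the resulting object to skip the first column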

1

Here is the code; as you can see, I used pandas to manipulate the data.

import pandas as pd 

df = pd.read_csv("te.txt") 
df.drop(df.columns[[0]], axis=1, inplace=True) # delete the first column as you wished 
li = [] 
for col in df.columns: 
    li.append(list(df[col])) 
print(li)

Output:

[['x', 'x', 'x', 'x', 'i', 'i'], 
['y', 'y', 'y', 'y', 'j', 'j'], 
['z', 'z', 'z', 'z', 'k', 'k']] 

This is the CSV file "te.txt":

A,B,C,D 
1,x,y,z 
2,x,y,z 
3,x,y,z 
4,x,y,z 
5,i,j,k 
6,i,j,k 
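
Quick question: how does deleting with the del function work? I don't understand the arguments you pass to it. – cc6g11

@cc6g11, I changed the column deletion to a more 'pandas' way.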

0

An approach that needs no external modules, only csv:

import csv 

with open('blocks.csv') as infile: 
    reader = csv.reader(infile) 
    out_list = [] 

    # skip first line 
    next(reader) 

    while True:
        block = []
        try:
            # read four lines
            for i in range(4):
                block.append(next(reader))
        except StopIteration:
            break

        # transpose the block and skip the index column
        transposed_block = list(zip(*block))[1:]
        out_list += transposed_block

This produces the following out_list:

>>> out_list 
[('x', 'x', 'x', 'x'), 
('y', 'y', 'y', 'y'), 
('z', 'z', 'z', 'z'), 
('i', 'i', 'i', 'i'), 
('j', 'j', 'j', 'j'), 
('k', 'k', 'k', 'k')] 
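
The same block-wise idea can be written with a configurable block size. The following is only a sketch, assuming Python 3 and a comma-delimited file; `read_blocks`, `block_size`, and the in-memory `sample` are hypothetical names introduced for illustration. It uses `itertools.islice` to pull a fixed number of rows per pass and, like the loop above, discards a trailing incomplete block.

```python
import csv
import io
from itertools import islice

# Hypothetical sample: two full blocks of four rows plus one leftover row.
sample = (
    "A,B,C,D\n"
    "1,x,y,z\n2,x,y,z\n3,x,y,z\n4,x,y,z\n"
    "5,i,j,k\n6,i,j,k\n7,i,j,k\n8,i,j,k\n"
    "9,p,q,r\n"
)

def read_blocks(f, block_size=4):
    """Collect transposed blocks of block_size rows, minus the index column."""
    reader = csv.reader(f)
    next(reader, None)  # skip the heading row if there is one

    out_list = []
    while True:
        block = list(islice(reader, block_size))
        if len(block) < block_size:
            break  # incomplete (or empty) final block: stop without keeping it
        out_list.extend(list(zip(*block))[1:])
    return out_list

out_list = read_blocks(io.StringIO(sample))
```

Whether a short final block should be dropped or kept depends on the data; here it is dropped to match the behaviour of the answer's loop.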
0

Using pandas to read the file:

import pandas as pd

d = pd.read_csv("text.txt")

d.drop(d.columns[[0]], axis=1, inplace=True)  # drop the index column
k_list = [d.loc[:3, k].tolist() for k in d.columns]  # rows 0-3 of each column

print(k_list)

Output:

[['x', 'x', 'x', 'x'], 
['y', 'y', 'y', 'y'], 
['z', 'z', 'z', 'z']] 
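
The output above covers only the first four rows. A possible extension to every block of four is sketched below, assuming a comma-delimited file whose row count is a multiple of four; `block_size` and the in-memory `sample` are assumptions made for illustration only.

```python
import io
import pandas as pd

# Hypothetical sample standing in for "text.txt".
sample = (
    "A,B,C,D\n"
    "1,x,y,z\n2,x,y,z\n3,x,y,z\n4,x,y,z\n"
    "5,i,j,k\n6,i,j,k\n7,i,j,k\n8,i,j,k\n"
)

d = pd.read_csv(io.StringIO(sample))
d = d.drop(columns=d.columns[0])  # drop the index column

block_size = 4  # assumed number of rows per block
k_list = []
for start in range(0, len(d), block_size):
    block = d.iloc[start:start + block_size]  # positional slice of one block
    k_list.extend(block[col].tolist() for col in block.columns)

print(k_list)
```

`iloc` slices by position, so this does not depend on the index labels, unlike `loc[:3, k]`, which relies on the default integer index.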
0

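The following gives you the result you asked for. It uses a slightly different approach that reads four rows at a time, and it also drops the first column: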

import csv 

def readin(myfile): 
    my_list = [] 

    with open(myfile, 'r') as f:  # Open Results File
        csv_input = csv.reader(f, delimiter=" ", skipinitialspace=True)
        headings = next(csv_input)  # Skip headings

        try:
            while True:
                # Read four rows, transpose them, and drop the index column
                block = zip(next(csv_input), next(csv_input), next(csv_input), next(csv_input))
                my_list.extend(list(block)[1:])
        except StopIteration:
            pass

    return my_list

result = readin("results_file.csv")

print(result[0])
print(result)

The output is:

('x', 'x', 'x', 'x') 

[('x', 'x', 'x', 'x'), ('y', 'y', 'y', 'y'), ('z', 'z', 'z', 'z'), ('i', 'i', 'i', 'i'), ('j', 'j', 'j', 'j'), ('k', 'k', 'k', 'k')]