2015-09-18

Numpy recarray writes byte-literal tags to my CSV file?

I use the following test code:

import numpy as np
import csv

data = np.zeros((3,), dtype=("S24,int,float"))
with open("testtest.csv", 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    for row in data:
        writer.writerow(row)

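The data in the CSV file carries b'' tags (byte-literal markers) for the string component of the record array. What is the correct way to write these record arrays to CSV, and what is the best way to keep the byte-literal tags out of the CSV file?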


This looks like [open issue #4543](https://github.com/numpy/numpy/issues/4543) – askewchan

Answers


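I think you are working with Python 3, which uses unicode as the default string type. Byte strings then get the special b marker when they are converted to text.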
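To see where the marker comes from, here is a minimal sketch. It assumes Python 3 and writes to an in-memory buffer (`io.StringIO`, not part of the original post) instead of a file. `csv.writer` converts non-string values with `str()`, and `str()` of a bytes object includes the `b'...'` wrapper.

```python
import csv
import io

import numpy as np

# One record with a bytes field (S24), as in the question.
data = np.zeros((1,), dtype="S24,int,float")
data["f0"] = b"xxx"

value = data[0][0]      # numpy.bytes_, a subclass of Python bytes
as_text = str(value)    # the text form that csv.writer ends up writing

# Write the record to an in-memory text buffer instead of a file.
buf = io.StringIO()
csv.writer(buf).writerow(data[0])
written = buf.getvalue()

print(as_text)
print(written)
```

Calling `value.decode()` on the same element gives a plain `str` without the wrapper, which is what the approaches in this answer rely on.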

If I generate the data with unicode instead of bytes, this works:

In [654]: data1 = np.zeros((3,),dtype=("U24,int,float")) 
In [655]: data1['f0']='xxx' # more interesting string field 
In [656]: with open('test.csv','w') as f:
    writer=csv.writer(f,delimiter=',')
    for row in data1:
        writer.writerow(row)
In [658]: cat test.csv 
xxx,0,0.0 
xxx,0,0.0 
xxx,0,0.0 

np.savetxt does the same thing:

In [668]: np.savetxt('test.csv',data1,fmt='%s',delimiter=',') 
In [669]: cat test.csv 
xxx,0,0.0 
xxx,0,0.0 
xxx,0,0.0 

The question is whether I can work around this while keeping the S24 field, for example by opening the file as wb.

https://stackoverflow.com/a/27513196/901925 Trying to strip b' ' from my Numpy array

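That earlier question explored the same problem. It looks like my solution there was either to decode the bytes field or to write directly to a bytes file. Since your array mixes string and numeric fields, the decode solution is a bit more tedious.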

data1 = data.astype('U24,i,f') # convert bytestring field to unicode
# 'i' and 'f' are the C int and float32 codes; 'U24,int,float' would keep the original field types
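As a self-contained sketch of the conversion approach: the version below keeps the question's `int` and `float` field types, fills the string field with sample values chosen for illustration, and writes to an in-memory buffer so the result can be inspected. The name `text_data` is hypothetical.

```python
import csv
import io

import numpy as np

data = np.zeros((3,), dtype="S24,int,float")
data["f0"] = [b"xxx", b"yyy", b"zzz"]   # sample values for illustration

# Same field layout, but with a unicode string field instead of bytes.
text_data = data.astype("U24,int,float")

buf = io.StringIO()
writer = csv.writer(buf, delimiter=",")
for row in text_data:
    writer.writerow(row)

lines = buf.getvalue().splitlines()
print(lines)
```

The cast only works when the bytes are valid ASCII, and it makes a unicode copy of the whole array, which may matter for large data.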

A helper function can decode the byte strings on the fly:

In [147]: fn = lambda row: [j.decode() if isinstance(j,bytes) else j for j in row] 
In [148]: with open('test.csv','w') as f:
    writer=csv.writer(f,delimiter=',')
    for row in data:
        writer.writerow(fn(row))
In [149]: cat test.csv 
xxx,0,0.0 
yyy,0,0.0 
zzz,0,0.0 
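The other option mentioned above, writing directly to a bytes file, could be sketched as follows. This is one possible way to do it, not the code from the linked answer. The helper `row_to_bytes` and the temporary file path are hypothetical, and the sketch does no CSV quoting or escaping, so it assumes the fields contain no commas or newlines.

```python
import os
import tempfile

import numpy as np

data = np.zeros((3,), dtype="S24,int,float")
data["f0"] = [b"xxx", b"yyy", b"zzz"]   # sample values for illustration


def row_to_bytes(row, delimiter=b","):
    """Join one record into a bytes line; bytes pass through, other values are str()-encoded."""
    return delimiter.join(
        v if isinstance(v, bytes) else str(v).encode("ascii") for v in row
    )


path = os.path.join(tempfile.mkdtemp(), "test_bytes.csv")
with open(path, "wb") as f:
    # tolist() yields tuples of native Python values (bytes, int, float).
    for row in data.tolist():
        f.write(row_to_bytes(row) + b"\n")

with open(path, "rb") as f:
    content = f.read()
print(content)
```

This keeps the S24 field untouched in the array and never creates a `str` from the bytes, at the cost of handling delimiters and line endings by hand.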

Another numpy bytestring formatting question: http://stackoverflow.com/questions/32207420/numpy-string-encoding/32208336. Apart from a custom "format" method, it still recommends "decode". – hpaulj


Do you need the data in these three dtypes? Consider using numpy.savetxt() on a numpy array of floats or integers.

http://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html

import numpy as np

data = np.zeros((3,3))
filename = 'foo'
np.savetxt(filename + ".csv", data, fmt='%1.6e', delimiter=",")
# fmt='%1.6e' controls how the numbers are written to the text file.
# E.g. use fmt='%d' for integers.
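If the mixed dtypes are needed after all, `np.savetxt` also accepts a sequence of formats, one per column. The sketch below assumes the string field has already been converted to unicode (as in the other answer) and uses a temporary file path and sample values that are not part of the original post.

```python
import os
import tempfile

import numpy as np

data = np.zeros((3,), dtype="S24,int,float")
data["f0"] = [b"xxx", b"yyy", b"zzz"]   # sample values for illustration
text_data = data.astype("U24,int,float")  # unicode string field

path = os.path.join(tempfile.mkdtemp(), "mixed.csv")
# One format per field: string, integer, float in exponent notation.
np.savetxt(path, text_data, fmt=["%s", "%d", "%1.6e"], delimiter=",")

with open(path) as f:
    lines = f.read().splitlines()
print(lines)
```

Here the number of formats has to match the number of fields, otherwise `np.savetxt` raises an error.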