2016-11-08 13 views
1

Regular expression for na_values in pandas.read_csv

I use pandas.read_csv a lot, and I want to read a file like this:

1891, 91920, 7,  628,249, 59,51.0, 0.026, 0.028, NaN, NaN, NaN, NaN, NaN, 0.156, 0.071, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 21,500, 21,43.8, 0.005, 0.619, NaN,45.6, 0.048, 0.053, NaN, NaN, NaN, NaN, NaN, -0.180, 0.088, 20, 0.012, 1.107, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,  NaN,  NaN,  NaN 
1891, 91920, 16,  628,135, 22,41.2, 0.093, 0.087, NaN, NaN, NaN, NaN, NaN, 0.416, 0.212, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 21,500, 20,23.3, 0.021, 2.023, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,  NaN,  NaN,  NaN 
1891, 91920, 3,  628, 28, 39,47.0, 0.041, 0.044, NaN, NaN, NaN, NaN, NaN, -0.006, 0.064, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 21,500, 21,37.5, 0.009, 0.964, NaN,45.3, 0.054, 0.055, NaN, NaN, NaN, NaN, NaN, -0.838, 0.228, 20, 0.013, 1.193, NaN,51.8, 0.025, 0.026, NaN, NaN, NaN, NaN, NaN, -0.021, 0.054, 21, 0.005, 0.540, NaN,  NaN,  NaN,  NaN 
1891, 91920, 6,  628,276, 20,40.0, 0.118, 0.101, NaN, NaN, NaN, NaN, NaN, -0.767, 0.558, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 21,500, 20,26.7, 0.032, 2.982, NaN,41.0, 0.088, 0.089, NaN, NaN, NaN, NaN, NaN, -0.141, 0.233, 20, 0.024, 2.074, NaN,46.2, 0.053, 0.049, NaN, NaN, NaN, NaN, NaN, 0.080, 0.034, 21, 0.012, 1.187, NaN,  NaN,  NaN,  NaN 

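I have trouble reading it because of the NaN values. If the file were a plain CSV file (comma separated) I would have no problem, but it has spaces after the commas. When I read it with: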

df = pd.read_csv(file,index_col=None, header=None) 

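the columns containing NaN are obviously read as strings, because of the spaces. If the runs of spaces all had the same width, my problem would be easy. I could use: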

df = pd.read_csv(file,index_col=None, header=None, na_values = " NaN") 

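and the problem would be solved, but the columns have different numbers of spaces. Some of them have 4 spaces before NaN, others have 6, and so on.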

So, my question is: is there a way to specify na_values as a regular expression, something like na_values = "\s+ NaN"?
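As far as the documented forms go, na_values takes literal strings (or a list or dict of them), so a pattern like the one above is compared as plain text rather than as a regex. One possible workaround, sketched here under assumptions (the inline sample data, the io.StringIO stand-in for the file, and the name cleaned are illustrative only), is to read everything as text and convert after stripping whitespace:

```python
import io

import pandas as pd

# Hypothetical two-row sample with uneven spacing before NaN.
raw = "1891, 0.026,  NaN\n1891, 0.093,    NaN\n"

# Read every field as text so no column is converted prematurely.
df = pd.read_csv(io.StringIO(raw), header=None, dtype=str)

# Strip surrounding whitespace, then convert each column to numbers;
# anything that is not a number (including "NaN") becomes NaN.
cleaned = df.apply(lambda col: pd.to_numeric(col.str.strip(), errors="coerce"))

print(cleaned.dtypes)
```

This is a post-processing step rather than a read_csv option, and errors="coerce" would also turn genuinely malformed values into NaN without complaint.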

+1

Why not use a regular expression *separator*, such as sep=",\s+"? – BrenBarn

+1

Alternatively, you can use the delim_whitespace=True or skipinitialspace=True parameter – MaxU

+0

@BrenBarn skipinitialspace=True works fine, thanks. But sep=",\s+" does not work – nandhos
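The skipinitialspace=True option mentioned in these comments can be sketched as follows. The inline sample data and the io.StringIO stand-in for the real file are assumptions for illustration; the real file has many more columns.

```python
import io

import pandas as pd

# Hypothetical sample: NaN is preceded by a varying number of spaces.
raw = "1891, 0.026,  NaN,    NaN\n1891, 0.093,  NaN, 0.416\n"

# skipinitialspace=True drops the spaces that follow each comma, so every
# field reaches the NaN check as a bare "NaN" and matches the defaults.
df = pd.read_csv(io.StringIO(raw), index_col=None, header=None,
                 skipinitialspace=True)

print(df.dtypes)
print(df.isna().sum())
```

With the spaces gone, no custom na_values entry is needed, and the NaN columns should come back as floats rather than strings.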

Answer

0

Try this:

df = pd.read_csv(file, engine='python', index_col=None, sep=r',\s*', header=None)

Setting the parsing engine to python avoids the warning you get when you use a regular expression as the separator.
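A minimal sketch of this approach on inline data (the sample rows and the io.StringIO stand-in for the file are assumptions for illustration). It also suggests why sep=",\s+" may have failed in the question: some fields in the file, such as 628,249, have no space after the comma, and \s+ requires at least one, whereas \s* accepts zero.

```python
import io

import pandas as pd

# Hypothetical sample: "628,249" has no space after the comma,
# while the NaN fields have a varying number of spaces.
raw = "1891,  628,249,  NaN\n1891,  628,135,    NaN\n"

# A comma followed by zero or more whitespace characters is the separator.
# Regex separators are handled by the python engine.
df = pd.read_csv(io.StringIO(raw), engine="python", index_col=None,
                 sep=r",\s*", header=None)

print(df)
```

Because the separator swallows the spaces, each field arrives as a bare value and the default NaN detection applies.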