Parsing data

I have data that looks like the following file, a.dat:

01/Jul/2016 00:05:09  8438.2 
01/Jul/2016 00:05:19  8422.4 g

I would like to parse it into three columns: timestamp, float, and string (either empty or g).

I tried:

df=pd.read_csv('a.dat',sep='  | ',engine='python')

which ended up with 4 columns: date, time, float and g.

df=pd.read_csv('a.dat',sep='  | (g)',engine='python')

which gives 5 columns, with NaN in columns 1 and 4.
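Both results can be reproduced with the standard library alone. The sketch below assumes that the python engine splits each stripped line roughly the way re.split does with the same pattern; that is an assumption about pandas internals, not something stated in the question. The variable names are illustrative.

```python
import re

line = "01/Jul/2016 00:05:19  8422.4 g"

# Pattern 1: "two spaces" OR "one space". The single space between the date
# and the time also matches, so the timestamp is split into two fields.
parts_a = re.split("  | ", line)
print(parts_a)  # ['01/Jul/2016', '00:05:19', '8422.4', 'g']

# Pattern 2: "two spaces" OR "space followed by g". The capture group (g) is
# returned as an extra field on every split: None when the two-space branch
# matched, 'g' when the second branch matched. The trailing '' is the empty
# text left after the final " g" separator.
parts_b = re.split("  | (g)", line)
print(parts_b)  # ['01/Jul/2016 00:05:19', None, '8422.4', 'g', '']
```

The second result has five fields, and the fields at positions 1 and 4 (counting from 0) carry no data, which is consistent with the NaN columns described above.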

Is there a better way to create the DataFrame without any post-processing?

Source

2016-07-25 Chenming Zhang

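You can use read_csv:

import pandas as pd
import io

temp = u'''01/Jul/2016 00:05:09  8438.2 
01/Jul/2016 00:05:19  8422.4 g'''
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp),
                 sep=r'\s+',  # raw string, so \s reaches the regex engine unchanged
                 names=['date', 'time', 'float', 'string'],
                 # nested list: merge 'date' and 'time' into one datetime column
                 # (this form is deprecated in pandas 2.2 and later)
                 parse_dates=[['date', 'time']])
print(df)
            date_time   float string
0 2016-07-01 00:05:09  8438.2    NaN
1 2016-07-01 00:05:19  8422.4      g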

Or:

import pandas as pd
import io

temp = u'''01/Jul/2016 00:05:09  8438.2 
01/Jul/2016 00:05:19  8422.4 g'''
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp),
                 # split on any run of whitespace
                 # (deprecated in pandas 2.2 and later in favor of sep=r'\s+')
                 delim_whitespace=True,
                 names=['date', 'time', 'float', 'string'],
                 parse_dates=[['date', 'time']])
print(df)
            date_time   float string
0 2016-07-01 00:05:09  8438.2    NaN
1 2016-07-01 00:05:19  8422.4      g
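On recent pandas releases (2.2 and later), both delim_whitespace and the nested-list form of parse_dates are deprecated. The following is a minimal sketch of an alternative that reads the four raw columns and then combines the date and time with pd.to_datetime. It does add one post-processing step, which the question hoped to avoid, and the explicit format string is an assumption based on the two sample rows (note that %b depends on an English locale for month names).

```python
import io
import pandas as pd

temp = """01/Jul/2016 00:05:09  8438.2 
01/Jul/2016 00:05:19  8422.4 g"""

# Read the four whitespace-separated fields as plain columns.
# Rows with only three fields get NaN in the last column.
df = pd.read_csv(io.StringIO(temp),
                 sep=r"\s+",
                 names=["date", "time", "float", "string"])

# Combine date and time into one datetime column with an explicit format.
df["date_time"] = pd.to_datetime(df["date"] + " " + df["time"],
                                 format="%d/%b/%Y %H:%M:%S")

# Keep the three columns asked for in the question.
df = df[["date_time", "float", "string"]]
print(df)
```

Passing an explicit format avoids any ambiguity between day-first and month-first dates.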

A solution with read_fwf:

import pandas as pd
import io

temp = u'''01/Jul/2016 00:05:09  8438.2 
01/Jul/2016 00:05:19  8422.4 g'''
# after testing, replace io.StringIO(temp) with the filename
# without widths or colspecs, read_fwf infers the column boundaries from the data
df = pd.read_fwf(io.StringIO(temp),
                 names=['date', 'time', 'float', 'string'],
                 parse_dates=[['date', 'time']])
print(df)
            date_time   float string
0 2016-07-01 00:05:09  8438.2    NaN
1 2016-07-01 00:05:19  8422.4      g

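You can also specify the column widths:

# the keyword is widths (not fwidths); the values must match the real file layout.
# For the sample as shown here: 20 characters of date and time, 8 for the float, 2 for the flag.
# With explicit widths, date and time already arrive as a single field.
df = pd.read_fwf(io.StringIO(temp),
                 widths=[20, 8, 2],
                 names=['date_time', 'float', 'string'],
                 parse_dates=['date_time'])
print(df)
            date_time   float string
0 2016-07-01 00:05:09  8438.2    NaN
1 2016-07-01 00:05:19  8422.4      g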
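Fixed widths are easy to get wrong, so it can help to slice one sample line by hand before passing them to read_fwf. The sketch below uses only the standard library; the widths [20, 8, 2] are an assumption that fits the sample line as it is printed on this page, and the original file may use different spacing.

```python
from itertools import accumulate

line = "01/Jul/2016 00:05:19  8422.4 g"
widths = [20, 8, 2]  # assumed widths for the sample line as printed here

# Convert widths into (start, end) offsets, the same idea as colspecs in read_fwf.
ends = list(accumulate(widths))
starts = [0] + ends[:-1]
colspecs = list(zip(starts, ends))

# Slice the line at those offsets and trim the padding.
fields = [line[start:end].strip() for start, end in colspecs]
print(colspecs)  # [(0, 20), (20, 28), (28, 30)]
print(fields)    # ['01/Jul/2016 00:05:19', '8422.4', 'g']
```

If a field comes back cut off or merged with its neighbor, the widths do not match the file and need adjusting.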

Source

2016-07-25 06:49:56 jezrael
