2013-07-27 37 views
3
completed    deadline 
15-07-2013 23:10 15-07-2013 23:15 
16-07-2013 00:20 16-07-2013 00:15 
16-07-2013 00:20 16-07-2013 00:15 
16-07-2013 21:04 16-07-2013 21:30 
16-07-2013 21:58 16-07-2013 22:00 
16-07-2013 23:21 16-07-2013 23:15 
16-07-2013 23:21 16-07-2013 23:15 
17-07-2013 00:19 17-07-2013 00:15 
17-07-2013 00:19 17-07-2013 00:15 
17-07-2013 21:18 17-07-2013 21:30 
17-07-2013 22:07 17-07-2013 22:00 

I am using Python pandas to calculate a time difference and write it to CSV. When I evaluate data['completed'] - data['deadline'], I get:

-1 day, 23:55:00 # on time 
     0:05:00 
     0:05:00 
-1 day, 23:34:00 # on time 
-1 day, 23:58:00 # on time 
     0:06:00 
     0:06:00 
     0:04:00 
     0:04:00 
-1 day, 23:48:00 # on time 
     0:07:00 

But when I assign data['time_delay'] = data['completed'] - data['deadline'] and print data['time_delay'], I get:

-300000000000 
300000000000 
300000000000 
-1560000000000 
-120000000000 
360000000000 
360000000000 
240000000000 
240000000000 
-720000000000 
420000000000 

I get the same result when the output is written to CSV.

How do I:

  1. Handle this output?

  2. Write the output to CSV in 'minutes' format?

  3. Handle the "on time" output?

Answers

2
>>> data = pd.read_csv('1.csv', parse_dates=[0,1]) 
>>> data['time_delay'] = data['completed'] - data['deadline'] 
>>> print data['time_delay'] 
0 -00:05:00 
1 00:05:00 
2 00:05:00 
3 -00:26:00 
4 -00:02:00 
Name: time_delay, dtype: timedelta64[ns] 
>>> data.to_csv(sys.stdout) 
,completed,deadline,time_delay 
0,2013-07-15 23:10:00,2013-07-15 23:15:00,-300000000000 
1,2013-07-16 00:20:00,2013-07-16 00:15:00,300000000000 
2,2013-07-16 00:20:00,2013-07-16 00:15:00,300000000000 
3,2013-07-16 21:04:00,2013-07-16 21:30:00,-1560000000000 
4,2013-07-16 21:58:00,2013-07-16 22:00:00,-120000000000 
>>> data['time_delay'] = data['time_delay'].apply(pd.lib.repr_timedelta64) 
>>> data.to_csv(sys.stdout) 
,completed,deadline,time_delay 
0,2013-07-15 23:10:00,2013-07-15 23:15:00,-00:05:00 
1,2013-07-16 00:20:00,2013-07-16 00:15:00,00:05:00 
2,2013-07-16 00:20:00,2013-07-16 00:15:00,00:05:00 
3,2013-07-16 21:04:00,2013-07-16 21:30:00,-00:26:00 
4,2013-07-16 21:58:00,2013-07-16 22:00:00,-00:02:00 

pandas.lib.repr_timedelta64 is not documented, so this code may break in the future. (I am using pandas 0.11.0.)
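As a hedged alternative that avoids the undocumented helper: on more recent pandas versions (an assumption; this was not available in 0.11.0), the `.dt.total_seconds()` accessor turns a timedelta column into a float number of seconds, so dividing by 60 gives signed minutes. The raw integers in the CSV above are nanoseconds, which is why -300000000000 corresponds to -5 minutes. The inline sample data and the column names `delay_minutes` and `on_time` below are illustrative only.

```python
import io
import pandas as pd

# Illustrative sample in the question's day-first format (assumed comma separated).
raw = io.StringIO(
    "completed,deadline\n"
    "15-07-2013 23:10,15-07-2013 23:15\n"
    "16-07-2013 00:20,16-07-2013 00:15\n"
    "16-07-2013 21:04,16-07-2013 21:30\n"
)
data = pd.read_csv(raw)

# Parse with an explicit format so day-first dates are not read as month-first.
for col in ("completed", "deadline"):
    data[col] = pd.to_datetime(data[col], format="%d-%m-%Y %H:%M")

# Signed delay in minutes: negative means completed before the deadline.
delta = data["completed"] - data["deadline"]
data["delay_minutes"] = delta.dt.total_seconds() / 60

# Hypothetical flag column answering the "on time" part of the question.
data["on_time"] = data["delay_minutes"] <= 0

csv_text = data.to_csv(index=False)
print(csv_text)
```

Storing plain numbers rather than a formatted timedelta string keeps the CSV easy to read back and to sort.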

+0

Thanks! I was struggling with this! – richie

+0

FYI, the reverse operation (reading a timedelta column) is not implemented yet; such a column will be read back as object dtype. – Jeff
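One possible workaround for the round trip, sketched under the assumption of a newer pandas in which `pd.to_timedelta` accepts a numeric Series with a `unit` argument: store the delay as plain minutes and rebuild the timedelta after reading. The column name `delay_minutes` is made up for this example.

```python
import io
import pandas as pd

# A small timedelta column standing in for data['time_delay'].
delays = pd.DataFrame({"time_delay": pd.to_timedelta([-5, 5, -26], unit="min")})

# Write the delay as a plain number of minutes instead of a timedelta.
out = pd.DataFrame({"delay_minutes": delays["time_delay"].dt.total_seconds() / 60})
csv_text = out.to_csv(index=False)

# Read it back as ordinary floats, then rebuild the timedelta column explicitly.
restored = pd.read_csv(io.StringIO(csv_text))
restored["time_delay"] = pd.to_timedelta(restored["delay_minutes"], unit="min")
print(restored.dtypes)
```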

1

Try this:

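def func(x, y):
    # x is the completion time, y the deadline (both datetime-like values).
    if x > y:
        # total_seconds() keeps whole hours, which .seconds // 60 % 60 would drop.
        return 'delayed by ' + str(int((x - y).total_seconds() // 60)) + ' minutes'
    else:
        return 'on time by ' + str(int((y - x).total_seconds() // 60)) + ' minutes'


data["ontime"] = data.apply(lambda row: func(row["completed"], row["deadline"]), axis=1)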

This gives:

completed     deadline    ontime 
0 2013-07-15 23:10:00 2013-07-15 23:15:00  on time by 5 minutes 
1 2013-07-16 00:20:00 2013-07-16 00:15:00  delayed by 5 minutes 
2 2013-07-16 00:20:00 2013-07-16 00:15:00  delayed by 5 minutes 
3 2013-07-16 21:04:00 2013-07-16 21:30:00  on time by 26 minutes 
4 2013-07-16 21:58:00 2013-07-16 22:00:00  on time by 2 minutes 
5 2013-07-16 23:21:00 2013-07-16 23:15:00  delayed by 6 minutes 
6 2013-07-16 23:21:00 2013-07-16 23:15:00  delayed by 6 minutes 
7 2013-07-17 00:19:00 2013-07-17 00:15:00  delayed by 4 minutes 
8 2013-07-17 00:19:00 2013-07-17 00:15:00  delayed by 4 minutes 
9 2013-07-17 21:18:00 2013-07-17 21:30:00  on time by 12 minutes 
10 2013-07-17 22:07:00 2013-07-17 22:00:00  delayed by 7 minutes 
+0

I had not seen the accepted answer, which gives a better solution. –

+0

Nice answer, but when I try your code I get the following error: ("unsupported operand type(s) for -: 'str' and 'str'", u'occurred at index 0') – richie

+1

I tried this and it works: 'data["ontime"] = data.apply(lambda row: func(pd.Timestamp(row["completed"]), pd.Timestamp(row["deadline"])), axis=1)' – richie
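The 'str' error above suggests the two columns were still plain strings when `func` ran. A minimal sketch of another way around it, assuming a recent pandas: convert both columns once with `pd.to_datetime`, then build the labels without a row-wise `apply`. The sample rows and the helper names `minutes` and `labels` are illustrative.

```python
import pandas as pd

# Columns left as strings, as they would be if the CSV was read without date parsing.
data = pd.DataFrame({
    "completed": ["15-07-2013 23:10", "16-07-2013 00:20"],
    "deadline": ["15-07-2013 23:15", "16-07-2013 00:15"],
})

# Strings cannot be subtracted, so convert both columns to datetimes first.
for col in ("completed", "deadline"):
    data[col] = pd.to_datetime(data[col], format="%d-%m-%Y %H:%M")

# Signed whole minutes: positive means late, zero or negative means on time.
delta = data["completed"] - data["deadline"]
minutes = (delta.dt.total_seconds() / 60).round().astype(int)

# Keep the 'delayed' label where the task was late, otherwise use 'on time'.
labels = minutes.abs().astype(str) + " minutes"
data["ontime"] = ("delayed by " + labels).where(minutes > 0, "on time by " + labels)
print(data)
```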