2016-04-04

Problem converting Unix timestamps using pandas

I have a pandas DataFrame, df, which looks like this:

   _sent_time_stamp  distance  duration  duration_in_traffic   Orig_lat
0        1456732800      1670       208                  343  51.441092

I want to convert the epoch time values (_sent_time_stamp) into two columns, one with the date and one with the hour.

I defined two functions:

def date_convert(time): 
    return time.date() 

def hour_convert(time): 
    return time.hour() 

Then I used apply with lambda functions to create the two new columns:

df['date'] = Goo_results.apply(lambda row: date_convert(pd.to_datetime(row['_sent_time_stamp'], unit='s')), axis=1) 

df['hour'] = Goo_results.apply(lambda row: hour_convert(pd.to_datetime(row['_sent_time_stamp'], unit='s')), axis=1) 

The date column works, but the hour column doesn't, and I can't see why! I get:

TypeError: ("'int' object is not callable", u'occurred at index 0') 

You can just convert the whole column: df['hour'] = pd.to_datetime(df['_sent_time_stamp'], unit='s').dt.hour – EdChum
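A minimal sketch of the whole-column conversion suggested in the comment, using a one-row DataFrame that holds only the timestamp column from the question (the other columns are left out for brevity). With unit='s' the integers are read as seconds since the Unix epoch, and the result is a timezone-naive timestamp in UTC.

```python
import pandas as pd

# One-row frame holding only the timestamp from the question
df = pd.DataFrame({'_sent_time_stamp': [1456732800]})

# Convert the whole column in one call; .dt exposes datetime attributes on a Series
df['hour'] = pd.to_datetime(df['_sent_time_stamp'], unit='s').dt.hour

print(df)
```

1456732800 is 2016-02-29 08:00:00 UTC, so the new hour column holds 8.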

Answer


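You can remove the () after hour: on a Timestamp, date() is a method, but hour is an attribute that already holds an int, so time.hour() ends up trying to call that int.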

import pandas as pd

def date_convert(time): 
    return time.date() 

def hour_convert(time): 
    return time.hour  # no (): hour is an attribute, not a method

df['date'] = df.apply(lambda row: date_convert(pd.to_datetime(row['_sent_time_stamp'], unit='s')), axis=1) 
df['hour'] = df.apply(lambda row: hour_convert(pd.to_datetime(row['_sent_time_stamp'], unit='s')), axis=1)  
print(df)
    _sent_time_stamp distance duration duration_in_traffic Orig_lat \ 
0  1456732800  1670  208     343 51.441092 

     date hour 
0 2016-02-29  8 
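To see why the original hour_convert failed, here is a small check on a single Timestamp built from the sample value in the question. date() is a method and must be called, whereas hour is a plain int attribute; adding () tries to call the int, which raises the same TypeError as in the question.

```python
import pandas as pd

# The sample timestamp from the question, as a pandas Timestamp
ts = pd.to_datetime(1456732800, unit='s')

print(ts.date())  # date() is a method, so calling it works
print(ts.hour)    # hour is an attribute that already holds an int

message = None
try:
    ts.hour()     # this calls the int itself
except TypeError as exc:
    message = str(exc)
print(message)
```

This prints 2016-02-29, then 8, then 'int' object is not callable.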

But it is better and faster to convert the whole column once and use dt.date and dt.hour:

dat = pd.to_datetime(df['_sent_time_stamp'], unit='s') 
df['date'] = dat.dt.date 
df['hour'] = dat.dt.hour 
print(df)
    _sent_time_stamp distance duration duration_in_traffic Orig_lat \ 
0  1456732800  1670  208     343 51.441092 

     date hour 
0 2016-02-29  8 

Timings:

In [20]: %timeit new(df1) 
1000 loops, best of 3: 827 µs per loop 

In [21]: %timeit lamb(df) 
The slowest run took 4.40 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 1.13 ms per loop 

Code:

import pandas as pd

df1 = df.copy()  # separate copy so each approach works on its own frame

def date_convert(time): 
    return time.date() 

def hour_convert(time): 
    return time.hour 


def lamb(df):  
    df['date'] = df.apply(lambda row: date_convert(pd.to_datetime(row['_sent_time_stamp'], unit='s')), axis=1) 
    df['hour'] = df.apply(lambda row: hour_convert(pd.to_datetime(row['_sent_time_stamp'], unit='s')), axis=1)  
    return df 

def new(df): 
    dat = pd.to_datetime(df['_sent_time_stamp'], unit='s') 
    df['date'] = dat.dt.date 
    df['hour'] = dat.dt.hour 
    return df 

print(lamb(df))
print(new(df1))
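The timings above appear to use the one-row sample frame, so they mostly measure fixed overhead. Below is a sketch for comparing the two approaches on a larger frame. The data is hypothetical (2,000 random timestamps within 2016, generated here rather than taken from the question), and the helper names with_apply and vectorized are illustrative. No timing results are quoted because they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 2,000 random epoch seconds within 2016
rng = np.random.default_rng(0)
start = 1451606400  # 2016-01-01 00:00:00 UTC
df = pd.DataFrame({'_sent_time_stamp': rng.integers(start, start + 366 * 86400, size=2_000)})

def with_apply(frame):
    # Row-wise: the lambda runs once per row and builds one Timestamp each time
    out = frame.copy()
    out['date'] = out.apply(lambda row: pd.to_datetime(row['_sent_time_stamp'], unit='s').date(), axis=1)
    out['hour'] = out.apply(lambda row: pd.to_datetime(row['_sent_time_stamp'], unit='s').hour, axis=1)
    return out

def vectorized(frame):
    # Column-wise: one conversion for the whole Series, then the .dt accessor
    out = frame.copy()
    dat = pd.to_datetime(out['_sent_time_stamp'], unit='s')
    out['date'] = dat.dt.date
    out['hour'] = dat.dt.hour
    return out

a = with_apply(df)
b = vectorized(df)

print('apply:     ', timeit.timeit(lambda: with_apply(df), number=3))
print('vectorized:', timeit.timeit(lambda: vectorized(df), number=3))
```

Both functions produce the same date and hour values. Because apply with axis=1 calls the Python lambda once per row while the vectorized version converts the column in a single call, the gap between them usually widens as the number of rows grows.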