2017-03-29 178 views

Pandas: groupby some data

I have a DataFrame:

datetime city state country shape duration (seconds) duration (hours/min) comments date posted latitude longitude 
10/10/1949 20:30 san marcos tx us cylinder 2700 45 minutes This event took place in early fall around 1949-50. It occurred after a Boy Scout meeting in the Baptist Church. The Baptist Church sit 4/27/2004 29.8830556 -97.9411111 
10/10/1949 21:00 lackland afb tx  light 7200 1-2 hrs 1949 Lackland AFB&#44 TX. Lights racing across the sky & making 90 degree turns on a dime. 12/16/2005 29.38421 -98.581082 
10/10/1955 17:00 chester (uk/england)  gb circle 20 20 seconds Green/Orange circular disc over Chester&#44 England 1/21/2008 53.2 -2.916667 
10/10/1956 21:00 edna tx us circle 20 1/2 hour My older brother and twin sister were leaving the only Edna theater at about 9 PM&#44...we had our bikes and I took a different route home 1/17/2004 28.9783333 -96.6458333 
10/10/1960 20:00 kaneohe hi us light 900 15 minutes AS a Marine 1st Lt. flying an FJ4B fighter/attack aircraft on a solo night exercise&#44 I was at 50&#44000&#39 in a "clean" aircraft (no ordinan 1/22/2004 21.4180556 -157.8036111 

I am trying to group by state, and I use:

result = df.groupby("state").\ 
    agg({"state": pd.Series.nunique, "duration (seconds)": np.sum}).\ 
    rename(columns={"state": "frequency", "duration (seconds)": "whole time"}).\ 
    reset_index() 

But it returns the error TypeError: must be str, not float. I tried converting duration (seconds), but it returns duration (seconds). How can I track down this problem?


What actually raises the error? Does df.groupby("state").agg({"state": pd.Series.nunique}) work? (i.e., half of your groupby) – Stael


We don't know where the error comes from. Post the whole error, bro.
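One way to narrow this down, as suggested in the comments, is to run each half of the aggregation on its own and then look at what the column actually contains. This is only a sketch: the toy frame below is hypothetical, and the mixed str/float values in it are an assumption about the cause, since the full traceback is not shown.

```python
import pandas as pd

# Hypothetical toy frame standing in for the real data. The mix of str and
# float values in "duration (seconds)" is an assumption about the cause.
df = pd.DataFrame({
    "state": ["tx", "tx", "hi", "hi"],
    "duration (seconds)": ["2700", 7200.0, "900", 20.0],
})

grouped = df.groupby("state")

# First half of the aggregation: unique states per group.
print(grouped["state"].nunique())

# Second half: the sum. Catch broadly, because the exact exception type
# can differ between pandas versions.
try:
    print(grouped["duration (seconds)"].sum())
except Exception as exc:
    print("sum failed:", type(exc).__name__, exc)

# Inspect the column dtype and the Python types actually stored in it.
print(df["duration (seconds)"].dtype)
type_counts = df["duration (seconds)"].map(lambda v: type(v).__name__).value_counts()
print(type_counts)
```

If the dtype prints as object and more than one type name shows up in the counts, the column is probably mixing strings and numbers, which would be consistent with the TypeError in the question.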

Answers


Do something like this:

# Group df by state, then sum "duration (seconds)" within each group
df.groupby('state')['duration (seconds)'].apply(lambda x: x.sum())

Or, if you want a rolling sum:

df.groupby('state')['duration (seconds)'].apply(lambda x:x.rolling(center=False,window=2).sum())
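If the error does come from strings mixed into duration (seconds) (an assumption, since the full traceback is not shown), converting the column with pd.to_numeric before grouping may be enough. The sketch below uses a hypothetical toy frame, and the malformed "20`" value is invented for illustration. It also assumes "frequency" means the number of rows per state, so it swaps in count for nunique: counting unique states inside a groupby on state always gives 1.

```python
import pandas as pd

# Hypothetical toy frame. The malformed "20`" value is an assumption used
# to illustrate dirty input; it is not taken from the original data.
df = pd.DataFrame({
    "state": ["tx", "tx", "hi", "hi"],
    "duration (seconds)": ["2700", 7200.0, "900", "20`"],
})

# Coerce to numbers; anything unparseable becomes NaN instead of raising.
df["duration (seconds)"] = pd.to_numeric(df["duration (seconds)"], errors="coerce")

# Count rows per state and sum the durations (sum skips NaN by default).
# The dict unpacking is needed because "whole time" contains a space.
result = (
    df.groupby("state")
      .agg(**{
          "frequency": ("state", "count"),
          "whole time": ("duration (seconds)", "sum"),
      })
      .reset_index()
)
print(result)
```

With errors="coerce", bad values are silently turned into NaN, so it is worth checking df["duration (seconds)"].isna().sum() afterwards to see how many rows were affected.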