2017-01-12 121 views

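I have a dataframe containing a full year of hourly data. One of its variables is the NO2 value. I want to compute the monthly mean of NO2 and show the monthly means in a time series plot, i.e. create a monthly time series in pandas.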

import pandas as pd

#Cleaning data
ck_2000 = pd.read_csv('2000-CamdenKerbside.csv', header=0,skiprows=4,usecols=range(0,3),skipfooter = 1, na_values = 'No data',engine = 'python') 
colnames = ['Date', 'Time', 'NO2'] 
ck_2000.columns = colnames 

#Reformat date/time 
ck_2000.Time.replace(to_replace = '24:00:00', value = '00:00:00', inplace = True) 
dtw = pd.to_datetime(ck_2000.Date + ck_2000.Time,format='%d/%m/%Y%H:%M:%S') 
ck_2000.index = dtw 

#Index dataframe by date 
firstDate = ck_2000.index[0] 
lastDate = ck_2000.index[len(ck_2000.Date) - 1] 
ck2000 = ck_2000.reindex(index=pd.date_range(start = firstDate, end =lastDate, freq = '1H'), fill_value= None) 

#Change data type to float (missing hours stay NaN)
ck2000['NO2'] = ck2000['NO2'].astype('float64')

#Interpolation (on the reindexed frame, numeric NO2 column only)
ck_2000_int = ck2000[['NO2']].interpolate()

#df's for all months 
ck_2000_jan = ck_2000_int.loc['2000-01']
ck_2000_feb = ck_2000_int.loc['2000-02']
ck_2000_mar = ck_2000_int.loc['2000-03']
ck_2000_apr = ck_2000_int.loc['2000-04']
ck_2000_may = ck_2000_int.loc['2000-05']
ck_2000_jun = ck_2000_int.loc['2000-06']
ck_2000_jul = ck_2000_int.loc['2000-07']
ck_2000_aug = ck_2000_int.loc['2000-08']
ck_2000_sept = ck_2000_int.loc['2000-09']
ck_2000_oct = ck_2000_int.loc['2000-10']
ck_2000_nov = ck_2000_int.loc['2000-11']
ck_2000_dec = ck_2000_int.loc['2000-12']
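
A side note on the `24:00:00` replacement above: rewriting it to `00:00:00` without touching the date puts that reading at the start of the same day rather than the start of the next one. Below is a minimal sketch of one way to handle it, assuming `24:00:00` marks the end of the day given in the `Date` column (the question does not state the file's convention). The sample rows and the helper name `parse_hour_ending` are made up for illustration.

```python
import pandas as pd

def parse_hour_ending(date, time):
    """Combine 'dd/mm/YYYY' and 'HH:MM:SS' string Series into timestamps,
    treating '24:00:00' as midnight at the start of the following day."""
    is_24 = time == '24:00:00'
    # Parse with 24:00:00 temporarily rewritten to 00:00:00 ...
    stamps = pd.to_datetime(date + ' ' + time.where(~is_24, '00:00:00'),
                            format='%d/%m/%Y %H:%M:%S')
    # ... then push only those rows forward by one day.
    return stamps + pd.to_timedelta(is_24.astype('int64'), unit='D')

# Hypothetical sample rows, not taken from the real CSV.
sample = pd.DataFrame({
    'Date': ['01/01/2000', '01/01/2000', '02/01/2000'],
    'Time': ['23:00:00', '24:00:00', '01:00:00'],
    'NO2': [40.0, 42.0, 38.0],
})
sample.index = parse_hour_ending(sample['Date'], sample['Time'])
```

With this, the `24:00:00` row lands on 2000-01-02 00:00 and the index stays in chronological order.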

Answer


You should be able to use resample.
See the example below.

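import numpy as np
import pandas as pd

# Random stand-in for a year of hourly NO2 readings
tidx = pd.date_range('2000-01-01', '2000-12-31 23:00', freq='H')
ck_2000_int = pd.DataFrame(dict(NO2=np.random.randn(len(tidx))), tidx)

ck_2000_int.resample('M').mean().plot()  # monthly means as a line plot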
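
If the real frame still carries the original `Date` and `Time` string columns, it may help to pick out `NO2` before averaging, since recent pandas versions refuse to take the mean of non-numeric columns. Here is a minimal sketch on synthetic data. The values are invented so the result is easy to check, and the names `df` and `monthly` are chosen here, not taken from the question. It uses the `'MS'` alias, which labels each month by its first day, where the example above uses month-end labels.

```python
import pandas as pd

# Synthetic hourly data for 2000; each hour's NO2 equals its month number,
# so every monthly mean should come out equal to the month number.
tidx = pd.date_range('2000-01-01', '2000-12-31 23:00', freq='h')
df = pd.DataFrame({
    'Date': tidx.strftime('%d/%m/%Y'),   # leftover string columns,
    'Time': tidx.strftime('%H:%M:%S'),   # as in the question's frame
    'NO2': tidx.month.astype('float64'),
}, index=tidx)

# resample needs a DatetimeIndex; select the numeric column before averaging.
assert isinstance(df.index, pd.DatetimeIndex)
monthly = df['NO2'].resample('MS').mean()

print(monthly.head(3))
```

Calling `monthly.plot()` afterwards draws the twelve monthly means as a line, as in the example above.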



I get the following error: "Unknown string format, unable to parse: NO2" – bootz123