2016-07-15

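I want to write a script that computes 30-minute averages of radiation (i.e. 12:00, 12:30, 1:00, ...). After computing the 30-minute averages, I need to split the data into seasons (DJF, MAM, JJA, SON). Values equal to -99999 should be omitted. How do I calculate 30-minute averages and seasonal averages in Python?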

Here are the first few rows of the data. It is a very large file that covers many years.

DATE      month  day  year  EST   Direct NIP  Diffuse PSP (sband corr)
4/1/2004  4      1    2004  5:55  0.01967     1.5687
4/1/2004  4      1    2004  6:00  0.2295      5.3946
4/1/2004  4      1    2004  6:05  0.59015     13.0295
4/1/2004  4      1    2004  6:10  0.78686     23.0043
4/1/2004  4      1    2004  6:15  0.60982     20.827
4/1/2004  4      1    2004  6:20  0.80655     23.199
4/1/2004  4      1    2004  6:25  0.81309     26.951
4/1/2004  4      1    2004  6:30  0.77375     31.0062
4/1/2004  4      1    2004  6:35  0.55081     35.04
4/1/2004  4      1    2004  6:40  0.24262     41.1042
4/1/2004  4      1    2004  6:45  0.39999     46.6218
4/1/2004  4      1    2004  6:50  0.26229     52.7591
4/1/2004  4      1    2004  6:55  0.26885     67.9498

Any ideas on how I could go about this? Thanks for your help.

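Edit: here is my code so far. Right now it just works through all of the radiation values at once. Please note that it is amateurish, because I am teaching myself how to code. Thanks.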

import csv 
import openpyxl 
import matplotlib as mpl 
import numpy as np 
import matplotlib.pyplot as plt 
import matplotlib.dates as mdates 
import pandas as pd 
from datetime import datetime 

x = [datetime(year = 2004, month = 4, day = 1), 
    datetime(year = 2014, month = 11, day = 18)] 
y = [] 
x2 = [] 
y2 = [] 

with open('tenyeardata.csv', 'r') as csvfile:
    data = csv.reader(csvfile)

    firstline = True
    for row in data:
        if firstline:  # skip the header line
            firstline = False
            continue

        x.append(int(row[1]))
        y.append(float(row[5]))
        x2.append(int(row[3]))
        y2.append(float(row[6]))


fig = plt.figure() 

ax1 = fig.add_subplot(111) 

ax1.set_title("North Carolina Radiation (Direct and Diffuse)")  
ax1.set_xlabel('time (hours)') 
ax1.set_ylabel('SW (W m-2)') 
print(x[:10])
print(y[:10])
ax1.plot(y, c='r', label='Direct') 
ax1.plot(y2, c='b', label = 'Diffuse') 
ax1.axis([-1, 568217, 0, 1100]) 
leg = ax1.legend() 
plt.axis([-1, 568217, 0, 1100]) 
plt.show() 
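Do you have any code to show us so we can take a look?

I just added my code to the original question. I hope it doesn't confuse you!

I'm just trying to figure out what you want to plot. You populate 'x' and 'x2' but never use them. Do you want to see the "Direct" and "Diffuse" data from each row plotted against time (e.g. at 5:55 AM on 4/1/2004 the y-value is 0.01967)?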

Answer

Consider calculating the dimensions you need for the plot: the half-hour date/time and the season. Then run a groupby() mean aggregation:

from io import StringIO 
import pandas as pd 
import numpy as np 
import datetime

data = '''DATE,month,day,year,EST,Direct NIP,Diffuse PSP (sband corr) 
4/1/2004,4,1,2004,5:55,0.01967,1.5687 
4/1/2004,4,1,2004,6:00,0.2295,5.3946 
4/1/2004,4,1,2004,6:05,0.59015,13.0295 
4/1/2004,4,1,2004,6:10,0.78686,23.0043 
4/1/2004,4,1,2004,6:15,0.60982,20.827 
4/1/2004,4,1,2004,6:20,0.80655,23.199 
4/1/2004,4,1,2004,6:25,0.81309,26.951 
4/1/2004,4,1,2004,6:30,0.77375,31.0062 
4/1/2004,4,1,2004,6:35,0.55081,35.04 
4/1/2004,4,1,2004,6:40,0.24262,41.1042 
4/1/2004,4,1,2004,6:45,0.39999,46.6218 
4/1/2004,4,1,2004,6:50,0.26229,52.7591 
4/1/2004,4,1,2004,6:55,0.26885,67.9498''' 

df = pd.read_csv(StringIO(data)) 

# ADDED DATE/TIME FIELDS 
df['DATE'] = pd.to_datetime(df['DATE'] + ' ' + df['EST'], format='%m/%d/%Y %H:%M') 
df['MONTH'] = df['DATE'].dt.month 

# EVERY HALF HOUR BLOCKS 
df['HALF_HOUR_DATE'] = df['DATE'].apply(lambda dt: datetime.datetime(dt.year, dt.month, dt.day, dt.hour, 30*(dt.minute // 30))) 
df['HALF_HOUR_TIME'] = df['HALF_HOUR_DATE'].apply(lambda x: x.strftime('%H:%M'))

# SEASON CONDITIONAL CALCULATION 
df['SEASON'] = np.where(df['MONTH'].isin([12,1,2]), 'DJF', 
         np.where(df['MONTH'].isin([3,4,5]), 'MAM', 
           np.where(df['MONTH'].isin([6,7,8]), 'JJA', 
              np.where(df['MONTH'].isin([9,10,11]), 'SON', None)))) 

# AGGREGATE DATA
aggdf = (df[['SEASON', 'HALF_HOUR_DATE', 'Direct NIP', 'Diffuse PSP (sband corr)']]
         .groupby(['SEASON', 'HALF_HOUR_DATE']).mean())

Output

Updated dataframe

#     DATE month day year EST Direct NIP Diffuse PSP (sband corr) MONTH  HALF_HOUR_DATE HALF_HOUR_TIME SEASON 
# 0 2004-04-01 05:55:00  4 1 2004 5:55  0.01967     1.5687  4 2004-04-01 05:30:00   05:30 MAM 
# 1 2004-04-01 06:00:00  4 1 2004 6:00  0.22950     5.3946  4 2004-04-01 06:00:00   06:00 MAM 
# 2 2004-04-01 06:05:00  4 1 2004 6:05  0.59015     13.0295  4 2004-04-01 06:00:00   06:00 MAM 
# 3 2004-04-01 06:10:00  4 1 2004 6:10  0.78686     23.0043  4 2004-04-01 06:00:00   06:00 MAM 
# 4 2004-04-01 06:15:00  4 1 2004 6:15  0.60982     20.8270  4 2004-04-01 06:00:00   06:00 MAM 
# 5 2004-04-01 06:20:00  4 1 2004 6:20  0.80655     23.1990  4 2004-04-01 06:00:00   06:00 MAM 
# 6 2004-04-01 06:25:00  4 1 2004 6:25  0.81309     26.9510  4 2004-04-01 06:00:00   06:00 MAM 
# 7 2004-04-01 06:30:00  4 1 2004 6:30  0.77375     31.0062  4 2004-04-01 06:30:00   06:30 MAM 
# 8 2004-04-01 06:35:00  4 1 2004 6:35  0.55081     35.0400  4 2004-04-01 06:30:00   06:30 MAM 
# 9 2004-04-01 06:40:00  4 1 2004 6:40  0.24262     41.1042  4 2004-04-01 06:30:00   06:30 MAM 
# 10 2004-04-01 06:45:00  4 1 2004 6:45  0.39999     46.6218  4 2004-04-01 06:30:00   06:30 MAM 
# 11 2004-04-01 06:50:00  4 1 2004 6:50  0.26229     52.7591  4 2004-04-01 06:30:00   06:30 MAM 
# 12 2004-04-01 06:55:00  4 1 2004 6:55  0.26885     67.9498  4 2004-04-01 06:30:00   06:30 MAM 
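
The HALF_HOUR_DATE and SEASON columns in the table above could probably also be derived without a row-wise apply() or nested np.where(). The following is only a sketch of that idea, not the answerer's code: it floors two of the sample timestamps with `Series.dt.floor` and maps months to seasons through a lookup dict. The names `stamps`, `half_hour`, `SEASONS` and `season` are made up for this illustration.

```python
import pandas as pd

# Two timestamps taken from the sample rows in the question.
stamps = pd.Series(pd.to_datetime(['2004-04-01 05:55', '2004-04-01 06:40']))

# Floor each timestamp to the start of its 30-minute block.
half_hour = stamps.dt.floor('30min')

# Hypothetical month -> season lookup (not part of the original answer).
SEASONS = {12: 'DJF', 1: 'DJF', 2: 'DJF',
           3: 'MAM', 4: 'MAM', 5: 'MAM',
           6: 'JJA', 7: 'JJA', 8: 'JJA',
           9: 'SON', 10: 'SON', 11: 'SON'}
season = stamps.dt.month.map(SEASONS)

print(half_hour.dt.strftime('%H:%M').tolist())
print(season.tolist())
```

On a file with many years of 5-minute readings, the vectorized floor is likely to be noticeably faster than building a datetime object per row, though that has not been measured here.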

Aggregated groupby dataframe

#                             Direct NIP  Diffuse PSP (sband corr)
# SEASON HALF_HOUR_DATE
# MAM    2004-04-01 05:30:00    0.019670                  1.568700
#        2004-04-01 06:00:00    0.639328                 18.734233
#        2004-04-01 06:30:00    0.416385                 45.746850
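
The question also asks that values equal to -99999 be omitted, and the aggregation above keeps one row per calendar date and half hour. As a rough sketch of one way to cover both points, the sentinel can be turned into NaN (which mean() skips by default) and the grouping can use the half-hour time of day, so that all days of a season are pooled. The sample rows below, including the -99999 reading, are invented for illustration, and `season_of_month` and `seasonal` are hypothetical names.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Made-up sample in the question's layout; the -99999 row is invented
# here only to show the sentinel handling.
raw = '''DATE,EST,Direct NIP
1/15/2005,12:00,100.0
1/15/2005,12:05,-99999
1/15/2005,12:10,200.0
1/16/2005,12:20,300.0
7/15/2005,12:00,800.0
'''

df = pd.read_csv(StringIO(raw))
df['DATE'] = pd.to_datetime(df['DATE'] + ' ' + df['EST'], format='%m/%d/%Y %H:%M')

# Treat the sentinel as missing so that mean() ignores it.
df['Direct NIP'] = df['Direct NIP'].replace(-99999, np.nan)

# Time-of-day label of the enclosing 30-minute block.
df['HALF_HOUR_TIME'] = df['DATE'].dt.floor('30min').dt.strftime('%H:%M')

# Hypothetical month -> season lookup.
season_of_month = {12: 'DJF', 1: 'DJF', 2: 'DJF',
                   3: 'MAM', 4: 'MAM', 5: 'MAM',
                   6: 'JJA', 7: 'JJA', 8: 'JJA',
                   9: 'SON', 10: 'SON', 11: 'SON'}
df['SEASON'] = df['DATE'].dt.month.map(season_of_month)

# Mean per season and half-hour time of day, pooled over all days.
seasonal = df.groupby(['SEASON', 'HALF_HOUR_TIME'])['Direct NIP'].mean()
print(seasonal)
```

When reading the real file, passing `na_values=[-99999]` to `pd.read_csv` should have the same effect as the replace() step. Whether the seasonal mean should pool the raw 5-minute readings, as here, or average the per-day half-hour means first is a choice the question leaves open.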