2017-10-16 35 views
1

Normalize a pandas DataFrame column using an equation and the daily average

I have a pandas DataFrame, for example:

import numpy as np
import pandas as pd

df = pd.DataFrame({
      'time' : pd.date_range('2017-07-18 00:00:00', '2017-07-21 00:00:00', freq='3H'),
      'val1' : np.random.random(25)*300,
      'val2' : np.random.random(25)*30})

df.set_index('time', inplace=True)

and a DataFrame of real values per day:

real_values = pd.DataFrame({ 
    'day' : [18, 19, 20], 
    'values' : [500, 600, 700]}) 

I want to normalize the values of column val1 using an equation such as:

new_value = old_value*real_value_that_day/daily_average 

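That is, each value is multiplied by the ratio between that day's real value and that day's average.

I tried using .map, but I could not work the index.day condition into the DataFrame. I also tried groupby(df.index.day), but I don't know whether that would get me to the final result.

Thank you very much.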
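To make the equation concrete, here is a minimal sketch for a single day. The three readings and the target of 500 are made-up numbers for illustration, not values from the frames above.

```python
import pandas as pd

# Hypothetical readings for one day (not taken from the data above)
old_values = pd.Series([100.0, 200.0, 300.0])
real_value_that_day = 500.0

# Average of that day's readings
daily_average = old_values.mean()

# new_value = old_value * real_value_that_day / daily_average
new_values = old_values * real_value_that_day / daily_average

print(new_values.tolist())  # [250.0, 500.0, 750.0]
print(new_values.mean())    # 500.0
```

After the rescaling, the day's mean equals the real value for that day, which is the property the normalization is meant to enforce.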

Answers

2

I think you need:

import numpy as np
import pandas as pd

np.random.seed(45)
df = pd.DataFrame({ 
      'time' : pd.date_range('2017-07-18 00:00:00', '2017-07-21 00:00:00', freq='3H'), 
      'val1' : np.random.random(25)*300, 
      'val2' : np.random.random(25)*30}) 

df.set_index('time', inplace=True) 

real_values = pd.DataFrame({ 
    'day' : [18, 19, 20], 
    'values' : [500, 600, 700]}) 

# map real_values onto a Series with the same length as df, keyed by day of month
a = pd.Series(df.index.day, index=df.index).map(real_values.set_index('day')['values']) 
print (a.head()) 
time 
2017-07-18 00:00:00 500.0 
2017-07-18 03:00:00 500.0 
2017-07-18 06:00:00 500.0 
2017-07-18 09:00:00 500.0 
2017-07-18 12:00:00 500.0 
Name: time, dtype: float64 

# multiply the original values row-wise by Series a, then divide by the daily average from transform
df1 = df.mul(a, axis=0).div(df.groupby(df.index.day).transform('mean'))
print (df1) 
          val1   val2 
time           
2017-07-18 00:00:00 1307.171491 403.372865 
2017-07-18 03:00:00 726.330473 851.356196 
2017-07-18 06:00:00 371.987469 77.497641 
2017-07-18 09:00:00 102.153227 959.768694 
2017-07-18 12:00:00 587.453074 233.817177 
2017-07-18 15:00:00 624.907891 734.391568 
2017-07-18 18:00:00 64.131282 114.951326 
2017-07-18 21:00:00 215.865093 624.844533 
2017-07-19 00:00:00 120.686108 542.744066 
2017-07-19 03:00:00 653.014193 1116.500860 
2017-07-19 06:00:00 891.148297 333.591495 
2017-07-19 09:00:00 676.652432 610.715673 
2017-07-19 12:00:00 1031.182496 743.728715 
2017-07-19 15:00:00 489.559748 336.152862 
2017-07-19 18:00:00 643.545466 147.084368 
2017-07-19 21:00:00 294.211260 969.481959 
2017-07-20 00:00:00 1474.421809 404.910284 
2017-07-20 03:00:00 1016.785621 1078.311435 
2017-07-20 06:00:00 665.498098 589.809072 
2017-07-20 09:00:00 437.622829 122.931391 
2017-07-20 12:00:00 769.989526 1158.555013 
2017-07-20 15:00:00 169.891633 968.620184 
2017-07-20 18:00:00 342.854461 159.225353 
2017-07-20 21:00:00 722.936022 1117.637269 
2017-07-21 00:00:00   NaN   NaN 
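The code above rescales every column. Since the question only asks about val1, a variant restricted to that column might look like the sketch below. It reuses the same map and transform calls; the output column name val1_norm and the sanity check at the end are additions for illustration, and the index is built with an explicit 3-hour Timedelta instead of the '3H' alias.

```python
import numpy as np
import pandas as pd

np.random.seed(45)
idx = pd.date_range('2017-07-18', periods=25,
                    freq=pd.Timedelta(hours=3), name='time')
df = pd.DataFrame({'val1': np.random.random(25) * 300,
                   'val2': np.random.random(25) * 30}, index=idx)
real_values = pd.DataFrame({'day': [18, 19, 20],
                            'values': [500, 600, 700]})

# Real value for each row, looked up by day of month (NaN for day 21)
a = pd.Series(df.index.day, index=df.index).map(
    real_values.set_index('day')['values'])

# Daily average of val1, broadcast back to every row
daily_avg = df['val1'].groupby(df.index.day).transform('mean')

# Rescale val1 only; val2 is left untouched
# ('val1_norm' is a hypothetical column name chosen for this sketch)
df['val1_norm'] = df['val1'] * a / daily_avg

# Sanity check: each day's mean of the rescaled column
check = df['val1_norm'].groupby(df.index.day).mean()
print(check)
```

For days 18 to 20 the check should come out at 500, 600 and 700 up to floating-point error, and day 21 stays NaN because real_values has no entry for it.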

Details:

print (df.groupby(df.index.day).transform('mean')) 
          val1  val2 
time          
2017-07-18 00:00:00 113.490638 14.427688 
2017-07-18 03:00:00 113.490638 14.427688 
2017-07-18 06:00:00 113.490638 14.427688 
2017-07-18 09:00:00 113.490638 14.427688 
2017-07-18 12:00:00 113.490638 14.427688 
2017-07-18 15:00:00 113.490638 14.427688 
2017-07-18 18:00:00 113.490638 14.427688 
2017-07-18 21:00:00 113.490638 14.427688 
2017-07-19 00:00:00 172.937287 13.491194 
2017-07-19 03:00:00 172.937287 13.491194 
2017-07-19 06:00:00 172.937287 13.491194 
2017-07-19 09:00:00 172.937287 13.491194 
2017-07-19 12:00:00 172.937287 13.491194 
2017-07-19 15:00:00 172.937287 13.491194 
2017-07-19 18:00:00 172.937287 13.491194 
2017-07-19 21:00:00 172.937287 13.491194 
2017-07-20 00:00:00 139.010896 16.081470 
2017-07-20 03:00:00 139.010896 16.081470 
2017-07-20 06:00:00 139.010896 16.081470 
2017-07-20 09:00:00 139.010896 16.081470 
2017-07-20 12:00:00 139.010896 16.081470 
2017-07-20 15:00:00 139.010896 16.081470 
2017-07-20 18:00:00 139.010896 16.081470 
2017-07-20 21:00:00 139.010896 16.081470 
2017-07-21 00:00:00 72.827447 2.008148 
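One caveat that may matter for longer series: df.index.day is the day of the month, so data spanning several months would put, say, 18 July and 18 August into the same group. The sketch below uses made-up numbers to show the difference between grouping by day of month and grouping by calendar date via index.normalize(). Applying it to the question would also assume that real_values is keyed by full dates, which the original data is not.

```python
import pandas as pd

# Hypothetical series covering the 18th of two different months
idx = pd.to_datetime(['2017-07-18 00:00', '2017-07-18 12:00',
                      '2017-08-18 00:00', '2017-08-18 12:00'])
s = pd.Series([1.0, 3.0, 10.0, 30.0], index=idx)

# Day of month: both months fall into the same group (18)
by_day = s.groupby(s.index.day).transform('mean')

# Calendar date: each date gets its own group
by_date = s.groupby(s.index.normalize()).transform('mean')

print(by_day.tolist())   # [11.0, 11.0, 11.0, 11.0]
print(by_date.tolist())  # [2.0, 2.0, 20.0, 20.0]
```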
+0

Thanks! This works perfectly :) – Pau

1

This should do it:

import pandas as pd 
import numpy as np 

df = pd.DataFrame({ 
      'time' : pd.date_range('2017-07-18 00:00:00', '2017-07-21 00:00:00', freq='3H'), 
      'val1' : np.random.random(25)*300, 
      'val2' : np.random.random(25)*30}) 

real_values = pd.DataFrame({ 
    'day' : [18, 19, 20], 
    'values' : [500, 600, 700]}) 

df['day'] = df['time'].dt.day
df = df.merge(real_values, how='left', on='day') 

df['mean'] = df.groupby('day')['val1'].transform('mean') 
df['val1'] = df['val1'] * df['values']/df['mean'] 
df.set_index('time', inplace=True) 

You are just missing the mapping for day 21, so the rows for that day come out as NaN.
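If it helps, one way to spot days without a mapping is the indicator option of merge. This is only a sketch on a small made-up frame; '_merge' is the column name pandas adds when indicator=True, and dropping the unmatched rows is just one possible way to handle them.

```python
import pandas as pd

# Small hypothetical frame: day 21 has no entry in real_values
df = pd.DataFrame({'day': [18, 19, 20, 21],
                   'val1': [10.0, 20.0, 30.0, 40.0]})
real_values = pd.DataFrame({'day': [18, 19, 20],
                            'values': [500, 600, 700]})

# indicator=True adds a '_merge' column: 'both' or 'left_only' for a left join
merged = df.merge(real_values, how='left', on='day', indicator=True)

# Days present in df but absent from real_values
missing_days = merged.loc[merged['_merge'] == 'left_only', 'day'].unique().tolist()
print(missing_days)  # [21]

# One option: keep only rows that have a real value
cleaned = merged.dropna(subset=['values'])
print(cleaned['day'].tolist())  # [18, 19, 20]
```

Alternatively, adding a row for day 21 to real_values would let the normalization cover that day as well, provided a real value for it is known.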