2015-12-28 211 views
1

Python pandas: mean and weighted average

I am new to Python pandas. Any help would be much appreciated.

Here is my raw data:

  Feed Close Sector Market_Cap 
Date 
2015-09-18 A 5.60 Property 50  
2015-09-21 A 5.60 Property 20  
2015-09-23 A 5.60 Property 30  
2015-09-18 ABC 0.67 Property 50  
2015-09-21 ABC 0.66 Property 80  
2015-09-18 DA 0.67 Mining 65  
2015-09-21 KK 1.66 Mining 80  

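What I want to achieve:

1. Create a new column called Mean that holds the average Market_Cap for each Feed.

2. Find the weighted average: each Feed's Mean divided by the sum of the Feed means within its Sector.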

This is what I want: 
     Feed Close Sector Market_Cap Mean Sector_WeightedAvg 
Date 
2015-09-18 A 5.60 Property 50   33.33  33.33/(33.33+65) 
2015-09-21 A 5.60 Property 20   33.33  33.33/(33.33+65) 
2015-09-23 A 5.60 Property 30   33.33  33.33/(33.33+65) 
2015-09-18 ABC 0.67 Property 50   65   65/(33.33+65) 
2015-09-21 ABC 0.66 Property 80   65   65/(33.33+65) 
2015-09-18 DA 0.67 Mining 65   65   65/(65+80) 
2015-09-21 KK 1.66 Mining 80   80   80/(65+80) 

Here is my current code for the mean, which gives me NaN:

df3 = pd.DataFrame(df3) 
df3['Mean'] = df3.groupby(by=['Sector'])['Market_Cap'].mean() 

     Feed Close Sector Market_Cap Mean 
Date 
2015-09-18 A 5.60 Property 50   NaN  
2015-09-21 A 5.60 Property 20   NaN  
2015-09-23 A 5.60 Property 30   NaN  
2015-09-18 ABC 0.67 Property 50   NaN    

And the code for the weighted average:

df2['WeightedAverage'] = df3['Market_Cap'].value / df3['Mean'].value 

I get this error:

AttributeError: 'Series' object has no attribute 'value'
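
Both symptoms above can be reproduced in a minimal sketch. It assumes df3 looks like the sample data, with Date as a non-unique index; the names per_sector, naive and aligned are illustrative only and not from the original post. The sketch suggests that the NaN column comes from index alignment (the grouped result is indexed by Sector, not by Date), and that the AttributeError arises because a Series has `.values` but no `.value`.

```python
import pandas as pd

# Sample frame rebuilt from the question (an assumption about how df3 looks).
df3 = pd.DataFrame(
    {
        "Feed": ["A", "A", "A", "ABC", "ABC", "DA", "KK"],
        "Close": [5.60, 5.60, 5.60, 0.67, 0.66, 0.67, 1.66],
        "Sector": ["Property"] * 5 + ["Mining"] * 2,
        "Market_Cap": [50, 20, 30, 50, 80, 65, 80],
    },
    index=pd.DatetimeIndex(
        ["2015-09-18", "2015-09-21", "2015-09-23", "2015-09-18",
         "2015-09-21", "2015-09-18", "2015-09-21"],
        name="Date",
    ),
)

# groupby(...).mean() returns one row per group, indexed by Sector.
per_sector = df3.groupby("Sector")["Market_Cap"].mean()

# Column assignment aligns on index labels. 'Mining' and 'Property'
# never match a date label, so every row ends up NaN.
naive = df3.copy()
naive["Mean"] = per_sector

# transform() returns a result with the same index as the input,
# so it can be assigned as a column directly.
aligned = df3.groupby("Sector")["Market_Cap"].transform("mean")

# A Series exposes `.values` (a NumPy array); there is no `.value`.
has_value_attr = hasattr(df3["Market_Cap"], "value")
print(naive["Mean"].isna().all(), aligned.notna().all(), has_value_attr)
```

Grouping by Sector here only mirrors the code in the question; the stated goal (one mean per Feed) would group by Feed instead.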

+1

'This gives error' - what error? Can we get the traceback? – cel

+1

Your DataFrame has no 'Value' column, yet you reference it in your code. –

+0

Yes, I have reposted. It should be Market_Cap. I still get the same error – Dusty

Answers

1

IIUC you can use transform with mean.

WeightedAverage is column Mean divided by the sum of the unique values of column Mean, with df3 grouped by column Sector.

print(df3) 
      Feed Close Sector Market_Cap 
Date           
2015-09-18 A 5.60 Property   50 
2015-09-21 A 5.60 Property   20 
2015-09-23 A 5.60 Property   30 
2015-09-18 ABC 0.67 Property   50 
2015-09-21 ABC 0.66 Property   80 
2015-09-18 DA 0.67 Mining   65 
2015-09-21 KK 1.66 Mining   80 

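# Per-Feed mean, broadcast back to every row of that Feed 
df3['Mean'] = df3.groupby(by=['Feed'])['Market_Cap'].transform('mean') 
# Each Feed's mean as a share of the sum of distinct means in its Sector 
df3['WeightedAverage'] = df3['Mean'] / df3.groupby(by=['Sector'])['Mean'].transform(lambda x: sum(x.unique())) 
print(df3) 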
      Feed Close Sector Market_Cap  Mean WeightedAverage 
Date                  
2015-09-18 A 5.60 Property   50 33.333333   0.338983 
2015-09-21 A 5.60 Property   20 33.333333   0.338983 
2015-09-23 A 5.60 Property   30 33.333333   0.338983 
2015-09-18 ABC 0.67 Property   50 65.000000   0.661017 
2015-09-21 ABC 0.66 Property   80 65.000000   0.661017 
2015-09-18 DA 0.67 Mining   65 65.000000   0.448276 
2015-09-21 KK 1.66 Mining   80 80.000000   0.551724 

+0

But doesn't 'sum(x.unique())' assume that every mean is a unique value? What happens if there are multiple equal means for different sectors? –

+0

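That is possible, but my approach works for this sample because the Feed values do not overlap between sectors. Column 'Mean' depends on column 'Feed'. – jezrael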
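
The concern raised in this exchange can be illustrated with a small sketch. The data below is hypothetical (it is not from the question) and is built so that two feeds in one sector share the same mean; the names unique_total, per_feed and sector_total are illustrative. It compares the unique-based denominator with one that keeps one row per (Sector, Feed) pair, which is one possible way to avoid dropping an equal mean.

```python
import pandas as pd

# Hypothetical data: feeds X and Y are in the same sector and both average 50.
df = pd.DataFrame(
    {
        "Feed": ["X", "X", "Y", "Z"],
        "Sector": ["Property"] * 4,
        "Market_Cap": [40, 60, 50, 100],
    }
)
df["Mean"] = df.groupby("Feed")["Market_Cap"].transform("mean")

# Unique-based denominator: the repeated 50 is counted once, giving 150.
unique_total = df.groupby("Sector")["Mean"].transform(lambda x: x.unique().sum())

# Keep one row per (Sector, Feed) so every feed is counted exactly once: 200.
per_feed = df.drop_duplicates(["Sector", "Feed"])
sector_total = per_feed.groupby("Sector")["Mean"].sum()

# Map each row's Sector to its total and divide.
df["WeightedAverage"] = df["Mean"] / df["Sector"].map(sector_total)
print(df)
```

With the per-feed denominator the shares of X, Y and Z add up to 1 within the sector; with the unique-based one they would not in this constructed case.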

0

Try a combination of transform('sum') and mean.

In [5]: df 
Out[5]: 
    Close Feed Market_Cap Sector 
0 5.60 A   50 Property 
1 5.60 A   20 Property 
2 5.60 A   30 Property 
3 0.67 ABC   50 Property 
4 0.66 ABC   80 Property 
5 0.67 DA   65 Mining 
6 1.66 KK   80 Mining 

In [6]: g = df.groupby(['Sector', 'Feed']) 

..

In [7]: c = g.Market_Cap.mean() 

In [8]: c 
Out[8]: 
Sector Feed 
Mining DA  65.000000 
      KK  80.000000 
Property A  33.333333 
      ABC  65.000000 
Name: Market_Cap, dtype: float64 

In [9]: d = c.groupby(level=0).transform('sum') 

In [10]: d 
Out[10]: 
Sector Feed 
Mining DA  145.000000 
      KK  145.000000 
Property A  98.333333 
      ABC  98.333333 
dtype: float64 

..

In [11]: df['Mean'] = df.apply(lambda x: c[x.Sector, x.Feed], axis=1) 

In [12]: df['Weighted_Avg'] = df.apply(lambda x: c[x.Sector, x.Feed]/d[x.Sector, x.Feed], axis=1) 

In [13]: df 
Out[13]: 
    Close Feed Market_Cap Sector  Mean Weighted_Avg 
0 5.60 A   50 Property 33.333333  0.338983 
1 5.60 A   20 Property 33.333333  0.338983 
2 5.60 A   30 Property 33.333333  0.338983 
3 0.67 ABC   50 Property 65.000000  0.661017 
4 0.66 ABC   80 Property 65.000000  0.661017 
5 0.67 DA   65 Mining 65.000000  0.448276 
6 1.66 KK   80 Mining 80.000000  0.551724
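
The row-wise apply calls in In [11] and In [12] could also be replaced by a join on the two grouping keys. The following is a sketch under the assumption that df matches the frame shown in Out[5]; the names weights, lookup and result are illustrative and not part of the answer.

```python
import pandas as pd

# Frame rebuilt from Out[5] above (assumed to match the answer's df).
df = pd.DataFrame(
    {
        "Close": [5.60, 5.60, 5.60, 0.67, 0.66, 0.67, 1.66],
        "Feed": ["A", "A", "A", "ABC", "ABC", "DA", "KK"],
        "Market_Cap": [50, 20, 30, 50, 80, 65, 80],
        "Sector": ["Property"] * 5 + ["Mining"] * 2,
    }
)

# Same c and d as in the answer: per-(Sector, Feed) mean and per-Sector sum.
c = df.groupby(["Sector", "Feed"])["Market_Cap"].mean().rename("Mean")
d = c.groupby(level="Sector").transform("sum")
weights = (c / d).rename("Weighted_Avg")

# One lookup table indexed by (Sector, Feed), joined back onto the rows.
lookup = pd.concat([c, weights], axis=1)
result = df.join(lookup, on=["Sector", "Feed"])
print(result)
```

The join keeps the original row order and index, and avoids calling a Python lambda once per row.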