2017-08-24 201 views
1

Python / pandas - calculating a ratio

I have a dataframe like this:

bal: 

             year   id   unit period    Revenues  Ativo Não-Circulante \
business_id
9564         2012  302  dsada  anual  5964168.52           10976013.70
9564         2011  303  dsada  anual  5774707.15           10867868.13
2361         2013  304  dsada  anual  3652575.31            6608468.52
2361         2012  305  dsada  anual   321076.15            6027066.03
2361         2011  306  dsada  anual  3858137.49            9733126.02
2369         2012  307  dsada  anual   351373.66            9402830.89
8104         2012  308  dsada  anual  3503226.02            6267307.01
... 

I want to create a column named "Growth". It would be:

(Revenues of this year / Revenues of last year) - 1

The dataframe should look like this:

             year   id   unit period    Revenues  Growth \
business_id
9564         2012  302  dsada  anual  5964168.52  0.0328
9564         2011  303  dsada  anual  5774707.15     NaN
2361         2013  304  dsada  anual  3652575.31   10.37
2361         2012  305  dsada  anual   321076.15   -0.91
2361         2011  306  dsada  anual  3858137.49     NaN
2369         2012  307  dsada  anual   351373.66     NaN
8104         2012  308  dsada  anual  3503226.02     NaN
... 

How can I do this?

+1

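You need to create a year column incremented/decremented by one, then join the revenue to itself using the new year +/- 1 column and the id to get the next/last year's revenue. The calculation after that should be trivial. – n8sty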
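A minimal sketch of the self-join described in this comment, assuming the frame is indexed by business_id as in the question. Only a few rows from the question are used, and the names `prev`, `merged`, and `Previous Revenues` are illustrative, not part of the original post.

```python
import pandas as pd

# A few rows copied from the question; business_id is the index, as in `bal`.
bal = pd.DataFrame(
    {
        "year": [2012, 2011, 2012, 2012],
        "Revenues": [5964168.52, 5774707.15, 351373.66, 3503226.02],
    },
    index=pd.Index([9564, 9564, 2369, 8104], name="business_id"),
)

df = bal.reset_index()

# Move each row's year forward by one so it lines up with the following year.
prev = df[["business_id", "year", "Revenues"]].copy()
prev["year"] = prev["year"] + 1
prev = prev.rename(columns={"Revenues": "Previous Revenues"})

# A left join keeps every original row; rows without a prior year get NaN.
merged = df.merge(prev, on=["business_id", "year"], how="left")
merged["Growth"] = merged["Revenues"] / merged["Previous Revenues"] - 1
print(merged)
```

Because the join matches on the exact previous calendar year, a business with a missing year would get NaN for the following year rather than a ratio against an older value.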

+0

@n8sty That solution is as obvious as you suggest. Although it is not spelled out well in the question, the annual revenue growth is calculated per 'business_id'. – Alexander

Answers

1

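I assume your dataframe is named df. First reset the index so that business_id becomes a column, then sort the result on year. Now group the dataframe on business_id and transform Revenues into its percent change within each group. Finally, sort on the index to restore the original order.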

df2 = df.reset_index().sort_values(['year']) 
df2 = (
    df2 
    .assign(Growth=df2.groupby(['business_id'])['Revenues'].transform(
     lambda group: group.pct_change())) 
    .sort_index() 
) 
>>> df2
   business_id  year   id   unit period    Revenues  Ativo Não-Circulante     Growth
0         9564  2012  302  dsada  anual  5964168.52           10976013.70   0.032809
1         9564  2011  303  dsada  anual  5774707.15           10867868.13        NaN
2         2361  2013  304  dsada  anual  3652575.31            6608468.52  10.376041
3         2361  2012  305  dsada  anual   321076.15            6027066.03  -0.916779
4         2361  2011  306  dsada  anual  3858137.49            9733126.02        NaN
5         2369  2012  307  dsada  anual   351373.66            9402830.89        NaN
6         8104  2012  308  dsada  anual  3503226.02            6267307.01        NaN

I think you have an error in your expected output:

5964168.52/5774707.15 - 1 = 0.0328 # vs. 0.16 shown. 
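As a possible simplification, the grouped column can probably be passed to `pct_change` directly instead of going through `transform` with a lambda. The sketch below assumes business_id is already a regular column and uses only a few rows from the question; `by_year` and `growth` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "business_id": [9564, 9564, 2369, 8104],
        "year": [2012, 2011, 2012, 2012],
        "Revenues": [5964168.52, 5774707.15, 351373.66, 3503226.02],
    }
)

# Sort by year so that "previous row" within a business means "previous year".
by_year = df.sort_values("year")

# pct_change on the grouped column computes current / previous - 1 per business.
growth = by_year.groupby("business_id")["Revenues"].pct_change()

# Assignment aligns on the index, so the rows keep their original order.
df["Growth"] = growth
print(df.round(6))
```

Since the assignment aligns on the index, no final `sort_index()` is needed in this variant.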
+0

Great solution. Indeed, I miscalculated it in the question. I will edit and fix it. Thanks @Alexander – abutremutante

0

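You need to "groupby" year (with "sort_values" if the years are not in order), loop over the groups to compute the growth, store the growth in a list, convert it with numpy.array(growth), and add it to the dataframe.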

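import numpy

# df is your dataframe, with business_id as the index
group = df.groupby('year')
R = {}  # store each year's Revenues (indexed by business_id) in a dictionary
for year, values in group:
    R[year] = values['Revenues']
g = []  # create list of growth, one entry per row of df
for bid, year, revenue in zip(df.index, df['year'], df['Revenues']):
    try:
        # this year's revenue over the same business's revenue one year earlier
        g.append(revenue / R[year - 1][bid] - 1)
    except KeyError:  # no revenue recorded for the previous year
        g.append(numpy.nan)
df['Growth'] = numpy.array(g)  # create numpy array and add it to df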
0

It looks like you need a groupby('business_id') and then a shift to get last year's revenue. Save that off in a new column, then take the ratio, like this:

df.reset_index(inplace=True) # You might have to do this because it looks like your index is 'business_id' 

df['Previous Revenues'] = df.sort_values('year').groupby('business_id')['Revenues'].shift(1) 
df['Growth'] = df['Revenues']/df['Previous Revenues'] - 1 

If you want, you don't need to save the new column, but the line gets a bit messy:

df['Growth'] = df['Revenues']/df.sort_values('year').groupby('business_id')['Revenues'].shift(1) - 1
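For anyone who wants to try the shift approach without modifying the original frame in place, here is a minimal sketch. It assumes the frame is indexed by business_id with one row per business and year; `add_growth` is a hypothetical helper name, and only a few rows from the question are used.

```python
import pandas as pd


def add_growth(frame):
    """Return a copy of `frame` with 'Previous Revenues' and 'Growth' columns.

    `add_growth` is a hypothetical helper; `frame` is assumed to be indexed
    by business_id with one row per business and year.
    """
    out = frame.reset_index()
    # shift(1) within each business, ordered by year, yields last year's value.
    out["Previous Revenues"] = (
        out.sort_values("year").groupby("business_id")["Revenues"].shift(1)
    )
    out["Growth"] = out["Revenues"] / out["Previous Revenues"] - 1
    return out.set_index("business_id")


bal = pd.DataFrame(
    {
        "year": [2012, 2011, 2012, 2012],
        "Revenues": [5964168.52, 5774707.15, 351373.66, 3503226.02],
    },
    index=pd.Index([9564, 9564, 2369, 8104], name="business_id"),
)
result = add_growth(bal)
print(result)
```

The temporary `reset_index()` matters because business_id repeats across years; assigning the shifted values back relies on a unique index to line the rows up.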