2017-08-18
6

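Pandas: backward fill in increments of 12 months

I have a data frame with course names for each year. I need to find the duration in months, counting back from 2016.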

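from io import StringIO

import pandas as pd

u_cols = ['year_id', 'web_id']
audit_trail = StringIO('''
year_id | web_id
2012|efg
2013|abc
2014| xyz
2015| pqr
2016| mnp
''')

# header=0 replaces the header line in the text with u_cols;
# skipinitialspace drops the blanks that follow each "|".
df11 = pd.read_csv(audit_trail, sep="|", names=u_cols, header=0,
                   skipinitialspace=True)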

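How do I add the months in a new column, starting from the highest year (i.e. filling from the bottom, like bfill)?

The final data frame should look like this...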

u_cols = ['year_id', 'web_id', 'months']
audit_trail = StringIO('''
year_id | web_id | months
2012|efg| 60
2013|abc| 48
2014| xyz| 36
2015| pqr| 24
2016| mnp| 12
''')

# header=0 replaces the header line in the text with u_cols.
df12 = pd.read_csv(audit_trail, sep="|", names=u_cols, header=0,
                   skipinitialspace=True)
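
For the single-course case, the target column can also be derived from the year values themselves rather than from row positions. This is only a minimal sketch: it assumes one row per year and that the months count back from the latest year present in the data. The frame name `single` and its inline data are illustrative, not part of the original question.

```python
import pandas as pd

# Illustrative frame mirroring the single-course sample data.
single = pd.DataFrame({
    "year_id": [2012, 2013, 2014, 2015, 2016],
    "web_id": ["efg", "abc", "xyz", "pqr", "mnp"],
})

# The latest year gets 12 months; each earlier year adds another 12.
single["months"] = (single["year_id"].max() - single["year_id"] + 1) * 12
print(single)
```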

Some answers do not take into account that there can be multiple courses. Updated sample data...

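from io import StringIO

import pandas as pd

u_cols = ['course_name', 'year_id', 'web_id']
audit_trail = StringIO('''
course_name| year_id | web_id
a|2012|efg
a|2013|abc
a|2014| xyz
a|2015| pqr
a|2016| mnp
b|2014| xyz
b|2015| pqr
b|2016| mnp
''')

# header=0 replaces the header line in the text with u_cols;
# skipinitialspace drops the blanks that follow each "|".
df11 = pd.read_csv(audit_trail, sep="|", names=u_cols, header=0,
                   skipinitialspace=True)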

Answers

5
>>> df11.assign(months=df11.groupby('course_name').year_id.transform(
...     lambda years: range(len(years) * 12, 0, -12)))
  course_name  year_id web_id  months
0           a     2012    efg      60
1           a     2013    abc      48
2           a     2014    xyz      36
3           a     2015    pqr      24
4           a     2016    mnp      12
5           b     2014    xyz      36
6           b     2015    pqr      24
7           b     2016    mnp      12

Nice! I forgot that 'transform' doesn't need the index. – piRSquared
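
The comment refers to the fact that a grouped transform returns one value per input row, aligned to the original row index, so the result can be assigned straight back as a column. A small sketch of that behaviour follows; the frame `demo` is made up for illustration and its rows are deliberately interleaved between the two courses.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the rows of each course are not contiguous.
demo = pd.DataFrame({
    "course_name": ["a", "b", "a", "b", "a"],
    "year_id": [2014, 2015, 2015, 2016, 2016],
})

# The lambda sees each group in turn; pandas puts the values back
# at the original row positions of that group.
months = demo.groupby("course_name")["year_id"].transform(
    lambda years: np.arange(len(years) * 12, 0, -12)
)
demo = demo.assign(months=months)
print(demo)
```

Note that the countdown follows the row order inside each group, so this still assumes the years of a course appear in ascending order.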

4

You can use transform with arange:

import numpy as np

df11['months'] = (df11.groupby('course_name')['year_id']
                      .transform(lambda x: np.arange(len(x) * 12, 0, -12)))
print(df11)
  course_name  year_id web_id  months
0           a     2012    efg      60
1           a     2013    abc      48
2           a     2014    xyz      36
3           a     2015    pqr      24
4           a     2016    mnp      12
5           b     2014    xyz      36
6           b     2015    pqr      24
7           b     2016    mnp      12
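
The same per-group countdown could probably also be written without a lambda, using `GroupBy.cumcount`. This is an alternative sketch rather than what the answer above proposes; the frame `courses` is an illustrative stand-in for `df11`, and the approach again assumes the rows of each course are in ascending year order.

```python
import pandas as pd

# Illustrative stand-in for the multi-course sample data.
courses = pd.DataFrame({
    "course_name": ["a"] * 5 + ["b"] * 3,
    "year_id": [2012, 2013, 2014, 2015, 2016, 2014, 2015, 2016],
})

# cumcount(ascending=False) numbers the rows of each group from n-1 down to 0,
# so adding 1 and multiplying by 12 yields n*12, ..., 24, 12.
courses["months"] = (
    courses.groupby("course_name").cumcount(ascending=False) + 1
) * 12
print(courses)
```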
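7

import numpy as np

# .values discards the index, so this relies on the rows already being
# ordered by course_name and on the groups having different sizes.
df11.assign(
    months=df11.groupby('course_name').apply(
        lambda x: pd.Series(np.repeat([12], len(x)).cumsum()[::-1])
    ).values
)

  course_name  year_id web_id  months
0           a     2012    efg      60
1           a     2013    abc      48
2           a     2014    xyz      36
3           a     2015    pqr      24
4           a     2016    mnp      12
5           b     2014    xyz      36
6           b     2015    pqr      24
7           b     2016    mnp      12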

All credit to @Alexander and @jezrael for reminding me of a cool feature of transform.
With that in mind, I can change my answer to:

df11.assign(months=df11.groupby('course_name').year_id.transform(
    lambda x: np.repeat([12], len(x)).cumsum()[::-1]
))

  course_name  year_id web_id  months
0           a     2012    efg      60
1           a     2013    abc      48
2           a     2014    xyz      36
3           a     2015    pqr      24
4           a     2016    mnp      12
5           b     2014    xyz      36
6           b     2015    pqr      24
7           b     2016    mnp      12
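
All of the answers above count down by row position within each course. If the rows might not be sorted, one possible variant is to compute the months from the year values instead. The sketch below assumes the months should be measured back from each course's latest year; the frame `shuffled` is hypothetical and intentionally out of order. With gaps in the years, this gives the calendar distance rather than a row count, which may or may not be what is wanted.

```python
import pandas as pd

# Hypothetical data: same courses and years as the sample, in shuffled order.
shuffled = pd.DataFrame({
    "course_name": ["b", "a", "a", "b", "a", "a", "b", "a"],
    "year_id": [2016, 2012, 2014, 2014, 2016, 2013, 2015, 2015],
})

# Latest year of each course, broadcast back to every row of that course.
latest = shuffled.groupby("course_name")["year_id"].transform("max")

# The latest year gets 12 months; each earlier year adds another 12.
shuffled["months"] = (latest - shuffled["year_id"] + 1) * 12
print(shuffled.sort_values(["course_name", "year_id"]))
```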