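Split an index into separate columns in pandas

I have a large dataframe from which I extract the data I need with groupby. I need to get several separate columns from the index of the new dataframe. Part of the original dataframe looks like this: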
code place vl year week
0 111.0002.0056 region1 1 2017 29
1 112.6500.2285 region2 1 2017 31
2 112.5600.6325 region2 1 2017 30
3 112.5600.6325 region2 1 2017 30
4 112.5600.8159 region2 1 2017 30
5 111.0002.0056 region2 1 2017 29
6 111.0002.0056 region2 1 2017 30
7 111.0002.0056 region2 1 2017 28
8 112.5600.8159 region3 1 2017 31
9 112.5600.8159 region3 1 2017 28
10 111.0002.0114 region3 1 2017 31
....
After applying groupby, it looks like this (code: df_test1 = df_test.groupby(['code', 'year', 'week', 'place'])['vl'].sum().unstack(fill_value=0)):
place region1 region2 region3 region4 index1
code year week
111.0002.0006 2017 29 0 3 0 0 (111.0002.0006, 2017, 29)
30 0 7 0 0 (111.0002.0006, 2017, 30)
111.0002.0018 2017 29 0 0 0 0 (111.0002.0018, 2017, 29)
111.0002.0029 2017 30 0 0 0 0 (111.0002.0029, 2017, 30)
111.0002.0055 2017 28 0 33 0 8 (111.0002.0055, 2017, 28)
29 1 155 2 41 (111.0002.0055, 2017, 29)
30 0 142 1 39 (111.0002.0055, 2017, 30)
31 0 31 0 13 (111.0002.0055, 2017, 31)
111.0002.0056 2017 28 9 36 0 4 (111.0002.0056, 2017, 28)
29 20 75 2 37 (111.0002.0056, 2017, 29)
30 17 81 2 33 (111.0002.0056, 2017, 30)
....
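To make the step above reproducible, here is a minimal sketch that rebuilds it on a handful of the rows shown earlier. The five-row sample is a cut-down chosen for illustration, not the full data, so the numbers will not match the output above.

```python
import pandas as pd

# Assumed sample: five of the rows printed above, not the real data set.
df_test = pd.DataFrame({
    "code": ["111.0002.0056", "112.6500.2285", "112.5600.6325",
             "112.5600.6325", "111.0002.0056"],
    "place": ["region1", "region2", "region2", "region2", "region2"],
    "vl": [1, 1, 1, 1, 1],
    "year": [2017, 2017, 2017, 2017, 2017],
    "week": [29, 31, 30, 30, 29],
})

# Sum vl per (code, year, week, place), then pivot place into columns.
# Combinations with no rows are filled with 0.
df_test1 = (
    df_test.groupby(["code", "year", "week", "place"])["vl"]
    .sum()
    .unstack(fill_value=0)
)

print(df_test1)
```

The result has a three-level row index (code, year, week) and one column per place.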
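I saved the index in a separate column index1 (code: df_test1['index1'] = df_test1.index). From the column index1 I need to get three separate columns: code, year and week.
The result should look like this: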
region1 region2 region3 region4 code year week
0 3 0 0 111.0002.0006 2017 29
0 7 0 0 111.0002.0006 2017 30
0 0 0 0 111.0002.0018 2017 29
0 0 0 0 111.0002.0029 2017 30
0 33 0 8 111.0002.0055 2017 28
1 155 2 41 111.0002.0055 2017 29
0 142 1 39 111.0002.0055 2017 30
0 31 0 13 111.0002.0055 2017 31
....
I would be grateful for any advice!
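For reference, one approach that may produce the layout above is reset_index(), which turns the index levels into ordinary columns without going through the index1 tuples. This is only a sketch on an assumed five-row sample; the names result, key_cols and region_cols are made up for illustration.

```python
import pandas as pd

# Assumed sample: five of the rows from the question, not the real data set.
df_test = pd.DataFrame({
    "code": ["111.0002.0056", "112.6500.2285", "112.5600.6325",
             "112.5600.6325", "111.0002.0056"],
    "place": ["region1", "region2", "region2", "region2", "region2"],
    "vl": [1, 1, 1, 1, 1],
    "year": [2017, 2017, 2017, 2017, 2017],
    "week": [29, 31, 30, 30, 29],
})

df_test1 = (
    df_test.groupby(["code", "year", "week", "place"])["vl"]
    .sum()
    .unstack(fill_value=0)
)

# reset_index() moves the index levels code, year, week into regular columns.
result = df_test1.reset_index()

# Drop the leftover "place" label on the column axis.
result.columns.name = None

# Put the region columns first, as in the desired output.
key_cols = ["code", "year", "week"]
region_cols = [c for c in result.columns if c not in key_cols]
result = result[region_cols + key_cols]

print(result)
```

With this approach the index1 column is not needed, since the values come straight from the index levels.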