2016-02-27

Python pandas pivot table: sort by date

I have the following code:

import numpy as np
import pandas

data_df = pandas.read_csv(filename, parse_dates = True) 
groupings = np.unique(data_df[['Ind']]) 
for group in groupings: 
    data_df2 = data_df[data_df['Ind'] == group] 
    table = pandas.pivot_table(data_df2, values='Rev', index=['Ind', 'Month'], columns=['Type'], aggfunc=sum) 
    table = table.sort_index(ascending=[0, 0]) 
    print(table) 

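How can I sort the pivot `table` by month and year? For example, when I print `table`, I want Dec-14 to be the first row of the output for each group.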

Here is a sample of the `data_df` data:

Ind Type Month Rev 
0 A Voice Dec-14 10.00 
1 A Voice Jan-15 8.00 
2 A Voice Feb-15 13.00 
3 A Voice Mar-15 9.00 
4 A Voice Apr-15 11.00 
5 A Voice May-15 14.00 
6 A Voice Jun-15 6.00 
7 A Voice Jul-15 4.00 
8 A Voice Aug-15 12.00 
9 A Voice Sep-15 7.00 
10 A Voice Oct-15 5.00 
11 A Elec Dec-14 8.04 
12 A Elec Jan-15 6.95 
13 A Elec Feb-15 7.58 
14 A Elec Mar-15 8.81 
15 A Elec Apr-15 8.33 
16 A Elec May-15 9.96 
17 A Elec Jun-15 7.24 
18 A Elec Jul-15 4.26 
19 A Elec Aug-15 10.84 
20 A Elec Sep-15 4.82 
21 A Elec Oct-15 5.68 
22 B Voice Dec-14 10.00 
23 B Voice Jan-15 8.00 
24 B Voice Feb-15 13.00 
25 B Voice Mar-15 9.00 
26 B Voice Apr-15 11.00 
27 B Voice May-15 14.00 
28 B Voice Jun-15 6.00 
29 B Voice Jul-15 4.00 
.. .. ...  ... ... 

Output (I have been playing with `ascending`, but it only sorts alphabetically):

Type  Elec Voice 
Ind Month    
A Sep-15 4.82  7 
    Oct-15 5.68  5 
    May-15 9.96  14 
    Mar-15 8.81  9 
    Jun-15 7.24  6 
    Jul-15 4.26  4 
    Jan-15 6.95  8 
    Feb-15 7.58  13 
    Dec-14 8.04  10 
    Aug-15 10.84  12 
    Apr-15 8.33  11 

I want the output sorted by date:

Type  Elec Voice 
Ind Month    
A Dec-14 8.04  10 
    Jan-15 6.95  8 
    Feb-15 7.58  13 
    ... 

Answers


You need to convert your `Month` column to datetime after creating the DataFrame from the CSV file:

df['Month'] = pd.to_datetime(df['Month'], format="%b-%y") 

because at the moment it is a string, so it can only be sorted alphabetically.
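To illustrate why the conversion matters, here is a minimal sketch on a few sample labels (chosen for illustration, not taken from the real CSV): as plain strings they sort alphabetically, while after `pd.to_datetime(..., format="%b-%y")` the same labels sort chronologically.

```python
import pandas as pd

# A few month labels in the question's "Mon-YY" format (sample values only)
months = pd.Series(["Jan-15", "Dec-14", "Apr-15", "Feb-15"])

# As strings, sorting is purely alphabetical
alpha = months.sort_values().tolist()

# Parsed with the "%b-%y" format, the same labels sort chronologically
parsed = pd.to_datetime(months, format="%b-%y")
chrono = parsed.sort_values().dt.strftime("%b-%y").tolist()

print(alpha)   # ['Apr-15', 'Dec-14', 'Feb-15', 'Jan-15']
print(chrono)  # ['Dec-14', 'Jan-15', 'Feb-15', 'Apr-15']
```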

Alternatively, you can use the following trick (`date_parser`) to parse the dates during `read_csv`:

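from datetime import datetime

import pandas as pd

# Use the standard-library datetime class; pd.datetime is deprecated in newer pandas
dateparser = lambda x: datetime.strptime(x, '%b-%y')

# In pandas 2.0+ date_parser is deprecated; date_format='%b-%y' is the replacement there
df = pd.read_csv('data.csv', delimiter=r'\s+', parse_dates=['Month'], date_parser=dateparser)

print(df.sort_values(['Month']))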

PS: I don't know what your preferred output date format is...
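Applied to the question's per-group loop, the conversion might look like the sketch below. It assumes a small comma-separated stand-in for the real file (a hypothetical subset of the sample data, read via `io.StringIO`) and adds an explicit `sort_index()` so the (`Ind`, `Month`) order is chronological regardless of defaults.

```python
import io

import pandas as pd

# Hypothetical inline stand-in for the question's CSV (subset of the sample)
csv_text = """Ind,Type,Month,Rev
A,Voice,Jan-15,8.00
A,Voice,Dec-14,10.00
A,Elec,Jan-15,6.95
A,Elec,Dec-14,8.04
B,Voice,Jan-15,8.00
B,Voice,Dec-14,10.00
"""
data_df = pd.read_csv(io.StringIO(csv_text))

# Convert the "Mon-YY" strings to real datetimes right after loading
data_df["Month"] = pd.to_datetime(data_df["Month"], format="%b-%y")

tables = {}
for group in data_df["Ind"].unique():
    data_df2 = data_df[data_df["Ind"] == group]
    table = pd.pivot_table(data_df2, values="Rev", index=["Ind", "Month"],
                           columns=["Type"], aggfunc="sum")
    # Month is now datetime, so sorting the index puts Dec-14 first
    table = table.sort_index()
    tables[group] = table
    print(table)
```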


I think you can first convert the column `Month` with `to_datetime` and then `to_period`:

data_df['Month'] = pd.to_datetime(data_df['Month'], format='%b-%y').dt.to_period('M') 

    Ind Type Month Rev 
0 A Voice 2014-12 10.00 
1 A Voice 2015-01 8.00 
2 A Voice 2015-02 13.00 
3 A Voice 2015-03 9.00 
4 A Voice 2015-04 11.00 
5 A Voice 2015-05 14.00 
6 A Voice 2015-06 6.00 
7 A Voice 2015-07 4.00 
8 A Voice 2015-08 12.00 
9 A Voice 2015-09 7.00 
10 A Voice 2015-10 5.00 
11 A Elec 2014-12 8.04 
12 A Elec 2015-01 6.95 
13 A Elec 2015-02 7.58 
14 A Elec 2015-03 8.81 
15 A Elec 2015-04 8.33 
16 A Elec 2015-05 9.96 
17 A Elec 2015-06 7.24 
18 A Elec 2015-07 4.26 
19 A Elec 2015-08 10.84 
20 A Elec 2015-09 4.82 
21 A Elec 2015-10 5.68 
22 B Voice 2014-12 10.00 
23 B Voice 2015-01 8.00 
24 B Voice 2015-02 13.00 
25 B Voice 2015-03 9.00 
26 B Voice 2015-04 11.00 
27 B Voice 2015-05 14.00 
28 B Voice 2015-06 6.00 
29 B Voice 2015-07 4.00 
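As a small check of what `to_period('M')` produces, the sketch below (on sample labels, not the full data) shows that monthly `Period` values sort chronologically on their own and can be turned back into `Mon-YY` labels with `dt.strftime`.

```python
import pandas as pd

# Sample "Mon-YY" labels, deliberately out of order
months = pd.Series(["Feb-15", "Dec-14", "Jan-15"])

# Parse the strings, then keep only year and month as monthly Periods
periods = pd.to_datetime(months, format="%b-%y").dt.to_period("M")

# Periods compare chronologically, so sorting needs no extra key
chrono = periods.sort_values().astype(str).tolist()
print(chrono)  # ['2014-12', '2015-01', '2015-02']

# strftime turns them back into the original label format when needed
labels = periods.dt.strftime("%b-%y").tolist()
print(labels)  # ['Feb-15', 'Dec-14', 'Jan-15']
```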

Then use `pivot_table`; no sorting is needed:

data_df = pd.pivot_table(data_df, values='Rev', index=['Ind', 'Month'], columns='Type', aggfunc='sum') 
print(data_df) 
Type   Elec Voice 
Ind Month     
A 2014-12 8.04  10 
    2015-01 6.95  8 
    2015-02 7.58  13 
    2015-03 8.81  9 
    2015-04 8.33  11 
    2015-05 9.96  14 
    2015-06 7.24  6 
    2015-07 4.26  4 
    2015-08 10.84  12 
    2015-09 4.82  7 
    2015-10 5.68  5 
B 2014-12 NaN  10 
    2015-01 NaN  8 
    2015-02 NaN  13 
    2015-03 NaN  9 
    2015-04 NaN  11 
    2015-05 NaN  14 
    2015-06 NaN  6 
    2015-07 NaN  4 
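A minimal sketch to confirm the claim that no explicit sort is needed, using a few hypothetical rows deliberately listed out of order: `pivot_table` groups on the index keys (and sorts them by default), so the `Period` level comes out chronologically.

```python
import pandas as pd

# Hypothetical rows, deliberately listed out of chronological order
data_df = pd.DataFrame({
    "Ind":   ["A", "A", "A", "A"],
    "Type":  ["Voice", "Voice", "Elec", "Elec"],
    "Month": ["Jan-15", "Dec-14", "Jan-15", "Dec-14"],
    "Rev":   [8.00, 10.00, 6.95, 8.04],
})
data_df["Month"] = pd.to_datetime(data_df["Month"], format="%b-%y").dt.to_period("M")

# No sort_index call: the grouped keys come back in chronological Period order
table = pd.pivot_table(data_df, values="Rev", index=["Ind", "Month"],
                       columns="Type", aggfunc="sum")
print(table)
```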

Finally, you can reformat the `Month` level back to `Mon-YY` strings with `strftime`:

# Rebuild the MultiIndex with the Month level formatted back to "Mon-YY" strings
new_index = list(zip(data_df.index.get_level_values('Ind'),
                     data_df.index.get_level_values('Month').strftime('%b-%y')))
data_df.index = pd.MultiIndex.from_tuples(new_index, names=data_df.index.names)
print(data_df)
Type   Elec Voice 
Ind Month    
A Dec-14 8.04  10 
    Jan-15 6.95  8 
    Feb-15 7.58  13 
    Mar-15 8.81  9 
    Apr-15 8.33  11 
    May-15 9.96  14 
    Jun-15 7.24  6 
    Jul-15 4.26  4 
    Aug-15 10.84  12 
    Sep-15 4.82  7 
    Oct-15 5.68  5 
B Dec-14 NaN  10 
    Jan-15 NaN  8 
    Feb-15 NaN  13 
    Mar-15 NaN  9 
    Apr-15 NaN  11 
    May-15 NaN  14 
    Jun-15 NaN  6 
    Jul-15 NaN  4 
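Another way to relabel only the `Month` level, offered here as an alternative to rebuilding the tuples, is `MultiIndex.set_levels`. The sketch below uses hypothetical rows; it relies on `set_levels` replacing the level's unique values while keeping the integer codes, so every row keeps its chronological position.

```python
import pandas as pd

# Hypothetical rows; "Month" is converted to Period as in the answer above
data_df = pd.DataFrame({
    "Ind":   ["A", "A", "B"],
    "Type":  ["Elec", "Elec", "Voice"],
    "Month": ["Jan-15", "Dec-14", "Dec-14"],
    "Rev":   [6.95, 8.04, 10.00],
})
data_df["Month"] = pd.to_datetime(data_df["Month"], format="%b-%y").dt.to_period("M")
table = pd.pivot_table(data_df, values="Rev", index=["Ind", "Month"],
                       columns="Type", aggfunc="sum")

# Relabel the unique Month values; the codes (and row order) are untouched
month_pos = table.index.names.index("Month")
new_months = table.index.levels[month_pos].strftime("%b-%y")
table.index = table.index.set_levels(new_months, level="Month")
print(table)
```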

Or, to change the `Month` level of the `MultiIndex` (a `PeriodIndex` here), you can use `reset_index`, `dt.strftime` and `set_index`:

data_df = data_df.reset_index(level=1)                      # move Month out of the index
data_df['Month'] = data_df['Month'].dt.strftime('%b-%y')    # Period -> "Mon-YY" string
data_df = data_df.set_index('Month', append=True)           # put it back as the 2nd level
print(data_df)
Type   Elec Voice 
Ind Month    
A Dec-14 8.04  10 
    Jan-15 6.95  8 
    Feb-15 7.58  13 
    Mar-15 8.81  9 
    Apr-15 8.33  11 
    May-15 9.96  14 
    Jun-15 7.24  6 
    Jul-15 4.26  4 
    Aug-15 10.84  12 
    Sep-15 4.82  7 
    Oct-15 5.68  5 
B Dec-14 NaN  10 
    Jan-15 NaN  8 
    Feb-15 NaN  13 
    Mar-15 NaN  9 
    Apr-15 NaN  11 
    May-15 NaN  14 
    Jun-15 NaN  6 
    Jul-15 NaN  4 
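If you would rather keep `Month` as the original strings throughout, one further option (not from the answers here; a sketch under the assumption that all labels use `%b-%y`) is to parse the labels only to build sort keys and then reorder the rows by position. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical pivot whose "Month" level still holds "Mon-YY" strings
data_df = pd.DataFrame({
    "Ind":   ["A", "A", "A", "B", "B"],
    "Type":  ["Voice"] * 5,
    "Month": ["Feb-15", "Dec-14", "Jan-15", "Jan-15", "Dec-14"],
    "Rev":   [13.0, 10.0, 8.0, 8.0, 10.0],
})
table = pd.pivot_table(data_df, values="Rev", index=["Ind", "Month"],
                       columns="Type", aggfunc="sum")  # Month sorted alphabetically

# Build sort keys from the index: Ind as-is, Month parsed only for ordering
keys = pd.DataFrame({
    "Ind": table.index.get_level_values("Ind").to_numpy(),
    "Month": pd.to_datetime(table.index.get_level_values("Month"),
                            format="%b-%y").to_numpy(),
})

# keys has a default 0..n-1 index, so its sorted index gives row positions
order = keys.sort_values(["Ind", "Month"]).index.to_numpy()
table = table.iloc[order]
print(table)
```

The displayed labels stay exactly as they were in the file; only the row order changes.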

Wonderful, thanks! I had to update the `sort_index` call in my code, and it works perfectly: `table = table.sort_index(ascending=[0, 1])`


First reformat the `Month` column with @jezrael's solution, then build your pivot table like this (or by using `groupby` and `unstack`):

>>> data_df.pivot_table(values='Rev', index=['Ind', 'Month'], columns='Type') 
Type   Elec Voice 
Ind Month     
A 2014-12 8.04  10 
    2015-01 6.95  8 
    2015-02 7.58  13 
    2015-03 8.81  9 
    2015-04 8.33  11 
    2015-05 9.96  14 
    2015-06 7.24  6 
    2015-07 4.26  4 
    2015-08 10.84  12 
    2015-09 4.82  7 
    2015-10 5.68  5 
B 2014-12 NaN  10 
    2015-01 NaN  8 
    2015-02 NaN  13 
    2015-03 NaN  9 
    2015-04 NaN  11 
    2015-05 NaN  14 
    2015-06 NaN  6 
    2015-07 NaN  4 

Or:

data_df.groupby(['Ind', 'Month', 'Type'])['Rev'].sum().unstack('Type')
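To check that the two forms agree, here is a small sketch on hypothetical rows with one value per (`Ind`, `Month`, `Type`) cell, as in the question. Note that `pivot_table` uses the mean by default; with a single value per cell that equals the sum used in the `groupby` version.

```python
import pandas as pd

# Hypothetical rows with one value per (Ind, Month, Type), as in the question
data_df = pd.DataFrame({
    "Ind":   ["A", "A", "A", "A", "B", "B"],
    "Type":  ["Voice", "Voice", "Elec", "Elec", "Voice", "Voice"],
    "Month": ["Jan-15", "Dec-14", "Jan-15", "Dec-14", "Jan-15", "Dec-14"],
    "Rev":   [8.00, 10.00, 6.95, 8.04, 8.00, 10.00],
})
data_df["Month"] = pd.to_datetime(data_df["Month"], format="%b-%y").dt.to_period("M")

# Default aggfunc (mean) vs. explicit sum: identical when each cell has one value
via_pivot = data_df.pivot_table(values="Rev", index=["Ind", "Month"], columns="Type")
via_groupby = data_df.groupby(["Ind", "Month", "Type"])["Rev"].sum().unstack("Type")

print(via_pivot)
print(via_groupby)
```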