2016-05-17

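I want to create a DataFrame with a custom sort order. To do this I used pandas.Categorical(). However, if I then use the result in a groupby, NaN values are returned. Why doesn't pandas allow a categorical column to be used in a groupby?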

# import the pandas module 
import pandas as pd 

# Create an example dataframe 
raw_data = {'Date': ['2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13','2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13'], 
     'Portfolio': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B','B', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'C'], 
     'Duration': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3], 
     'Yield': [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1],} 

df = pd.DataFrame(raw_data, columns = ['Date', 'Portfolio', 'Duration', 'Yield']) 

df['Portfolio'] = pd.Categorical(df['Portfolio'],['C', 'B', 'A']) 
df=df.sort_values('Portfolio') 

dfs = df.groupby(['Date','Portfolio'], as_index =False).sum() 

print(dfs) 

                      Date Portfolio  Duration  Yield
Date       Portfolio
13/05/2016 C           NaN       NaN       NaN    NaN
           B           NaN       NaN       NaN    NaN
           A           NaN       NaN       NaN    NaN

Why does this happen, and how can I work around it?

Also, is there a better idiom for Categorical that avoids the SettingWithCopyWarning?

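This looks like a bug involving the combination of the additional 'Date' column and as_index=False (grouping by Portfolio only, or grouping without as_index=False, both work correctly). Would you like to report the issue at https://github.com/pydata/pandas/issues? – joris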

Answer

as_index=False is what is causing the trouble. If I just run:

dfs = df.groupby(['Date','Portfolio']).sum() 

I get:

                      Duration  Yield
Date       Portfolio
2016-05-13 C                18    6.0
           B                10   10.0
           A                 6    1.8

I don't know why this happens. It may be a bug.

If you really want the result without the index, with 'Date' and 'Portfolio' as ordinary columns, use reset_index():

dfs = df.groupby(['Date','Portfolio']).sum().reset_index() 

         Date Portfolio  Duration  Yield
0  2016-05-13         C        18    6.0
1  2016-05-13         B        10   10.0
2  2016-05-13         A         6    1.8
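
On the second part of the question (a cleaner Categorical idiom), here is a minimal sketch rather than a definitive fix. It assumes a pandas version newer than the one in the question, one that provides CategoricalDtype and the observed argument of groupby. The small frame below is illustrative stand-in data, not the question's full dataset. Declaring the order as a dtype and applying it with astype inside assign returns a new DataFrame, which should sidestep the in-place column assignment that can trigger SettingWithCopyWarning when df is a slice of another frame.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Illustrative stand-in for the question's data (smaller, same shape).
df = pd.DataFrame({
    "Date": ["2016-05-13"] * 6,
    "Portfolio": ["A", "A", "B", "B", "C", "C"],
    "Duration": [1, 1, 2, 2, 3, 3],
    "Yield": [0.3, 0.3, 2.0, 2.0, 1.0, 1.0],
})

# Declare the custom order C, B, A as a dtype and apply it with astype.
# assign() returns a new DataFrame instead of modifying df in place.
order = CategoricalDtype(categories=["C", "B", "A"], ordered=True)
df = df.assign(Portfolio=df["Portfolio"].astype(order))

# Group with the default as_index=True, then move the keys back into columns.
# observed=True (assumed available in your pandas version) keeps only the
# category combinations that actually occur in the data.
dfs = (
    df.groupby(["Date", "Portfolio"], observed=True)[["Duration", "Yield"]]
      .sum()
      .reset_index()
)
print(dfs)
```

The groups come out in the declared category order (C, B, A) because groupby sorts its keys by default and a categorical key sorts by its categories, so the separate sort_values call is not needed for the grouped result.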