2015-10-09 116 views

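pandas: MultiIndex slicing - mixing slices and lists

I'm trying to use the (not really) new slicing operators in pandas, but there's something I'm not getting. Suppose I generate the following hierarchical DataFrame: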

import numpy as np
import pandas as pd

#Generate container to hold component DFs
df_list=[]

#Generate names for third dimension positions
third_names=['front','middle','back']

#For three positions in the third dimension...
for lab in third_names:
    #...generate the corresponding section of raw data...
    d=pd.DataFrame(np.random.uniform(size=20).reshape(4,5),columns='a b c d e'.split(' '))
    #...name the columns dimension...
    d.columns.name='dim1'
    #...generate second and third dims (to go in index)...
    d['dim2']=['one','two','three','four']
    d['dim3']=lab
    #...set index...
    d.set_index(['dim3','dim2'],inplace=True)
    #...and throw the DF in the container
    df_list.append(d)

#Concatenate component DFs together
d3=pd.concat(df_list)

#Stack the columns into the index (giving a Series) and sort the index
d3_long=d3.stack().sort_index(level=0)

print(d3_long)

Which yields:

dim3 dim2 dim1 
back four a  0.501184 
       b  0.627202 
       c  0.329643 
       d  0.484261 
       e  0.884803 
     one a  0.834231 
       b  0.918897 
       c  0.196537 
       d  0.242109 
       e  0.860124 
     three a  0.782651 
       b  0.998361 
       c  0.849685 
       d  0.210377 
       e  0.866776 
     two a  0.908422 
       b  0.737073 
       c  0.064402 
       d  0.240718 
       e  0.044409 
front four a  0.100877 
       b  0.963870 
       c  0.254075 
       d  0.126556 
       e  0.033631 
     one a  0.243552 
       b  0.999168 
       c  0.752251 
       d  0.684718 
       e  0.353013 
     three a  0.938928 
       b  0.112993 
       c  0.615178 
       d  0.430318 
       e  0.330437 
     two a  0.301921 
       b  0.645425 
       c  0.464172 
       d  0.824765 
       e  0.606823 
middle four a  0.814888 
       b  0.228860 
       c  0.333184 
       d  0.622176 
       e  0.151248 
     one a  0.547780 
       b  0.592404 
       c  0.684111 
       d  0.885605 
       e  0.601560 
     three a  0.340951 
       b  0.839149 
       c  0.800098 
       d  0.663753 
       e  0.215224 
     two a  0.138430 
       b  0.917627 
       c  0.342968 
       d  0.406744 
       e  0.822957 
dtype: float64 

I can slice on the first two dimensions and get the behavior I expect...

print(d3_long.loc[(slice('front','middle'),slice('two','four')),:])

Which yields:

dim3 dim2 dim1 
front four a  0.100877 
       b  0.963870 
       c  0.254075 
       d  0.126556 
       e  0.033631 
     one a  0.243552 
       b  0.999168 
       c  0.752251 
       d  0.684718 
       e  0.353013 
     three a  0.938928 
       b  0.112993 
       c  0.615178 
       d  0.430318 
       e  0.330437 
     two a  0.301921 
       b  0.645425 
       c  0.464172 
       d  0.824765 
       e  0.606823 
middle four a  0.814888 
       b  0.228860 
       c  0.333184 
       d  0.622176 
       e  0.151248 
     one a  0.547780 
       b  0.592404 
       c  0.684111 
       d  0.885605 
       e  0.601560 
     three a  0.340951 
       b  0.839149 
       c  0.800098 
       d  0.663753 
       e  0.215224 
     two a  0.138430 
       b  0.917627 
       c  0.342968 
       d  0.406744 
       e  0.822957 
dtype: float64 

However, the following call produces exactly the same result.

d3_long.loc[(slice('front','middle'),slice('two','four'),slice('b','d')),:] 

It's as if it ignores the third level of the MultiIndex. When I try to use a list to pick out specific positions...

d3_long.loc[(slice('front','middle'),slice('two','four'),['b','d']),:] 

it raises a TypeError. Any ideas?

Answer

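d3_long is actually a Series, so you don't need the trailing : in your slicer. Note that your second-level slice('two','four') doesn't select anything (it's equivalent to [-1:1]).

However, if you reverse the order, it should give you what you expect.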

In [82]: d3_long.loc[slice('front','middle'),slice('four','two'), ['b','d']] 
Out[82]: 
dim3 dim2 dim1 
front four b  0.301573 
       d  0.478005 
     one b  0.306292 
       d  0.281984 
     three b  0.108174 
       d  0.776523 
     two b  0.028694 
       d  0.527417 
middle four b  0.285103 
       d  0.647165 
     one b  0.807411 
       d  0.309446 
     three b  0.277752 
       d  0.939555 
     two b  0.470019 
       d  0.447640 
dtype: float64 
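
For anyone who wants to reproduce this without the random data, here is a minimal sketch of the same selection. It assumes a stand-in Series named s (not from the question) built with the same three index levels, and it uses pd.IndexSlice as an alternative spelling of the slice(...) objects above. The slice bounds follow the sorted order of each level, which is why 'four' comes before 'two'.

```python
import numpy as np
import pandas as pd

# Stand-in for d3_long (hypothetical data): a Series whose three-level
# MultiIndex uses the same level names and labels as in the question.
index = pd.MultiIndex.from_product(
    [["back", "front", "middle"],
     ["four", "one", "three", "two"],
     ["a", "b", "c", "d", "e"]],
    names=["dim3", "dim2", "dim1"],
)
s = pd.Series(np.arange(len(index), dtype=float), index=index).sort_index()

# The labels of the second level are stored in sorted (alphabetical) order,
# so a label slice has to run from 'four' towards 'two', not the reverse.
dim2_order = list(s.index.levels[1])

# A Series has a single axis, so there is no trailing ':' after the tuple.
idx = pd.IndexSlice
picked = s.loc[idx["front":"middle", "four":"two", ["b", "d"]]]

print(dim2_order)
print(picked)
```

Printing s.index.levels[1] first is a quick way to see which order the slice bounds need to be in.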

I got so hung up on this error that I didn't even notice the ordering of the second level. This was helpful, thanks.