2017-07-06

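How can I add a row to a pandas DataFrame without flattening the MultiIndex?

I cannot find an efficient way to add a single row to a MultiIndexed DataFrame. When I add the row, the MultiIndex is flattened into a plain index of tuples. Strangely, this is not a problem for MultiIndexed columns.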

System information:

Python 3.6.1 |Continuum Analytics, Inc.| (default, Mar 22 2017, 19:25:17) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin 
Type "help", "copyright", "credits" or "license" for more information. 
>>> import pandas as pd 
>>> pd.__version__ 
'0.19.2' 

Sample data: a DataFrame with a MultiIndex on both rows and columns

import numpy as np 
import pandas as pd 

index = pd.MultiIndex(levels=[['bar', 'foo'], ['one', 'two']], 
         labels=[[0, 0, 1, 1], [0, 1, 0, 1]], 
         names=['row_0', 'row_1']) 
columns = pd.MultiIndex(levels=[['dull', 'shiny'], ['a', 'b']], 
         labels=[[0, 0, 1, 1], [0, 1, 0, 1]], 
         names=['col_0', 'col_1']) 
df = pd.DataFrame(np.ones((4,4)),columns=columns, index=index) 

print(df) 

col_0       dull      shiny
col_1          a    b     a    b
row_0 row_1
bar   one    1.0  1.0   1.0  1.0
      two    1.0  1.0   1.0  1.0
foo   one    1.0  1.0   1.0  1.0
      two    1.0  1.0   1.0  1.0
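
In newer pandas releases the `labels` argument of the `pd.MultiIndex` constructor was renamed to `codes`, so the constructor call above may need adjusting outside of 0.19.2. A minimal sketch of the same sample frame built with `pd.MultiIndex.from_product`, which avoids that argument entirely (assuming every combination of the level values is wanted, as it is here):

```python
import numpy as np
import pandas as pd

# Build the same 2x2 row and column indexes directly from the level values.
index = pd.MultiIndex.from_product([['bar', 'foo'], ['one', 'two']],
                                   names=['row_0', 'row_1'])
columns = pd.MultiIndex.from_product([['dull', 'shiny'], ['a', 'b']],
                                     names=['col_0', 'col_1'])
df = pd.DataFrame(np.ones((4, 4)), index=index, columns=columns)

print(df)
```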

Adding an extra column to the DataFrame works without any problem:

df['last_col'] = 42 #define a new column and assign a value 

print(df) 

col_0       dull      shiny      last_col
col_1          a    b     a    b
row_0 row_1
bar   one    1.0  1.0   1.0  1.0       42
      two    1.0  1.0   1.0  1.0       42
foo   one    1.0  1.0   1.0  1.0       42
      two    1.0  1.0   1.0  1.0       42

However, if I do the same thing to add a row (using loc), the MultiIndex is flattened into a plain index of tuples:

df.loc['last_row'] = 43 #define a new row and assign a value 

print(df) 

col_0       dull        shiny       last_col
col_1          a     b      a     b
(bar, one)   1.0   1.0    1.0   1.0       42
(bar, two)   1.0   1.0    1.0   1.0       42
(foo, one)   1.0   1.0    1.0   1.0       42
(foo, two)   1.0   1.0    1.0   1.0       42
last_row    43.0  43.0   43.0  43.0       43

Does anyone have an idea how to add a row in a simple and efficient way without flattening the index? Thank you very much!!
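
One way to confirm whether a given assignment flattened the index is to check its type and number of levels. The sketch below flattens a copy on purpose with `MultiIndex.to_flat_index` (available in pandas 0.24 and later, so not in the 0.19.2 shown here) purely to show what the check reports. The names `is_multi` and `flat` are made up for the illustration, and the frame uses plain columns to keep it short.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['bar', 'foo'], ['one', 'two']],
                                   names=['row_0', 'row_1'])
df = pd.DataFrame(np.ones((4, 2)), index=index, columns=['a', 'b'])


def is_multi(frame):
    """Return True if the row index is still a MultiIndex."""
    return isinstance(frame.index, pd.MultiIndex)


# Deliberately flatten a copy so that both cases can be compared.
flat = df.copy()
flat.index = df.index.to_flat_index()

print(is_multi(df), df.index.nlevels)
print(is_multi(flat), flat.index.nlevels)
```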

I opened an issue: https://github.com/pandas-dev/pandas/issues/17024

Answers

I think you need to pass a tuple that defines a value for both levels of the MultiIndex:

df.loc[('last_row', 'a'), :] = 43 
print(df) 
col_0          dull        shiny
col_1             a     b      a     b
row_0    row_1
bar      one    1.0   1.0    1.0   1.0
         two    1.0   1.0    1.0   1.0
foo      one    1.0   1.0    1.0   1.0
         two    1.0   1.0    1.0   1.0
last_row a     43.0  43.0   43.0  43.0

For columns it works similarly:

df[('last_col', 'a')] = 43 
print(df) 
col_0       dull      shiny      last_col
col_1          a    b     a    b        a
row_0 row_1
bar   one    1.0  1.0   1.0  1.0       43
      two    1.0  1.0   1.0  1.0       43
foo   one    1.0  1.0   1.0  1.0       43
      two    1.0  1.0   1.0  1.0       43
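
To check that tuple keys really keep both axes as a MultiIndex, the two assignments can be combined into one self-contained sketch. It assumes a reasonably recent pandas (the sample frame is built with `from_product` rather than the `labels` argument), and the exact behaviour may differ on other versions.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['bar', 'foo'], ['one', 'two']],
                                   names=['row_0', 'row_1'])
columns = pd.MultiIndex.from_product([['dull', 'shiny'], ['a', 'b']],
                                     names=['col_0', 'col_1'])
df = pd.DataFrame(np.ones((4, 4)), index=index, columns=columns)

# New row: one label per row level, passed as a tuple; ':' selects all columns.
df.loc[('last_row', 'a'), :] = 43

# New column: one label per column level, again passed as a tuple.
df[('last_col', 'a')] = 43

# Both axes should still report two levels.
print(type(df.index).__name__, df.index.nlevels)
print(type(df.columns).__name__, df.columns.nlevels)
```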

EDIT:

It seems you also need to specify the columns; to select all of them, use `:`:

df.loc['last_row',:] = 43 
print(df) 
col_0          dull        shiny
col_1             a     b      a     b
row_0    row_1
bar      one    1.0   1.0    1.0   1.0
         two    1.0   1.0    1.0   1.0
foo      one    1.0   1.0    1.0   1.0
         two    1.0   1.0    1.0   1.0
last_row       43.0  43.0   43.0  43.0

The level that was not specified is filled in with an empty string:

print(df.index) 
MultiIndex(levels=[['bar', 'foo', 'last_row'], ['one', 'two', '']], 
      labels=[[0, 0, 1, 1, 2], [0, 1, 0, 1, 2]], 
      names=['row_0', 'row_1']) 
df.loc['last_row','dull'] = 43 
print(df) 
col_0          dull        shiny
col_1             a     b      a     b
row_0    row_1
bar      one    1.0   1.0    1.0   1.0
         two    1.0   1.0    1.0   1.0
foo      one    1.0   1.0    1.0   1.0
         two    1.0   1.0    1.0   1.0
last_row       43.0  43.0    NaN   NaN
df.loc['last_row', ('dull', 'a')] = 43 
print(df) 
col_0          dull        shiny
col_1             a     b      a     b
row_0    row_1
bar      one    1.0   1.0    1.0   1.0
         two    1.0   1.0    1.0   1.0
foo      one    1.0   1.0    1.0   1.0
         two    1.0   1.0    1.0   1.0
last_row       43.0   NaN    NaN   NaN
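
A different technique from the `loc` assignments in this answer, offered only as an alternative sketch: build the new row as a one-row DataFrame whose index is itself a MultiIndex, then combine the two frames with `pd.concat`. Both row levels are spelled out explicitly, so no level is left to be padded with an empty string. The label `('last_row', 'total')` and the names `new_index`, `new_row` and `result` are invented for the example.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['bar', 'foo'], ['one', 'two']],
                                   names=['row_0', 'row_1'])
columns = pd.MultiIndex.from_product([['dull', 'shiny'], ['a', 'b']],
                                     names=['col_0', 'col_1'])
df = pd.DataFrame(np.ones((4, 4)), index=index, columns=columns)

# One-row frame with the same columns and a two-level index of its own.
new_index = pd.MultiIndex.from_tuples([('last_row', 'total')],
                                      names=df.index.names)
new_row = pd.DataFrame([[43.0] * df.shape[1]],
                       index=new_index, columns=df.columns)

# Concatenating along the rows keeps the MultiIndex on both axes.
result = pd.concat([df, new_row])

print(result)
```

Note that `pd.concat` returns a new frame rather than modifying `df` in place, so it is better suited to adding rows in batches than one at a time in a loop.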

Hi jezrael, cool, that looks good. Thank you very much!!