Exporting a pandas DataFrame with a MultiIndex

I just discovered pandas and am impressed by its capabilities. I am having a hard time understanding how to work with a DataFrame that has a MultiIndex.

I have two questions:

(1) Exporting the DataFrame

Here is my problem, using this dataset:

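import io
import pandas as pd

# io.StringIO is the Python 3 location (Python 2 used StringIO.StringIO).
# The CSV lines must not be indented, or the spaces end up in the values.
d1 = io.StringIO(
"""Gender,Employed,Region,Degree
m,yes,east,ba
m,yes,north,ba
f,yes,south,ba
f,no,east,ba
f,no,east,bsc
m,no,north,bsc
m,yes,south,ma
f,yes,west,phd
m,no,west,phd
m,yes,west,phd""")

df = pd.read_csv(d1)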

# Frequency tables
tab1 = pd.crosstab(df.Gender, df.Region) 
tab2 = pd.crosstab(df.Gender, [df.Region, df.Degree]) 
tab3 = pd.crosstab([df.Gender, df.Employed], [df.Region, df.Degree]) 

# Now we export the datasets 
tab1.to_excel('H:/test_tab1.xlsx') # OK 
tab2.to_excel('H:/test_tab2.xlsx') # fails 
tab3.to_excel('H:/test_tab3.xlsx') # fails
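The exports that fail are exactly the tables whose columns form a MultiIndex. A quick check of the row and column index depth, rebuilding the same data so the snippet runs on its own (Python 3 and a current pandas are assumed):

```python
import io
import pandas as pd

CSV = """Gender,Employed,Region,Degree
m,yes,east,ba
m,yes,north,ba
f,yes,south,ba
f,no,east,ba
f,no,east,bsc
m,no,north,bsc
m,yes,south,ma
f,yes,west,phd
m,no,west,phd
m,yes,west,phd
"""
df = pd.read_csv(io.StringIO(CSV))

tab1 = pd.crosstab(df.Gender, df.Region)
tab2 = pd.crosstab(df.Gender, [df.Region, df.Degree])
tab3 = pd.crosstab([df.Gender, df.Employed], [df.Region, df.Degree])

# nlevels is 1 for a flat Index and greater than 1 for a MultiIndex
for name, tab in [("tab1", tab1), ("tab2", tab2), ("tab3", tab3)]:
    print(name, "rows:", tab.index.nlevels, "columns:", tab.columns.nlevels)
```

Only tab2 and tab3 have two column levels; tab3 additionally has a two-level row index.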

One workaround I can think of is to change the columns (the R way):

def NewColums(DFwithMultiIndex):
    # Join each column tuple into one string, e.g. ('east', 'ba') -> 'east-ba'
    NewCol = []
    for item in DFwithMultiIndex.columns:
        NewCol.append('-'.join(item))
    return NewCol

# New Columns 
tab2.columns = NewColums(tab2) 
tab3.columns = NewColums(tab3) 

# New export 
tab2.to_excel('H:/test_tab2.xlsx') # OK 
tab3.to_excel('H:/test_tab3.xlsx') # OK

My question is: is there a more efficient way to do this in pandas that I missed in the documentation?
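A shorter spelling of the same flattening, as a sketch assuming a reasonably recent pandas: Index.map applies a function to every column label, and for a MultiIndex each label is a tuple such as ('east', 'ba'), so '-'.join turns it into 'east-ba'. The variable name flat is only illustrative.

```python
import io
import pandas as pd

CSV = """Gender,Employed,Region,Degree
m,yes,east,ba
m,yes,north,ba
f,yes,south,ba
f,no,east,ba
f,no,east,bsc
m,no,north,bsc
m,yes,south,ma
f,yes,west,phd
m,no,west,phd
m,yes,west,phd
"""
df = pd.read_csv(io.StringIO(CSV))
tab2 = pd.crosstab(df.Gender, [df.Region, df.Degree])

flat = tab2.copy()
# Each MultiIndex label is a tuple like ('east', 'ba'); join it into 'east-ba'
flat.columns = flat.columns.map('-'.join)
print(list(flat.columns))
```

In later pandas releases, to_excel can write MultiIndex columns directly (as merged header cells, as long as the index is written too), so the flattening may only be needed on older versions or when a single flat header row is wanted.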

(2) Selecting columns

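This new structure no longer lets me select the columns for a given variable (which was the main advantage of a hierarchical index in the first place). How can I select the columns that contain a given string (e.g. '-ba')?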
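For comparison, a sketch of the per-variable selection that the MultiIndex provides before any flattening (assuming a current pandas; crosstab names the column levels after the input Series, here 'Region' and 'Degree'). Selecting a top-level key returns its sub-columns, and DataFrame.xs selects one value of an inner level across all outer keys:

```python
import io
import pandas as pd

CSV = """Gender,Employed,Region,Degree
m,yes,east,ba
m,yes,north,ba
f,yes,south,ba
f,no,east,ba
f,no,east,bsc
m,no,north,bsc
m,yes,south,ma
f,yes,west,phd
m,no,west,phd
m,yes,west,phd
"""
df = pd.read_csv(io.StringIO(CSV))
tab2 = pd.crosstab(df.Gender, [df.Region, df.Degree])

# Top level: all Degree columns under one Region
east = tab2['east']
print(east)

# Inner level: one Degree across all Regions (the equivalent of the '-ba' columns)
ba = tab2.xs('ba', axis=1, level='Degree')
print(ba)
```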

PS: I saw this question, which is related, but I did not understand the answer proposed there.


2013-01-15 user1043144

Interestingly, 'tab2.T.to_excel' works, so it is only the MultiIndex on the columns that is the problem. –
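A small check of what the transpose changes, consistent with the comment above: .T moves the (Region, Degree) MultiIndex from the columns onto the rows and leaves a flat Gender column index. This sketch only inspects the structure; it does not call to_excel.

```python
import io
import pandas as pd

CSV = """Gender,Employed,Region,Degree
m,yes,east,ba
m,yes,north,ba
f,yes,south,ba
f,no,east,ba
f,no,east,bsc
m,no,north,bsc
m,yes,south,ma
f,yes,west,phd
m,no,west,phd
m,yes,west,phd
"""
df = pd.read_csv(io.StringIO(CSV))
tab2 = pd.crosstab(df.Gender, [df.Region, df.Degree])

t = tab2.T
# The MultiIndex is now on the rows; the columns are just 'f' and 'm'
print("rows:", t.index.nlevels, "columns:", t.columns.nlevels)
print(t)
```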

@hayden: thanks for updating the link. That function is indeed handy for display. – user1043144

This looks like a bug in to_excel. For the moment, as a workaround, I would suggest using to_csv (which does not seem to have this problem).

I have added this as an issue on GitHub.
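A sketch of that to_csv workaround, assuming Python 3 and a current pandas: called without a path, to_csv returns the CSV text, writing one header line per column level, and read_csv can rebuild the column MultiIndex with header=[0, 1] and index_col=0.

```python
import io
import pandas as pd

CSV = """Gender,Employed,Region,Degree
m,yes,east,ba
m,yes,north,ba
f,yes,south,ba
f,no,east,ba
f,no,east,bsc
m,no,north,bsc
m,yes,south,ma
f,yes,west,phd
m,no,west,phd
m,yes,west,phd
"""
df = pd.read_csv(io.StringIO(CSV))
tab2 = pd.crosstab(df.Gender, [df.Region, df.Degree])

# No path argument: the CSV is returned as a string instead of written to disk
csv_text = tab2.to_csv()
print(csv_text)

# Two header rows become the two column levels; the first column is the index
back = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=0)
print(back)
```

Passing a file path instead (e.g. the H:/ paths from the question with a .csv extension) writes the same text to disk.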

To answer the second question, if you really need to use to_excel...

You can use filter to select only the columns that include '-ba':

In [21]: list(filter(lambda x: '-ba' in x, tab2.columns))
Out[21]: ['east-ba', 'north-ba', 'south-ba']

In [22]: tab2[list(filter(lambda x: '-ba' in x, tab2.columns))]
Out[22]:
        east-ba  north-ba  south-ba
Gender
f             1         0         1
m             1         1         0
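In Python 3 the built-in filter returns an iterator rather than a list, hence the list(...) wrapper above. A sketch of two other ways to get the same columns, assuming a current pandas: a list comprehension over the labels, and DataFrame.filter, whose like= argument keeps the labels containing a substring (regex= would allow a pattern such as '-ba$').

```python
import io
import pandas as pd

CSV = """Gender,Employed,Region,Degree
m,yes,east,ba
m,yes,north,ba
f,yes,south,ba
f,no,east,ba
f,no,east,bsc
m,no,north,bsc
m,yes,south,ma
f,yes,west,phd
m,no,west,phd
m,yes,west,phd
"""
df = pd.read_csv(io.StringIO(CSV))
tab2 = pd.crosstab(df.Gender, [df.Region, df.Degree])

# Flatten the column names as in the question, e.g. 'east-ba'
flat = tab2.copy()
flat.columns = ['-'.join(col) for col in flat.columns]

# Plain Python: pick the labels that contain '-ba'
ba_cols = [c for c in flat.columns if '-ba' in c]

# pandas: DataFrame.filter matches on column labels by default
ba = flat.filter(like='-ba')
print(ba)
```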


2013-01-15 19:29:30

Thanks. Also good to know that I did not overlook something in the docs. – user1043144
