2016-10-27 49 views

I have an xlsx file that looks like this, and I am using pandas to extract the data I need:

Name  01/09/16  02/09/16  03/09/16
Jack  In        Out       In
Lisa  Out       In        Out
Tom   Out       In        In

I am trying to use pandas to print the data as a table like this:

+--------+----------+----------+----------+
| Status | 01/09/16 | 02/09/16 | 03/09/16 |
+--------+----------+----------+----------+
| In     | Jack     | Lisa     | Jack     |
|        |          | Tom      | Tom      |
+--------+----------+----------+----------+
| Out    | Lisa     | Jack     | Lisa     |
|        | Tom      |          |          |
+--------+----------+----------+----------+

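I am struggling to find a way to do this in pandas. Is there a simple way to iterate over the date columns, match each cell to its row, and get the cell value?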

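For example, take the first column, 01/09/16: how can I use pandas to go down that column, find the cell value 'In', match it to the row name 'Jack', and then add it to a nested dictionary like this: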

dictionary = {'01/09/16': {'In': ['Jack'], 'Out': ['Lisa', 'Tom']}}
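A minimal sketch of the iteration described above, assuming the sheet has already been loaded into a DataFrame with Name as the index (here from an inline string instead of the xlsx file): `DataFrame.items()` yields each date column as a Series, and `Series.items()` yields that column's (name, status) pairs, which are appended under the matching status key.

```python
from io import StringIO

import pandas as pd

# Sample data from the question (stands in for the xlsx file); Name is the index
data = """Name 01/09/16 02/09/16 03/09/16
Jack In Out In
Lisa Out In Out
Tom Out In In
"""
df = pd.read_csv(StringIO(data), sep=r"\s+", index_col=0)

dictionary = {}
for date, column in df.items():              # column: Series indexed by Name
    by_status = dictionary.setdefault(date, {})
    for name, status in column.items():      # e.g. ('Jack', 'In')
        by_status.setdefault(status, []).append(name)

print(dictionary["01/09/16"])  # {'In': ['Jack'], 'Out': ['Lisa', 'Tom']}
```

Plain Python loops are fine at this size; for large sheets a vectorised approach such as the answers below scales better.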

If I can get the data into this form, I can use something like PrettyTable to arrange it into a table like the second one shown above.
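PrettyTable is a third-party package; as a dependency-free stand-in, here is a minimal sketch of a hypothetical `render_grid` helper (not part of PrettyTable or pandas) that turns a `{date: {status: [names]}}` dictionary into the grid shown in the question, letting a cell with several names span several text lines.

```python
def render_grid(nested, statuses=("In", "Out")):
    """Render {date: {status: [names]}} as an ASCII grid.

    Hypothetical helper standing in for PrettyTable; a cell holding
    several names spans several text lines.
    """
    dates = list(nested)
    header = ["Status"] + dates
    # One body row per status: [label] followed by the name list for each date
    rows = [[[status]] + [nested[d].get(status, []) for d in dates]
            for status in statuses]

    # Column width = longest header text or name appearing in that column
    widths = [len(h) for h in header]
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max([widths[i]] + [len(x) for x in cell])

    sep = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def line(cells):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    out = [sep, line(header), sep]
    for row in rows:
        height = max(len(cell) for cell in row)  # text lines this row needs
        for k in range(height):
            out.append(line([cell[k] if k < len(cell) else "" for cell in row]))
        out.append(sep)
    return "\n".join(out)


nested = {
    "01/09/16": {"In": ["Jack"], "Out": ["Lisa", "Tom"]},
    "02/09/16": {"In": ["Lisa", "Tom"], "Out": ["Jack"]},
    "03/09/16": {"In": ["Jack", "Tom"], "Out": ["Lisa"]},
}
print(render_grid(nested))
```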

Answers


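Consider a dictionary comprehension that runs over every column (each one a Series) of the data frame. But first, make sure Name is the index of the data frame: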

from io import StringIO
import pandas as pd

data = '''
Name  01/09/16  02/09/16  03/09/16
Jack  In        Out       In
Lisa  Out       In        Out
Tom   Out       In        In
'''
# Whitespace-separated columns; the first column (Name) becomes the index
df = pd.read_csv(StringIO(data), sep=r"\s+", index_col=0)
print(df)

#      01/09/16 02/09/16 03/09/16
# Name
# Jack       In      Out       In
# Lisa      Out       In      Out
# Tom       Out       In       In

# BUILD DICTIONARY: for each date column, the names marked 'In' and 'Out'
dfdict = {col: {'In': df.index[df[col] == 'In'].tolist(),
                'Out': df.index[df[col] == 'Out'].tolist()}
          for col in df.columns}

# CAST TO DATAFRAME: inner keys ('In', 'Out') become the rows, then Status
# is moved from the index into the first column
finaldf = pd.DataFrame(dfdict).rename_axis('Status').reset_index()
print(finaldf)

#   Status     01/09/16     02/09/16     03/09/16
# 0     In       [Jack]  [Lisa, Tom]  [Jack, Tom]
# 1    Out  [Lisa, Tom]       [Jack]       [Lisa]
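If the goal is the question's nested dictionary itself (for instance to hand to PrettyTable), a variation on the same idea swaps the boolean masks for `groupby(col).groups`, which maps each status value to the Index of row labels (names) having that value. A sketch, again reading the sample from an inline string:

```python
from io import StringIO

import pandas as pd

data = """Name 01/09/16 02/09/16 03/09/16
Jack In Out In
Lisa Out In Out
Tom Out In In
"""
df = pd.read_csv(StringIO(data), sep=r"\s+", index_col=0)

# For each date column, group the Name index by that column's value;
# .groups maps 'In'/'Out' to the Index of matching names.
nested = {col: {status: names.tolist()
                for status, names in df.groupby(col).groups.items()}
          for col in df.columns}

# The same dict casts straight to the Status-by-date table
table = pd.DataFrame(nested).rename_axis("Status").reset_index()
print(nested["02/09/16"])
print(table)
```

Note that a status that never occurs on a given date simply has no key in that date's inner dict here, whereas the mask-based version always produces both keys.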

IIUC

# here df has Name as an ordinary column, not as the index
pd.melt(
    df, id_vars=['Name'], value_vars=df.columns[1:].tolist(),
    value_name='Status', var_name='Date'
).set_index(['Status', 'Date']).groupby(level=[0, 1]).Name.apply(list).unstack()
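The melt answer assumes `Name` is an ordinary column. A slightly more explicit sketch of the same reshape, under that assumption and reading the sample from an inline string, keeps the intermediate long table visible and groups on column names directly instead of on index levels:

```python
from io import StringIO

import pandas as pd

data = """Name 01/09/16 02/09/16 03/09/16
Jack In Out In
Lisa Out In Out
Tom Out In In
"""
df = pd.read_csv(StringIO(data), sep=r"\s+")  # Name stays a regular column

# Wide -> long: one row per (Name, Date) pair with its Status
long_df = df.melt(id_vars="Name", var_name="Date", value_name="Status")

# Collect names per (Status, Date), then spread Date back out as columns
result = (long_df.groupby(["Status", "Date"])["Name"]
          .apply(list)
          .unstack("Date"))
print(long_df.head(3))
print(result)
```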


Or, with less code:

(df.set_index('Name').unstack().reset_index()
   .groupby(['level_0', 0]).Name.apply(list)
   .rename_axis([None, None]).unstack(0))
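The `level_0` and `0` labels in the shorter version are the default names pandas gives to an unnamed index level and to an unnamed Series' values when `reset_index()` turns them into columns. A sketch of the same pipeline with those names set explicitly, via `rename_axis(columns=...)` and `reset_index(name=...)`, assuming again that Name is an ordinary column:

```python
from io import StringIO

import pandas as pd

data = """Name 01/09/16 02/09/16 03/09/16
Jack In Out In
Lisa Out In Out
Tom Out In In
"""
df = pd.read_csv(StringIO(data), sep=r"\s+")  # Name as an ordinary column

# Name the column axis 'Date'; unstack() then yields a Series indexed by
# (Date, Name), and reset_index(name=...) labels its values 'Status'.
long_df = (df.set_index("Name")
             .rename_axis(columns="Date")
             .unstack()
             .reset_index(name="Status"))

# Same grouping as the one-liner, but on readable column names
result = (long_df.groupby(["Status", "Date"])["Name"]
          .apply(list)
          .unstack("Date"))
print(result)
```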


+1

You were faster ;) – jezrael
