2017-06-22 72 views
2

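Pandas: repeat dataframe columns based on a reference dictionary

I need to rename and repeat my dataframe columns based on a reference dictionary. Below I have created a dummy dataframe: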

import pandas as pd

rawdata = {'id':['json','molly','tina','jake','molly'],'entity':['present','absent','absent','present','present'],'entity2':['present','present','present','absent','absent'],'entity3':['absent','absent','absent','present','absent']}
df = pd.DataFrame(rawdata)
df = df.set_index('id')  # set_index returns a new frame, so the result must be assigned

        entity  entity2  entity3
id
json   present  present   absent
molly   absent  present   absent
tina    absent  present   absent
jake   present   absent  present
molly  present   absent   absent

Now I have the following example dictionary:

ref_dict= {'entity':['entity_exp1'],'entity2':['entity2_exp1','entity2_exp2'],'entity3':['entity3_exp1','entity3_exp2','entity3_exp3']} 

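I need to replace the column names with the dictionary's values, and if a column has more than one value, the column should be repeated once per value. Below is my desired dataframe: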

       entity_exp1  entity2_exp1  entity2_exp2  entity3_exp1  entity3_exp2  entity3_exp3
id
json       present       present       present        absent        absent        absent
molly       absent       present       present        absent        absent        absent
tina        absent       present       present        absent        absent        absent
jake       present        absent        absent       present       present       present
molly      present        absent        absent        absent        absent        absent
+0

Thank you for accepting my answer. Feel free to upvote answers as well. – piRSquared

+0

Thanks piRSquared. You always have the most amazing solutions. – Rtut

Answers

1

Option 1
pd.concat on a dictionary comprehension

pd.concat({k: df[v] for v, l in ref_dict.items() for k in l}, axis=1) 

       entity2_exp1  entity2_exp2  entity3_exp1  entity3_exp2  entity3_exp3  entity_exp1
id
json        present       present        absent        absent        absent      present
molly       present       present        absent        absent        absent       absent
tina        present       present        absent        absent        absent       absent
jake         absent        absent       present       present       present      present
molly        absent        absent        absent        absent        absent      present
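
The output above lists entity_exp1 last rather than first. A minimal self-contained sketch of Option 1 follows, assuming the question's sample data, with one extra step that is not part of the original answer: the result is reordered explicitly, so the column order does not depend on how pd.concat orders dictionary keys in a given pandas version. The names pieces and wanted are illustrative.

```python
import pandas as pd

rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
df = pd.DataFrame(rawdata).set_index('id')
ref_dict = {'entity': ['entity_exp1'],
            'entity2': ['entity2_exp1', 'entity2_exp2'],
            'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}

# Map every new name to the existing column it should copy.
pieces = {new: df[old] for old, news in ref_dict.items() for new in news}
out = pd.concat(pieces, axis=1)

# Reorder explicitly (an added step, not in the original answer).
wanted = [new for news in ref_dict.values() for new in news]
out = out[wanted]
print(out)
```

The explicit reorder costs one extra line and makes the result match the layout requested in the question.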

Option 2
Slice the dataframe with repeated column labels, then rename the columns

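repeats = [len(ref_dict[col]) for col in df.columns]  # number of copies needed per column
d1 = df.reindex(columns=df.columns.repeat(repeats))  # reindex_axis was deprecated and later removed from pandas
d1.columns = [new for col in df.columns for new in ref_dict[col]]  # new names, flattened in column order
d1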

       entity_exp1  entity2_exp1  entity2_exp2  entity3_exp1  entity3_exp2  entity3_exp3
id
json       present       present       present        absent        absent        absent
molly       absent       present       present        absent        absent        absent
tina        absent       present       present        absent        absent        absent
jake       present        absent        absent       present       present       present
molly      present        absent        absent        absent        absent        absent
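
The same repeat-then-rename idea can also be written with positions instead of labels. This is only a sketch under the question's sample data, not the answer's own code. It uses numpy.repeat together with iloc, and the names positions and d2 are illustrative.

```python
import numpy as np
import pandas as pd

rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
df = pd.DataFrame(rawdata).set_index('id')
ref_dict = {'entity': ['entity_exp1'],
            'entity2': ['entity2_exp1', 'entity2_exp2'],
            'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}

# How many copies each existing column needs, in df's column order.
repeats = [len(ref_dict[col]) for col in df.columns]

# Repeat the column positions and select them by position.
positions = np.repeat(np.arange(df.shape[1]), repeats)
d2 = df.iloc[:, positions].copy()

# Assign the new names in the same order the columns were repeated.
d2.columns = [new for col in df.columns for new in ref_dict[col]]
print(d2)
```

Selecting by position avoids any lookup on duplicated labels, which may be easier to reason about when the original column names are not unique.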
0

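For each column of df, you can look up its new column names in ref_dict, create a new column for each of them, and finally delete the old column. You can try the following: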

# ref_dict maps each old column name to the list of its new column names
for old_column, new_columns in ref_dict.items():
    for new_column in new_columns:  # for each new name listed for this column
        df[new_column] = df[old_column]  # the content stays the same as the old column
    del df[old_column]  # now remove the old column
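
The loop above modifies df in place. If the original frame should stay untouched, the same steps can be wrapped in a function that works on a copy. This is a sketch, and expand_columns is a hypothetical helper name, not something from pandas or from the answer. It removes the old column before adding the copies, so a new name that happens to equal the old one is not deleted afterwards.

```python
import pandas as pd


def expand_columns(frame, mapping):
    """Hypothetical helper: return a copy of frame with each mapped
    column replaced by one copy per new name."""
    out = frame.copy()
    for old_column, new_columns in mapping.items():
        source = out.pop(old_column)  # drop the old column, keep its values
        for new_column in new_columns:
            out[new_column] = source
    return out


rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
df = pd.DataFrame(rawdata).set_index('id')
ref_dict = {'entity': ['entity_exp1'],
            'entity2': ['entity2_exp1', 'entity2_exp2'],
            'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}

result = expand_columns(df, ref_dict)
print(result)
```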
0

You can simply loop:

import pandas as pd

rawdata = {'id':['json','molly','tina','jake','molly'],
           'entity':['present','absent','absent','present','present'],
           'entity2':['present','present','present','absent','absent'],
           'entity3':['absent','absent','absent','present','absent']}
df = pd.DataFrame(rawdata)  # 'id' stays a regular column here; it is copied over below
ref_dict = {'entity':['entity_exp1'],
            'entity2':['entity2_exp1','entity2_exp2'],
            'entity3':['entity3_exp1','entity3_exp2','entity3_exp3']}

# here comes the new part:
df2 = pd.DataFrame()
for key, val in sorted(ref_dict.items()):
    for subval in val:
        df2[subval] = df[key]  # copy the old column under each new name

df2['id'] = df['id']
df2.set_index('id', inplace=True)

print(df2)
       entity_exp1  entity2_exp1  entity2_exp2  entity3_exp1  entity3_exp2  entity3_exp3
id
json       present       present       present        absent        absent        absent
molly       absent       present       present        absent        absent        absent
tina        absent       present       present        absent        absent        absent
jake       present        absent        absent       present       present       present
molly      present        absent        absent        absent        absent        absent
0

You can reindex your df using the dict keys as column names, then rename the columns using the dict's values.

df_new = df.reindex(columns=sum([[k]*len(v) for k,v in ref_dict.items()],[])) 
df_new.columns=sum(ref_dict.values(),[]) 
df_new 
Out[573]: 
   entity_exp1  entity2_exp1  entity2_exp2  entity3_exp1  entity3_exp2  entity3_exp3
0      present       present       present        absent        absent        absent
1       absent       present       present        absent        absent        absent
2       absent       present       present        absent        absent        absent
3      present        absent        absent       present       present       present
4      present        absent        absent        absent        absent        absent
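
The 0 to 4 index in the output above suggests that id was presumably still a regular column when the snippet ran, so reindex(columns=...) dropped it. A sketch that keeps the ids, assuming the question's sample data, sets the index first. It also builds the two name lists with list comprehensions instead of sum(..., []), which is a stylistic choice here and not part of the original answer.

```python
import pandas as pd

rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
df = pd.DataFrame(rawdata).set_index('id')  # keep the ids as the index
ref_dict = {'entity': ['entity_exp1'],
            'entity2': ['entity2_exp1', 'entity2_exp2'],
            'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}

# Each old name repeated once per new name, and the matching new names.
old_names = [old for old, news in ref_dict.items() for _ in news]
new_names = [new for news in ref_dict.values() for new in news]

df_new = df.reindex(columns=old_names)  # duplicates in the target are allowed
df_new.columns = new_names
print(df_new)
```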