2014-01-07 44 views

Python pandas: join on unique column values and concatenate

I have three pandas DataFrames, df1, df2, and df3, as follows:

import pandas as pd 
import numpy as np 
df1 = pd.DataFrame({'id' : ['one', 'two', 'three'], 'score': [56, 45, 78]}) 
df2 = pd.DataFrame({'id' : ['one', 'five', 'four'], 'score': [35, 81, 90]}) 
df3 = pd.DataFrame({'id' : ['five', 'two', 'six'], 'score': [23, 66, 42]}) 

How can I join these DataFrames on id and then concatenate their columns side by side? The desired output is:

# join_and_concatenate by id:

id     score(df1)  score(df2)  score(df3)
one    56          35          NaN
two    45          NaN         66
three  78          NaN         NaN
four   NaN         90          NaN
five   NaN         81          23
six    NaN         NaN         42

I found a related page that discusses merge(), concatenate(), and join(), but I am not sure whether any of them gives me what I want.

Answer


There may be a better way using concat, but this should work:

In [48]: pd.merge(df1, df2, how='outer', on='id').merge(df3, how='outer', on='id') 
Out[48]: 
      id  score_x  score_y  score
0    one       56       35    NaN
1    two       45      NaN     66
2  three       78      NaN    NaN
3   five      NaN       81     23
4   four      NaN       90    NaN
5    six      NaN      NaN     42

[6 rows x 4 columns] 
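
The score_x and score_y names above come from merge's default suffixes=('_x', '_y'), which are applied only when both frames share a non-key column name. The third score column keeps its plain name because nothing collides with it by the time df3 is merged. As a small sketch (the suffix strings '_df1' and '_df2' are illustrative choices, not part of the original answer), explicit suffixes can be passed instead:

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['one', 'two', 'three'], 'score': [56, 45, 78]})
df2 = pd.DataFrame({'id': ['one', 'five', 'four'], 'score': [35, 81, 90]})

# Both frames have a 'score' column, so merge must disambiguate them.
# The suffixes below are illustrative; the default is ('_x', '_y').
merged = pd.merge(df1, df2, how='outer', on='id', suffixes=('_df1', '_df2'))

print(merged.columns.tolist())
```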

To get the output you want:

In [54]: merged = pd.merge(df1, df2, how='outer', on='id').merge(df3, how='outer', on='id') 

In [55]: merged.set_index('id').rename(columns={'score_x': 'score(df1)', 'score_y': 'score(df2)', 'score': 'score(df3)'})
Out[55]: 
       score(df1)  score(df2)  score(df3)
id
one            56          35         NaN
two            45         NaN          66
three          78         NaN         NaN
five          NaN          81          23
four          NaN          90         NaN
six           NaN         NaN          42

[6 rows x 3 columns]
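
The concat route mentioned at the start of this answer might look roughly like the sketch below. It assumes each id appears at most once per frame (true for the sample data); the frames dictionary and its labels are illustrative names, not something from the question. Each frame is indexed by id, its score column is renamed to the desired label, and pd.concat with axis=1 aligns the pieces on the union of the ids, filling the gaps with NaN.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['one', 'two', 'three'], 'score': [56, 45, 78]})
df2 = pd.DataFrame({'id': ['one', 'five', 'four'], 'score': [35, 81, 90]})
df3 = pd.DataFrame({'id': ['five', 'two', 'six'], 'score': [23, 66, 42]})

# Illustrative mapping from the desired column label to its source frame.
frames = {'score(df1)': df1, 'score(df2)': df2, 'score(df3)': df3}

# Index each frame by id and pull out 'score' as a Series named after the label.
columns = [df.set_index('id')['score'].rename(label)
           for label, df in frames.items()]

# axis=1 places the Series side by side; the default outer join keeps every id.
result = pd.concat(columns, axis=1)

print(result)
```

Row order may differ between pandas versions, so sort or reindex the result if a particular order matters. This form also extends to any number of frames without chaining further merge calls.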