2017-10-17 89 views
2

Data goes missing when merging in this case

I am just merging two dataframes on a common column:

df1 

        email       account 
0     [email protected]     555 
1     [email protected]      666 
2     [email protected]      Nan 
3     [email protected]      999 


df2 (I think ip is the index here)

ip    account 
1.1.1.1   555 
2.2.2.2   666 
. 
. 


df3= pd.merge(df1,df2,on='accountname') 

After this merge, I have lost data. How can I avoid this?


df3 = pd.merge(df1.dropna(), df2.dropna(), on='accountname', how='inner') –


You don't have an accountname field, you know? –


Could you please update the question with a better data sample, both input and output? Use [this post](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as a guide for how to add dataframes to SO – DJK

Answers

3
pd.merge(df1,df2,on='accountname',how='left') 

or

pd.merge(df1,df2,on='accountname',how='inner') 
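To see what the two `how` values do to unmatched rows, here is a minimal sketch. It assumes the join column is named `account` (as in the tables in the question), that both key columns already hold integers, and it uses placeholder email addresses and an invented account `777`, none of which come from the original post. The `indicator=True` flag adds a `_merge` column that marks rows with no partner.

```python
import pandas as pd

# Hypothetical sample data: addresses and account 777 are placeholders.
df1 = pd.DataFrame({
    "email": ["a@example.com", "b@example.com", "c@example.com", "d@example.com"],
    "account": [555, 666, 777, 999],
})
df2 = pd.DataFrame({"ip": ["1.1.1.1", "2.2.2.2"], "account": [555, 666]})

# how='inner' keeps only the accounts present in both frames.
inner = pd.merge(df1, df2, on="account", how="inner")

# how='left' keeps every row of df1; indicator=True flags the unmatched ones.
left = pd.merge(df1, df2, on="account", how="left", indicator=True)
unmatched = left.loc[left["_merge"] == "left_only", "email"].tolist()

print(len(inner), len(left))
print(unmatched)
```

With `how='inner'` the rows that have no partner in `df2` disappear, which can look like lost data; with `how='left'` they stay and get `NaN` in the `ip` column.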

Edit: Looking at your sample data, you are merging str with int. That is why everything is NaN.

df1.applymap(type) 
Out[96]: 
      email  account 
0 <class 'str'> <class 'str'> 
1 <class 'str'> <class 'str'> 
2 <class 'str'> <class 'str'> 
3 <class 'str'> <class 'str'> 
df2.applymap(type) 
Out[97]: 
       account 
ip      
1.1.1.1 <class 'int'> 
2.2.2.2 <class 'int'> 
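A quick way to catch this before merging is to compare the dtypes of the two key columns. The sketch below rebuilds frames shaped like the ones above; the email addresses are placeholders, and the helper variable names are only illustrative. Note that newer pandas releases may raise an error on a str/int key mismatch rather than silently returning NaN, so the check is useful either way.

```python
import pandas as pd

# Hypothetical reconstruction of the question's frames (placeholder emails).
df1 = pd.DataFrame({
    "email": ["a@example.com", "b@example.com", "c@example.com", "d@example.com"],
    "account": ["555", "666", "Nan", "999"],  # strings, including the 'Nan' text
})
df2 = pd.DataFrame(
    {"account": [555, 666]},                  # integers
    index=pd.Index(["1.1.1.1", "2.2.2.2"], name="ip"),
)

# Compare the key columns: one is text, the other numeric.
left_is_numeric = pd.api.types.is_numeric_dtype(df1["account"])
right_is_numeric = pd.api.types.is_numeric_dtype(df2["account"])
keys_comparable = left_is_numeric == right_is_numeric

print(df1["account"].dtype, df2["account"].dtype)
print("keys comparable:", keys_comparable)
```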

How to fix this:

Option 1

Change str to numeric using pd.to_numeric

df1.account=pd.to_numeric(df1.account,errors ='coerce') 
df1.applymap(type) 
Out[99]: 
      email   account 
0 <class 'str'> <class 'float'> 
1 <class 'str'> <class 'float'> 
2 <class 'str'> <class 'float'> 
3 <class 'str'> <class 'float'> 

df1.merge(df2.reset_index(),on=['account'],how='left') 


Out[101]: 
      email account  ip 
0 [email protected]  555 1.1.1.1 
1 [email protected]  666 2.2.2.2 
2 [email protected]  NaN  NaN 
3 [email protected]  999  NaN 
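For reference, Option 1 as one self-contained script might look like the sketch below. The email addresses are placeholders (the originals are not visible in the post), and `df2` is assumed to be indexed by `ip` as the question suggests.

```python
import pandas as pd

# Hypothetical reconstruction of the question's frames (placeholder emails).
df1 = pd.DataFrame({
    "email": ["a@example.com", "b@example.com", "c@example.com", "d@example.com"],
    "account": ["555", "666", "Nan", "999"],
})
df2 = pd.DataFrame(
    {"account": [555, 666]},
    index=pd.Index(["1.1.1.1", "2.2.2.2"], name="ip"),
)

# Convert the text keys to numbers; anything unparseable becomes NaN.
df1["account"] = pd.to_numeric(df1["account"], errors="coerce")

# reset_index() turns the 'ip' index into a regular column so it survives the merge.
merged = df1.merge(df2.reset_index(), on="account", how="left")
print(merged)
```

Because of `errors='coerce'`, the `'Nan'` text becomes a real missing value, so the `account` column ends up as float.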

Option 2

We just change df2.account to str (I prefer the first option, pd.to_numeric)

df2.account=df2.account.astype(str) 
df1.merge(df2.reset_index(),on=['account'],how='left') 
Out[105]: 
      email account  ip 
0 [email protected]  555 1.1.1.1 
1 [email protected]  666 2.2.2.2 
2 [email protected]  Nan  NaN 
3 [email protected]  999  NaN 
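The same approach as a self-contained sketch, again with placeholder email addresses and `df2` assumed to be indexed by `ip`. One difference from Option 1 is visible in the output above: the `'Nan'` text in `df1.account` stays an ordinary string, so pandas does not treat it as missing.

```python
import pandas as pd

# Hypothetical reconstruction of the question's frames (placeholder emails).
df1 = pd.DataFrame({
    "email": ["a@example.com", "b@example.com", "c@example.com", "d@example.com"],
    "account": ["555", "666", "Nan", "999"],
})
df2 = pd.DataFrame(
    {"account": [555, 666]},
    index=pd.Index(["1.1.1.1", "2.2.2.2"], name="ip"),
)

# Cast the integer keys to text so both key columns hold strings.
df2["account"] = df2["account"].astype(str)
merged = df1.merge(df2.reset_index(), on="account", how="left")

# The placeholder text is kept as-is and is not counted as missing.
missing_accounts = int(merged["account"].isna().sum())
print(merged)
print("missing accounts:", missing_accounts)
```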

I still have missing data. It is as simple as 'df1: email account' and 'df2: account ip', but I don't know why data from df2 is missing after the merge – Jasmin


@poço please show your sample data – Wen


@poço check my update. Sorry for the late reply, I just finished my own work. PS: if it works for you, you can upvote and accept :) – Wen