Merge 3 different dataframes based on conditions

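The relationship between the first two dataframes must be based on ID1, since that is the matching key between them.

For the third dataframe, Address2 must be matched in order to add Hash.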

DF1:

Name1  Name2  Name3  Address   ID1  ID2  Own
Matt   John1  Jill   878 home  1    0    Deal
Matt   John2  Jack   879 home  2    1    Dael

DF2:

Name1  ID1  Address   Name4  Address2
Matt   1    878 home  face1  face\123
Matt   1    878 home  face2  face\345
Matt   1    878 home  face3  face\678
Matt   2    879 home  head1  head\123
Matt   2    879 home  head2  head\345
Matt   2    879 home  head3  head\678

DF3:

Address2  Hash 
face\123  abc123 
face\345  cde321 
face\678  efg123 
head\123  123efg 
head\345  efg321 
head\678  acd321

I am trying to combine the 3 dataframes into one like the following:

Name1  Name2  ID1  Address   Own   Name3  ID2  Name4  Address2  Hash
Matt   John1  1    878 home  Deal  Jill   0    face1  face\123  abc123
Matt   John1  1    878 home  Deal  Jill   0    face2  face\345  cde321
Matt   John1  1    878 home  Deal  Jill   0    face3  face\678  efg123
Matt   John2  2    879 home  Dael  Jack   1    head1  head\123  123efg
Matt   John2  2    879 home  Dael  Jack   1    head2  head\345  efg321
Matt   John2  2    879 home  Dael  Jack   1    head3  head\678  acd321

The key between DF1 and DF2 is ID1. The key between DF2 and DF3 is Address2.

Thank you very much for your help.

Source

2017-03-09 johnnyb

Aren't you just merging on the column intersection here? 'df1.merge(df2).merge(df3)'? – miradulo

Take a look at the merge function; you can find some examples here. For your specific problem, try the following:

combined_df = df1.merge(df2, on="ID1", how="inner").merge(df3, on="Address2", how="inner")

Source

2017-03-09 17:38:36 StefP

I think this will work. The merge function does pretty much exactly this for whichever columns you want to join on.

import numpy as np
import pandas as pd

# The first row of each array holds the column names; the rest are records.
data = np.array([['Name1', 'Name2', 'Name3', 'Address', 'ID1', 'ID2', 'Own'],
                 ['Matt', 'John1', 'Jill', '878 home', '1', '0', 'Deal'],
                 ['Matt', 'John2', 'Jack', '879 home', '2', '1', 'Dael']])

data2 = np.array([['Name1', 'ID1', 'Address', 'Name4', 'Address2'],
                  ['Matt', '1', '878 home', 'face1', 'face.123'],
                  ['Matt', '1', '878 home', 'face2', 'face.345'],
                  ['Matt', '1', '878 home', 'face3', 'face.678'],
                  ['Matt', '2', '879 home', 'head1', 'head.123'],
                  ['Matt', '2', '879 home', 'head2', 'head.345'],
                  ['Matt', '2', '879 home', 'head3', 'head.678']])

data3 = np.array([['Address2', 'Hash'],
                  ['face.123', 'abc123'],
                  ['face.345', 'cde321'],
                  ['face.678', 'efg123'],
                  ['head.123', '123efg'],
                  ['head.345', 'efg321'],
                  ['head.678', 'acd321']])

# Build each dataframe from the records, using the first row as the header.
df1 = pd.DataFrame(data=data[1:, :], columns=data[0, :])
df2 = pd.DataFrame(data=data2[1:, :], columns=data2[0, :])
df3 = pd.DataFrame(data=data3[1:, :], columns=data3[0, :])

# Join DF1 to DF2 on ID1, then attach the hashes from DF3 via Address2.
Cdf = pd.merge(df1, df2, on='ID1', how='inner')
Ddf = pd.merge(Cdf, df3, on='Address2', how='inner')
print(Ddf)
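
As a rough sketch (not part of the answer above), the key relationships described in the question can also be checked while merging by passing pandas' validate argument. The trimmed-down frames below are an assumption made for brevity: they keep only the key columns and a few rows of the sample data.

```python
import pandas as pd

# Reduced versions of the sample frames (illustrative subset only).
df1 = pd.DataFrame({
    "Name1": ["Matt", "Matt"],
    "Name2": ["John1", "John2"],
    "ID1": ["1", "2"],
})
df2 = pd.DataFrame({
    "ID1": ["1", "1", "2"],
    "Address2": ["face.123", "face.345", "head.123"],
})
df3 = pd.DataFrame({
    "Address2": ["face.123", "face.345", "head.123"],
    "Hash": ["abc123", "cde321", "123efg"],
})

# validate raises pandas.errors.MergeError if the stated relationship
# does not hold, e.g. if ID1 were duplicated in df1.
step1 = df1.merge(df2, on="ID1", how="inner", validate="one_to_many")
result = step1.merge(df3, on="Address2", how="inner", validate="many_to_one")
print(result)
```

Here each ID1 in df1 may match several rows in df2, and each Address2 should match at most one row in df3, which is why the two merges use different validate settings.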

Source

2017-03-09 17:56:55 Cesar

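From your expected output, it seems you don't need to specify anything beyond the column intersection, which merge uses automatically.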

>>> df1.merge(df2).merge(df3) 

   Name1  Name2  Name3   Address  ID1  ID2   Own  Name4  Address2    Hash
0   Matt  John1   Jill  878 home    1    0  Deal  face1  face\123  abc123
1   Matt  John1   Jill  878 home    1    0  Deal  face2  face\345  cde321
2   Matt  John1   Jill  878 home    1    0  Deal  face3  face\678  efg123
3   Matt  John2   Jack  879 home    2    1  Dael  head1  head\123  123efg
4   Matt  John2   Jack  879 home    2    1  Dael  head2  head\345  efg321
5   Matt  John2   Jack  879 home    2    1  Dael  head3  head\678  acd321
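
To illustrate what "column intersection" means here, the following minimal sketch compares a merge with no on argument against one that names the shared columns explicitly. It uses a reduced subset of the sample data, and the variable names common, implicit and explicit are made up for the example.

```python
import pandas as pd

# Reduced versions of DF1 and DF2 (illustrative subset only).
df1 = pd.DataFrame({
    "Name1": ["Matt", "Matt"],
    "Name2": ["John1", "John2"],
    "Address": ["878 home", "879 home"],
    "ID1": ["1", "2"],
})
df2 = pd.DataFrame({
    "Name1": ["Matt", "Matt", "Matt"],
    "ID1": ["1", "1", "2"],
    "Address": ["878 home", "878 home", "879 home"],
    "Name4": ["face1", "face2", "head1"],
})

# Columns present in both frames; merge() joins on these when `on` is omitted.
common = [c for c in df1.columns if c in df2.columns]
print(common)

implicit = df1.merge(df2)             # keys inferred from shared columns
explicit = df1.merge(df2, on=common)  # same keys, spelled out
print(implicit.equals(explicit))
```

Because every shared column is used as a key, none of them is duplicated in the result, so no suffixes appear.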

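Specifying a single column to merge on, as the accepted answer does, actually causes a problem, because you end up with suffixed columns.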

>>> df1.merge(df2, on="ID1", how="inner").merge(df3, on="Address2", how="inner") 

   Name1_x  Name2  Name3  Address_x  ID1  ID2   Own  Name1_y  Address_y  Name4  \
0     Matt  John1   Jill    878home    1    0  Deal     Matt    878home  face1
1     Matt  John1   Jill    878home    1    0  Deal     Matt    878home  face2
2     Matt  John1   Jill    878home    1    0  Deal     Matt    878home  face3
3     Matt  John2   Jack    879home    2    1  Dael     Matt    879home  head1
4     Matt  John2   Jack    879home    2    1  Dael     Matt    879home  head2
5     Matt  John2   Jack    879home    2    1  Dael     Matt    879home  head3

   Address2    Hash
0  face\123  abc123
1  face\345  cde321
2  face\678  efg123
3  head\123  123efg
4  head\345  efg321
5  head\678  acd321
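
If you do want to join on ID1 alone, one possible workaround is to drop the columns that DF2 repeats from DF1 before merging. This is only a sketch on a reduced subset of the sample data, and it assumes the repeated columns (Name1, Address) hold the same values in both frames, which is true for this data but is not guaranteed in general.

```python
import pandas as pd

# Reduced versions of DF1 and DF2 (illustrative subset only).
df1 = pd.DataFrame({
    "Name1": ["Matt", "Matt"],
    "Name2": ["John1", "John2"],
    "Address": ["878 home", "879 home"],
    "ID1": ["1", "2"],
})
df2 = pd.DataFrame({
    "Name1": ["Matt", "Matt", "Matt"],
    "ID1": ["1", "1", "2"],
    "Address": ["878 home", "878 home", "879 home"],
    "Name4": ["face1", "face2", "head1"],
})

# Merging on ID1 only: the other shared columns get _x / _y suffixes.
suffixed = df1.merge(df2, on="ID1", how="inner")
print(list(suffixed.columns))

# Drop the non-key columns df2 shares with df1, then merge on ID1.
duplicated = [c for c in df2.columns if c in df1.columns and c != "ID1"]
clean = df1.merge(df2.drop(columns=duplicated), on="ID1", how="inner")
print(list(clean.columns))
```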

Source

2017-03-09 18:12:23 miradulo
