Map one column to another to create a new column

I have a DataFrame:

id store address 
1 100  xyz 
2 200  qwe 
3 300  asd 
4 400  zxc 
5 500  bnm

I also have another DataFrame, df2:

serialNo store_code warehouse 
    1   300   Land 
    2   500   Sea 
    3   100   Land 
    4   200   Sea 
    5   400   Land

I want my final DataFrame to look like this:

id store address warehouse 
1 100  xyz  Land 
2 200  qwe  Sea 
3 300  asd  Land 
4 400  zxc  Land 
5 500  bnm  Sea

That is, I want to create a new column in one DataFrame by looking up its values in the other.
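To make the answers below reproducible, the two frames can be built roughly as follows. The question does not show any construction code, so this is only a sketch: the values are copied from the tables above, and the variable names df1 and df2 are taken from the answers.

```python
import pandas as pd

# Sample data copied from the question's tables.
# The names df1 / df2 follow the answers; the question shows no construction code.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "store": [100, 200, 300, 400, 500],
    "address": ["xyz", "qwe", "asd", "zxc", "bnm"],
})
df2 = pd.DataFrame({
    "serialNo": [1, 2, 3, 4, 5],
    "store_code": [300, 500, 100, 200, 400],
    "warehouse": ["Land", "Sea", "Land", "Sea", "Land"],
})

print(df1)
print(df2)
```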

Source

2017-09-05 Shubham

Option 1

Use df.merge

out = (
    df1.merge(df2, left_on='store', right_on='store_code')
       [['id', 'store', 'address', 'warehouse']]
)
print(out)

    id store address warehouse 
0 1 100  xyz  Land 
1 2 200  qwe  Sea 
2 3 300  asd  Land 
3 4 400  zxc  Land 
4 5 500  bnm  Sea
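One thing to keep in mind with merge is that it performs an inner join by default, so any row of df1 whose store has no match in df2 is dropped. The sketch below shows a left join that keeps such rows instead. The store code 999 is a made-up value, added only to illustrate the unmatched case.

```python
import pandas as pd

# Small illustrative frames; store 999 is hypothetical and has no match in df2.
df1 = pd.DataFrame({
    "id": [1, 2, 3],
    "store": [100, 200, 999],
    "address": ["xyz", "qwe", "asd"],
})
df2 = pd.DataFrame({
    "store_code": [200, 100],
    "warehouse": ["Sea", "Land"],
})

# how="left" keeps every row of df1; unmatched stores get NaN in 'warehouse'.
out = (
    df1.merge(df2, how="left", left_on="store", right_on="store_code")
       .drop(columns="store_code")
)
print(out)
```

Whether an inner or a left join is appropriate depends on whether rows without a warehouse should survive in the result.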

Option 2

Use pd.concat and df.sort_values

out = pd.concat(
    [df1.sort_values('store').reset_index(drop=True),
     df2.sort_values('store_code')[['warehouse']].reset_index(drop=True)],
    axis=1)
print(out)

    id store address warehouse 
0 1 100  xyz  Land 
1 2 200  qwe  Sea 
2 3 300  asd  Land 
3 4 400  zxc  Land 
4 5 500  bnm  Sea

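The first sort call is redundant if your DataFrame is already sorted on store; in that case you can remove it. Note that this option pairs rows purely by position, so it is only correct when both frames contain exactly the same store codes, each exactly once.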
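Because the concat approach pairs rows by position rather than by key, it can silently produce wrong results if the two frames do not line up. A minimal sketch of a guard check, assuming the sample data from the question (the check itself is not part of the original answer):

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "store": [100, 200, 300, 400, 500],
    "address": ["xyz", "qwe", "asd", "zxc", "bnm"],
})
df2 = pd.DataFrame({
    "serialNo": [1, 2, 3, 4, 5],
    "store_code": [300, 500, 100, 200, 400],
    "warehouse": ["Land", "Sea", "Land", "Sea", "Land"],
})

# Sort both frames on their key and reset the index so rows pair up by position.
left = df1.sort_values("store").reset_index(drop=True)
right = df2.sort_values("store_code").reset_index(drop=True)

# Positional pairing is only safe if the sorted keys are identical.
if left["store"].tolist() != right["store_code"].tolist():
    raise ValueError("store and store_code do not match one-to-one")

out = pd.concat([left, right[["warehouse"]]], axis=1)
print(out)
```

If the check can fail on real data, a key-based approach such as merge or map is probably the safer choice.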

Option 3

Use df.replace

s = df1.store.replace(df2.set_index('store_code')['warehouse']) 
print(s) 
0 Land 
1  Sea 
2 Land 
3 Land 
4  Sea 

df1['warehouse'] = s 
print(df1) 

    id store address warehouse 
0 1 100  xyz  Land 
1 2 200  qwe  Sea 
2 3 300  asd  Land 
3 4 400  zxc  Land 
4 5 500  bnm  Sea

Alternatively, create the mapping explicitly. This is useful if you want to reuse the mapping later.

mapping = dict(df2[['store_code', 'warehouse']].values) # separate step 
df1['warehouse'] = df1.store.replace(mapping) # df1.store.map(mapping) 
print(df1) 

    id store address warehouse 
0 1 100  xyz  Land 
1 2 200  qwe  Sea 
2 3 300  asd  Land 
3 4 400  zxc  Land 
4 5 500  bnm  Sea
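The commented-out map call above is interchangeable with replace only when every store has an entry in the mapping. The sketch below shows how the two differ for a code that is missing from the mapping. The value 999 is hypothetical and used purely for illustration.

```python
import pandas as pd

# 999 is a hypothetical store code with no entry in the mapping.
stores = pd.Series([100, 200, 999], name="store")
mapping = {100: "Land", 200: "Sea"}

mapped = stores.map(mapping)        # unmatched values become NaN
replaced = stores.replace(mapping)  # unmatched values are left unchanged

print(mapped)
print(replaced)
```

So map makes missing lookups visible as NaN, while replace leaves the original store code in the new column.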

Source

2017-09-05 08:04:57

Use map or join:

df1['warehouse'] = df1['store'].map(df2.set_index('store_code')['warehouse']) 
print (df1) 
    id store address warehouse 
0 1 100  xyz  Land 
1 2 200  qwe  Sea 
2 3 300  asd  Land 
3 4 400  zxc  Land 
4 5 500  bnm  Sea

df1 = df1.join(df2.set_index('store_code'), on='store').drop(columns='serialNo')
print (df1) 
    id store address warehouse 
0 1 100  xyz  Land 
1 2 200  qwe  Sea 
2 3 300  asd  Land 
3 4 400  zxc  Land 
4 5 500  bnm  Sea

Source

2017-09-05 07:55:39 jezrael

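I get this error when running the .map code on a similar dataset: 'Reindexing only valid with uniquely valued Index objects' – Shubham

I think the problem is duplicates in the 'store_code' column of 'df2'. So you need df1['store'].map(df2.drop_duplicates('store_code').set_index('store_code')['warehouse']) – jezrael

Correct! Thanks :) – Shubham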
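The fix from the comments can be sketched as follows. The df2 here is a made-up variant in which store_code 100 appears twice, which is the situation that makes map raise the reindexing error.

```python
import pandas as pd

df1 = pd.DataFrame({"store": [100, 200]})
# Hypothetical df2 in which store_code 100 is listed twice.
df2 = pd.DataFrame({
    "store_code": [100, 100, 200],
    "warehouse": ["Land", "Land", "Sea"],
})

# Drop repeated store codes (the first occurrence is kept by default)
# so that the lookup Series has a unique index.
lookup = df2.drop_duplicates("store_code").set_index("store_code")["warehouse"]
assert lookup.index.is_unique

df1["warehouse"] = df1["store"].map(lookup)
print(df1)
```

Note that drop_duplicates keeps the first row for each store_code, so if the duplicated rows disagree about the warehouse, the first one wins without any warning.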
