Searching in a pandas DataFrame

I have two dataframes:

import pandas as pd 
raw_data = { 
     'employee_id': ['4444', '5555', '6666','7777','8888'], 
     'first_name': ['aa', 'Jason', 'Tina', 'Jake', 'Amy'], 
     'last_name': ['Miller', 'Millers', 'Ali', 'Milner', 'Cooze'], 
     'age': [42, 42, 36, 24, 73], 
} 
df1 = pd.DataFrame(raw_data, columns = ['employee_id','first_name', 'last_name', 'age']) 


raw_data1 = {'employee_id': ['4444', '5555', '6666','7777'], 
    'ip': ['192.168.1.101', '192.168.1.102','192.168.1.103','192.168.1.104'], 

} 

df2 = pd.DataFrame(raw_data1, columns = ['employee_id', 'ip'])

I want to look up (compare) each df2['employee_id'] in df1, and where the values match, add df2['ip'] to df1:

print(df2['ip'].where(df2['employee_id'] == df1['employee_id']))

But this isn't the right approach; it fails with:

ValueError: Can only compare identically-labeled Series objects

Any suggestions on this problem would be greatly appreciated.
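For context: comparing two Series with == first lines them up by index label, and df1 (labels 0 to 4) and df2 (labels 0 to 3) do not have identical labels, hence the ValueError. A lookup avoids row alignment entirely. The sketch below uses Series.map, which is a different technique from the answers on this page; the frames are trimmed copies of the question's data.

```python
import pandas as pd

df1 = pd.DataFrame({
    "employee_id": ["4444", "5555", "6666", "7777", "8888"],
    "first_name": ["aa", "Jason", "Tina", "Jake", "Amy"],
})
df2 = pd.DataFrame({
    "employee_id": ["4444", "5555", "6666", "7777"],
    "ip": ["192.168.1.101", "192.168.1.102", "192.168.1.103", "192.168.1.104"],
})

# == between two Series requires identical index labels; df1 has 0-4 and
# df2 has 0-3, so pandas raises instead of comparing row by row.
try:
    df1["employee_id"] == df2["employee_id"]
except ValueError as exc:
    print(exc)  # Can only compare identically-labeled Series objects

# Build a Series keyed by employee_id and map each df1 id through it.
# Ids that do not appear in df2 (here 8888) become NaN.
ip_by_id = df2.set_index("employee_id")["ip"]
df1["ip"] = df1["employee_id"].map(ip_by_id)
print(df1)
```

Because map looks values up by key, the two frames may have different lengths and row orders; the only requirement is that employee_id is unique in df2.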


2017-08-08 jojo

This is an updated answer, adopting what I think is a better solution from someone whose answer has since been deleted.

on = "employee_id" 
df3 = df1.set_index(on).join(df2.set_index(on)).fillna("IP missing") 
df3["ip"].to_dict() 

employee_id first_name last_name age ip   
4444  aa   Miller  42 192.168.1.101 
5555  Jason  Millers  42 192.168.1.102 
6666  Tina  Ali   36 192.168.1.103 
7777  Jake  Milner  24 192.168.1.104 
8888  Amy   Cooze  73 IP missing 

{'4444': '192.168.1.101', 
'5555': '192.168.1.102', 
'6666': '192.168.1.103', 
'7777': '192.168.1.104', 
'8888': 'IP missing'}
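The join above keeps employee 8888 because DataFrame.join matches on the index and defaults to how='left'. A minimal sketch, assuming trimmed copies of the question's frames, checking that a left merge on the column produces the same ip column:

```python
import pandas as pd

df1 = pd.DataFrame({
    "employee_id": ["4444", "5555", "6666", "7777", "8888"],
    "first_name": ["aa", "Jason", "Tina", "Jake", "Amy"],
})
df2 = pd.DataFrame({
    "employee_id": ["4444", "5555", "6666", "7777"],
    "ip": ["192.168.1.101", "192.168.1.102", "192.168.1.103", "192.168.1.104"],
})
on = "employee_id"

# join aligns on the index (how="left" by default), so the key goes into the index.
via_join = df1.set_index(on).join(df2.set_index(on)).fillna("IP missing")

# merge works on columns directly; how="left" keeps unmatched df1 rows.
via_merge = (
    df1.merge(df2, on=on, how="left")
    .set_index(on)
    .fillna("IP missing")
)

print(via_join["ip"].to_dict())
print(via_join["ip"].equals(via_merge["ip"]))
```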

Previous answer:

pd.merge(df1,df2,on="employee_id")

https://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging

pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False)
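Two of the parameters in that signature decide what happens to employee 8888, who has no IP. A short sketch on trimmed copies of the question's frames: the default how='inner' drops unmatched keys, while how='left' keeps them, and indicator=True adds a _merge column recording where each key was found.

```python
import pandas as pd

df1 = pd.DataFrame({"employee_id": ["4444", "5555", "6666", "7777", "8888"]})
df2 = pd.DataFrame({
    "employee_id": ["4444", "5555", "6666", "7777"],
    "ip": ["192.168.1.101", "192.168.1.102", "192.168.1.103", "192.168.1.104"],
})

# Default how="inner": only keys present in both frames survive.
inner = pd.merge(df1, df2, on="employee_id")

# how="left" keeps every df1 row; indicator=True adds a categorical
# "_merge" column with "both" or "left_only" for each row.
left = pd.merge(df1, df2, on="employee_id", how="left", indicator=True)

print(len(inner), len(left))
print(left[["employee_id", "_merge"]])
```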

gives

employee_id first_name last_name age ip 
0 4444 aa  Miller 42 192.168.1.101 
1 5555 Jason Millers 42 192.168.1.102 
2 6666 Tina Ali 36 192.168.1.103 
3 7777 Jake Milner 24 192.168.1.104

Or perhaps what you want is this:

pd.merge(df1,df2,on="employee_id").set_index("employee_id")["ip"].to_dict() 

{'4444': '192.168.1.101', 
'5555': '192.168.1.102', 
'6666': '192.168.1.103', 
'7777': '192.168.1.104'}


2017-08-08 20:33:13

What if I want to add the matching values to df1, i.e. add a column with the matching ip, leaving it empty where there is no match? Thanks – jojo

@jojo Just reassign it: on = "employee_id"; df1 = df1.set_index(on).join(df2.set_index(on)).reset_index() –
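Written out as a runnable sketch of the comment above (trimmed copies of the question's frames): the left join adds the ip column, reset_index() turns employee_id back into an ordinary column, and unmatched rows are left as NaN rather than filled with a marker.

```python
import pandas as pd

df1 = pd.DataFrame({
    "employee_id": ["4444", "5555", "6666", "7777", "8888"],
    "age": [42, 42, 36, 24, 73],
})
df2 = pd.DataFrame({
    "employee_id": ["4444", "5555", "6666", "7777"],
    "ip": ["192.168.1.101", "192.168.1.102", "192.168.1.103", "192.168.1.104"],
})

on = "employee_id"
# Left join on the key, then move the key out of the index again.
# Reassigning replaces df1 with the widened frame; 8888 gets NaN for ip.
df1 = df1.set_index(on).join(df2.set_index(on)).reset_index()
print(df1)
```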

Your data science knowledge is great. Could you recommend some books or video tutorials? I'm a Python developer but completely new to data science. Thanks – jojo

Use merge:

In [1286]: df1.merge(df2, on='employee_id') 
Out[1286]: 
    employee_id first_name last_name age    ip 
0  4444   aa Miller 42 192.168.1.101 
1  5555  Jason Millers 42 192.168.1.102 
2  6666  Tina  Ali 36 192.168.1.103 
3  7777  Jake Milner 24 192.168.1.104


2017-08-08 20:31:55 Zero

I've been wondering for a while: how do you copy the output so that it comes out this nicely formatted? –

I mostly use Jupyter; do you know a trick for doing that there? What are the steps? Anyway, you get my upvote (+1). –

@AntonvBR, it's IPython; it does that for us automatically... – MaxU
