2017-10-17 119 views

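Create a new column in a dataframe using matching values from another dataframe

I have two data frames: one with a small subset of the information (df1) and another with all of the data (df2). I am trying to create a new column in df1 that looks up the Total2 value in df2 and fills it in based on Name. Note that every name that appears in df1 will always have a match among the names in df2. Is there a function in pandas that already does this? My end goal is to create a bar chart.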

alldatapath = "all_data.csv" 
filteredpath = "filtered.csv" 

import pandas as pd 

df1 = pd.read_csv(
    filteredpath,  # file name 
    sep=',',     # column separator 
    quotechar='"',    # quoting character 
    na_values="NA",    # treat the string "NA" as a missing value (NaN)
    usecols=[0,1],  # columns to use 
    decimal='.')    # symbol for decimals 

df2 = pd.read_csv(
    alldatapath,  # file name 
    sep=',',     # column separator 
    quotechar='"',    # quoting character 
    na_values="NA",    # treat the string "NA" as a missing value (NaN)
    usecols=[0,1],  # columns to use 
    decimal='.')    # symbol for decimals 

df1 = df1.head(5) #trim to top 5 

print(df1) 
print(df2) 

Output (df1):

         Name  Total
0  Accounting      3
1   Reporting      1
2     Finance      1
3       Audit      1
4    Template      2

Output (df2):

          Name  Total2
0    Reporting     100
1   Accounting     120
2      Finance     400
3        Audit     500
4  Information      50
5     Template    1200
6      KnowHow    2000

The final output (df1) should look like this:

         Name  Total  Total2 (new column)
0  Accounting      3                  120
1   Reporting      1                  100
2     Finance      1                  400
3       Audit      1                  500
4    Template      2                 1200

Answer

First, create the new column by using map with a Series built from df2 (indexed by Name):

df1['Total2'] = df1['Name'].map(df2.set_index('Name')['Total2']) 
print(df1)
         Name  Total  Total2
0  Accounting      3     120
1   Reporting      1     100
2     Finance      1     400
3       Audit      1     500
4    Template      2    1200
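
For anyone who wants to try the lookup without the CSV files, here is a minimal self-contained sketch. The two frames are rebuilt by hand from the output shown in the question, and the variable names lookup, mapped and merged are only illustrative. It also shows a left merge as one possible alternative to map; this is not part of the original answer, just a common way to get the same column.

```python
import pandas as pd

# Sample frames rebuilt by hand from the printed output in the question.
df1 = pd.DataFrame({
    "Name": ["Accounting", "Reporting", "Finance", "Audit", "Template"],
    "Total": [3, 1, 1, 1, 2],
})
df2 = pd.DataFrame({
    "Name": ["Reporting", "Accounting", "Finance", "Audit",
             "Information", "Template", "KnowHow"],
    "Total2": [100, 120, 400, 500, 50, 1200, 2000],
})

# Approach from the answer: build a Name -> Total2 lookup Series and map it.
lookup = df2.set_index("Name")["Total2"]
mapped = df1.assign(Total2=df1["Name"].map(lookup))

# Alternative (an assumption, not from the answer): a left merge on Name.
# Rows of df1 are kept in order; names missing from df2 would get NaN.
merged = df1.merge(df2[["Name", "Total2"]], on="Name", how="left")

print(mapped)
print(merged)
```

One difference to keep in mind: map expects the names in df2 to be unique, since a lookup Series with a duplicated index raises an error, whereas merge would instead produce one output row per matching pair.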

Then use set_index and DataFrame.plot.bar:

df1.set_index('Name').plot.bar() 
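
A slightly fuller sketch of the plotting step, assuming matplotlib is installed and that the chart should be written to a file rather than shown in a window. The file name totals.png and the axis label are made up for illustration, and df1 is again rebuilt by hand from the expected output in the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, chart saved to disk
import pandas as pd

# df1 after the Total2 column has been added (values from the question).
df1 = pd.DataFrame({
    "Name": ["Accounting", "Reporting", "Finance", "Audit", "Template"],
    "Total": [3, 1, 1, 1, 2],
    "Total2": [120, 100, 400, 500, 1200],
})

# Name becomes the x-axis; each remaining column becomes one group of bars.
ax = df1.set_index("Name").plot.bar(rot=45)
ax.set_ylabel("Value")  # illustrative label

fig = ax.get_figure()
fig.tight_layout()
fig.savefig("totals.png")  # hypothetical output file name
```

Because Total and Total2 differ by two to three orders of magnitude here, the Total bars will be barely visible on a shared axis; plotting the two columns separately may read better.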

Thanks! I will look into these functions and apply them to my overall code. – Gonzalo
