2016-04-09

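How to select multiple columns from different pandas DataFrames

I have 3 pandas DataFrames (similar to the tables below) and 2 lists: list ID_1 = ['sdf', 'sdfsdf', ...] and list ID_2 = ['kjdf', 'kldfjs', ...].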

Table1: 
    ID_1 ID_2 Value 
0 PUFPaY9 NdYWqAJ 0.002 
1 Iu6AxdB qANhGcw 0.01 
2 auESFwW jUEUNdw 0.2345 
3 LWbYpca G3uZ_Rg 0.0835 
4 8fApIAM mVHrayg 0.0295 

Table2: 
    ID_1 weight1 weight2 .....weightN 
0 PUFPaY9  
1 Iu6AxdB  
2 auESFwW 
3 LWbYpca  

Table3: 
    ID_2 weight1 weight2 .....weightN 
0 PUFPaY9  
1 Iu6AxdB  
2 auESFwW  
3 LWbYpca  

I want to compute a new DataFrame using the following logic:

for each x (an ID_1) in list1:
    for each y (an ID_2) in list2:
        if the pair (x, y) exists in Table1:
            # one-to-one multiplication: x[weight1]*y[weight1], x[weight2]*y[weight2], ...
            temp_row = x[weights[i]] * y[weights[i]]
            temp_row.append(Value of (x, y) in Table1)
            new_dataframe.append(temp_row)

return new_dataframe

The desired new_dataframe should look like Table4:

Table4: 
     weight1 weight2 weight3 .....weightN value 
    0   
    1   
    2  
    3  

What I can do so far is:

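new_df = df[(df.ID_1.isin(list1)) & (df.ID_2.isin(list2))]

Using this I get all valid ID_1/ID_2 combinations and their values. But I don't know how to get the product of the weights from the two DataFrames (for each weight[i], without a loop).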

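The task is now somewhat easier: I can iterate over new_df (for each row in new_df), look up weight[i to n] for ID_1 from Table2 and weight[i to n] for ID_2 from Table3, and then append the one-to-one multiplication together with "Value" from Table1 to a new FINAL_DF. But I don't want to loop. Is there a smarter way to solve this?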
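The row-by-row lookup described above can probably be expressed without an explicit loop by joining the filtered pairs against Table2 and Table3. The following is only a minimal sketch under stated assumptions: the toy frames `t1`, `t2`, `t3`, the values in them, and the names `pairs`, `left`, `right` and `weight_cols` are all made up for illustration, and every ID is assumed to appear at most once in Table2 and in Table3.

```python
import pandas as pd

# Toy stand-ins for Table1, Table2 and Table3 (all values are made up).
t1 = pd.DataFrame({
    'ID_1': ['a1', 'a2', 'a3'],
    'ID_2': ['b1', 'b2', 'b3'],
    'Value': [0.002, 0.01, 0.2345],
})
t2 = pd.DataFrame({'ID_1': ['a1', 'a2', 'a3'],
                   'weight1': [1, 2, 3], 'weight2': [4, 5, 6]})
t3 = pd.DataFrame({'ID_2': ['b1', 'b2', 'b3'],
                   'weight1': [10, 20, 30], 'weight2': [2, 2, 2]})
list1 = ['a1', 'a2']
list2 = ['b1', 'b2', 'b3']
weight_cols = ['weight1', 'weight2']

# Step 1: keep only the (ID_1, ID_2) pairs of Table1 whose IDs are in both lists.
pairs = t1[t1['ID_1'].isin(list1) & t1['ID_2'].isin(list2)]

# Step 2: look up the weights of each pair. A left merge keeps the row order
# of `pairs` and returns a fresh 0..n-1 index on both sides.
left = pairs[['ID_1']].merge(t2, on='ID_1', how='left')[weight_cols]
right = pairs[['ID_2']].merge(t3, on='ID_2', how='left')[weight_cols]

# Step 3: element-wise product (weight1*weight1, weight2*weight2, ...),
# then attach the Value column of the matching Table1 rows.
final_df = left * right
final_df['Value'] = pairs['Value'].to_numpy()
print(final_df)
```

If an ID from Table1 is missing in Table2 or Table3, the left merge fills its weights with NaN, so the corresponding products are NaN as well rather than raising an error.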

+0

The question has been updated. I'm not sure we have an option that avoids loops. – impossible

+0

Please check my answer – MaxU

Answer

0

Is this what you want?

import io

import numpy as np
import pandas as pd

data = """\
ID_1
PUFPaY9
aaaaaaa
Iu6AxdB
auESFwW
LWbYpca
"""
id1 = pd.read_csv(io.StringIO(data), sep=r'\s+')

data = """\
ID_2
PUFPaY9
Iu6AxdB
xxxxxxx
auESFwW
LWbYpca
"""
id2 = pd.read_csv(io.StringIO(data), sep=r'\s+')

# Fill both frames with random sample weights (weight1..weight4)
cols = ['weight{}'.format(i) for i in range(1, 5)]
for c in cols:
    id1[c] = np.random.randint(1, 10, len(id1))
    id2[c] = np.random.randint(1, 10, len(id2))

# Index by ID so that arithmetic aligns rows on the ID labels
id1.set_index('ID_1', inplace=True)
id2.set_index('ID_2', inplace=True)

# Element-wise product; IDs present in only one frame yield NaN
df_mul = id1 * id2

Step by step:

In [215]: id1 
Out[215]: 
     weight1 weight2 weight3 weight4 
ID_1 
PUFPaY9  8  9  1  1 
aaaaaaa  6  1  9  2 
Iu6AxdB  8  4  8  5 
auESFwW  9  3  4  2 
LWbYpca  7  7  1  8 

In [216]: id2 
Out[216]: 
     weight1 weight2 weight3 weight4 
ID_2 
PUFPaY9  6  5  5  1 
Iu6AxdB  1  5  4  5 
xxxxxxx  1  2  6  4 
auESFwW  3  9  5  5 
LWbYpca  3  3  6  7 

In [217]: id1 * id2 
Out[217]: 
     weight1 weight2 weight3 weight4 
Iu6AxdB  8.0  20.0  32.0  25.0 
LWbYpca  21.0  21.0  6.0  56.0 
PUFPaY9  48.0  45.0  5.0  1.0 
aaaaaaa  NaN  NaN  NaN  NaN 
auESFwW  27.0  27.0  20.0  10.0 
xxxxxxx  NaN  NaN  NaN  NaN
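
The NaN rows above appear because the multiplication aligns the two frames on their index labels, and `aaaaaaa` and `xxxxxxx` each exist in only one of them. As a small sketch (not part of the original answer), the frames below are rebuilt by hand from the values printed in the session above so the result is reproducible, and the unmatched labels are dropped afterwards; the name `matched` is introduced here for illustration only.

```python
import pandas as pd

cols = ['weight1', 'weight2', 'weight3', 'weight4']

# Same values as the id1 / id2 frames printed in the session above.
id1 = pd.DataFrame(
    [[8, 9, 1, 1], [6, 1, 9, 2], [8, 4, 8, 5], [9, 3, 4, 2], [7, 7, 1, 8]],
    index=pd.Index(['PUFPaY9', 'aaaaaaa', 'Iu6AxdB', 'auESFwW', 'LWbYpca'],
                   name='ID_1'),
    columns=cols,
)
id2 = pd.DataFrame(
    [[6, 5, 5, 1], [1, 5, 4, 5], [1, 2, 6, 4], [3, 9, 5, 5], [3, 3, 6, 7]],
    index=pd.Index(['PUFPaY9', 'Iu6AxdB', 'xxxxxxx', 'auESFwW', 'LWbYpca'],
                   name='ID_2'),
    columns=cols,
)

# Multiplication aligns on index labels; a label found in only one
# frame produces a row of NaN.
df_mul = id1 * id2

# Keep only the labels present in both frames and restore integer dtype.
matched = df_mul.dropna().astype(int)
print(matched)
```

Note that this pairs a row of `id1` with a row of `id2` only when both carry the same label, so it covers the case where ID_1 and ID_2 share identifiers rather than arbitrary (ID_1, ID_2) pairs taken from Table1.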