Compare values in 2 columns and output the result in a third column in pandas

My data looks like the following, and I am trying to create the column output with the values shown.

   a_id b_received c_consumed
0   sam       soap        oil
1   sam        oil        NaN
2   sam      brush       soap
3 harry        oil      shoes
4 harry      shoes        oil
5 alice       beer       eggs
6 alice      brush      brush
7 alice       eggs        NaN

The code to produce the dataset is:

import pandas as pd

# Note: 'NaN' here is the literal string 'NaN', not a missing value
df = pd.DataFrame({'a_id': 'sam sam sam harry harry alice alice alice'.split(),
                   'b_received': 'soap oil brush oil shoes beer brush eggs'.split(),
                   'c_consumed': 'oil NaN soap shoes oil eggs brush NaN'.split()})

I would like a new column called output that looks like this:

   a_id b_received c_consumed output
0   sam       soap        oil      1
1   sam        oil        NaN      1
2   sam      brush       soap      0
3 harry        oil      shoes      1
4 harry      shoes        oil      1
5 alice       beer       eggs      0
6 alice      brush      brush      1
7 alice       eggs        NaN      1

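So the lookup works like this: sam received soap, oil and brush, so look up those items among the values in sam's consumed column. Since soap was consumed, the output for soap is 1, but since brush was not consumed, the output for brush is 0.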

Similarly, harry received oil and shoes, so look for oil and shoes in his consumed column; if oil was consumed, the output is 1.

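To make it clearer: the output value belongs to the received column (b_received) and depends on the values in the consumed column (c_consumed).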
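
The rule described above can be sketched directly: collect the set of items each person consumed, then test each received item only against that person's set. This is a minimal sketch, not the asker's code; the names consumed_by, person and item are illustrative.

```python
import pandas as pd

# Same sample data as in the question ('NaN' is the literal string here)
df = pd.DataFrame({'a_id': 'sam sam sam harry harry alice alice alice'.split(),
                   'b_received': 'soap oil brush oil shoes beer brush eggs'.split(),
                   'c_consumed': 'oil NaN soap shoes oil eggs brush NaN'.split()})

# For each person, the set of items that person consumed,
# e.g. consumed_by['sam'] == {'oil', 'NaN', 'soap'}
consumed_by = {name: set(group['c_consumed']) for name, group in df.groupby('a_id')}

# 1 if the received item is among the SAME person's consumed items, else 0
df['output'] = [int(item in consumed_by[person])
                for person, item in zip(df['a_id'], df['b_received'])]

print(df['output'].tolist())  # [1, 1, 0, 1, 1, 0, 1, 1]
```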

I tried using this code:

a = []
for i in range(len(df.b_received)):
    # compares row i's received item against the whole c_consumed column (all people)
    if any(df.c_consumed == df.b_received[i]):
        a.append(1)
    else:
        a.append(0)

df['output'] = a

This gave me the output:

   a_id b_received c_consumed output
0   sam       soap        oil      1
1   sam        oil        NaN      1
2   sam      brush       soap      1
3 harry        oil      shoes      1
4 harry      shoes        oil      1
5 alice       beer       eggs      0
6 alice      brush      brush      1
7 alice       eggs        NaN      1

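The problem is that since sam did not consume brush, the output should be 0, but it is 1 because brush was consumed by a different person (alice). I need to make sure that doesn't happen: the output needs to be based on each person's own consumption.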
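
One minimal way to make the loop above respect this is to add a second condition, so that row i is only compared against rows with the same a_id. This sketch assumes the question's df with its default 0..n-1 index (so df.a_id[i] is row i); same_person is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'a_id': 'sam sam sam harry harry alice alice alice'.split(),
                   'b_received': 'soap oil brush oil shoes beer brush eggs'.split(),
                   'c_consumed': 'oil NaN soap shoes oil eggs brush NaN'.split()})

a = []
for i in range(len(df.b_received)):
    # Only consider rows that belong to the same person as row i
    same_person = df.a_id == df.a_id[i]
    if ((df.c_consumed == df.b_received[i]) & same_person).any():
        a.append(1)
    else:
        a.append(0)

df['output'] = a
print(a)  # [1, 1, 0, 1, 1, 0, 1, 1]
```

This keeps the original structure but is O(n^2); for large frames a grouped or merge-based approach scales better.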

I know this is confusing, so if I haven't made myself clear, please ask and I will answer your comments.

2016-02-18 Amit Singh Parihar

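You should include the code you have written so far to achieve this. It would also help to include code that someone can copy and paste to create the DataFrame. – imp9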

Also, why does it matter who consumed the item? – imp9

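OK, I've added code to reproduce the dataset. And yes, who consumes it matters for later operations: I want to calculate the probability of each user consuming the items they receive in the future. If that weren't the case, I would just use a 'lookup' function –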

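The key is pandas.Series.isin(), which checks, for each element of the calling pandas.Series, whether it is a member of the object passed to pandas.Series.isin(). You want to check membership of each element of b_received in c_consumed, but only within each group defined by a_id. When you use groupby and apply, pandas indexes the result by the grouping variable as well as by the original index. In your case you don't need the grouping variable in the index, so you can drop that level with reset_index and drop=True, which leaves the original index.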
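
To see what isin() does for a single group, here is sam's data on its own as two standalone Series (an illustration only, not part of the answer's code):

```python
import pandas as pd

# sam's rows from the question, pulled out as two standalone Series
received = pd.Series(['soap', 'oil', 'brush'])
consumed = pd.Series(['oil', 'NaN', 'soap'])

# For each received item: does it appear anywhere in consumed?
mask = received.isin(consumed)
print(mask.tolist())               # [True, True, False]

# astype('i4') turns the booleans into 32-bit integers, as in the answer
print(mask.astype('i4').tolist())  # [1, 1, 0]
```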

df['output'] = (df.groupby('a_id') 
       .apply(lambda x : x['b_received'].isin(x['c_consumed']).astype('i4')) 
       .reset_index(level='a_id', drop=True))

Your DataFrame is now:

   a_id b_received c_consumed output
0   sam       soap        oil      1
1   sam        oil        NaN      1
2   sam      brush       soap      0
3 harry        oil      shoes      1
4 harry      shoes        oil      1
5 alice       beer       eggs      0
6 alice      brush      brush      1
7 alice       eggs        NaN      1

Have a look at the pandas documentation on split-apply-combine for a more thorough explanation.
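
If the groupby/apply index handling is awkward, the same per-person membership test can also be phrased as a left merge on the (a_id, item) pair with indicator=True. This is an alternative sketch using the question's df, not the approach described above; consumed and merged are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'a_id': 'sam sam sam harry harry alice alice alice'.split(),
                   'b_received': 'soap oil brush oil shoes beer brush eggs'.split(),
                   'c_consumed': 'oil NaN soap shoes oil eggs brush NaN'.split()})

# Distinct (person, consumed item) pairs, with the item column renamed
# so it can be joined against b_received
consumed = (df[['a_id', 'c_consumed']]
            .drop_duplicates()
            .rename(columns={'c_consumed': 'b_received'}))

# how='left' keeps every row of df in its original order; indicator=True adds
# a '_merge' column that is 'both' when the (a_id, b_received) pair was consumed
merged = df.merge(consumed, on=['a_id', 'b_received'], how='left', indicator=True)

df['output'] = (merged['_merge'] == 'both').astype(int).to_numpy()
print(df['output'].tolist())  # [1, 1, 0, 1, 1, 0, 1, 1]
```

Dropping duplicate pairs first keeps the left merge from adding rows, so merged lines up with df row for row.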

2016-02-18 23:15:09 JaminSore

Thank you, I'll look into the split-apply-combine approach; it looks like I'll be using it more in the future –

This should work, but the ideal approach is the one by JaminSore:

df['output'] = 0

for name in df['a_id'].unique():
    group = df.loc[df['a_id'] == name]
    for n, row in group.iterrows():
        # n is the index label of the current row, so .loc writes the result back to that row
        if row['b_received'] in group['c_consumed'].values:
            df.loc[n, 'output'] = 1
        else:
            df.loc[n, 'output'] = 0

The resulting DataFrame is now:

   a_id b_received c_consumed output
0   sam       soap        oil      1
1   sam        oil        NaN      1
2   sam      brush       soap      0
3 harry        oil      shoes      1
4 harry      shoes        oil      1
5 alice       beer       eggs      0
6 alice      brush      brush      1
7 alice       eggs        NaN      1

2016-02-18 23:35:02 septra
