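Create a groupby column based on a rank condition in Python

I am working with an event database in Python, and I need to write a function that quantifies whether a specific event follows (AT ANY POINT) another specific event.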

import pandas as pd

df = pd.DataFrame({'User':[1,1,1,2,2,2], 
       'Product':['A','A','A','B','B','B'], 
       'Updated_At':['2015-01-01', 
          '2015-02-01', 
          '2015-03-01', 
          '2015-04-01', 
          '2015-05-01', 
          '2015-06-01'], 
        'Event':[1,1,2,1,3,2]})

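For each product a user has, does Event 2 follow Event 1 at any point? If another Event 1 occurs before that Event 2, the check carries on from that later Event = 1 row.

Desired result (the 'Updated_Event' column marks the rows I want to keep):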

df = pd.DataFrame({'User':[1,1,1,2,2,2], 
       'Product':['A','A','A','B','B','B'], 
       'Updated_At':['2015-01-01', 
          '2015-02-01', 
          '2015-03-01', 
          '2015-04-01', 
          '2015-05-01', 
          '2015-06-01'], 
       'Event':[1,1,2,1,3,2], 
       'Updated_Event':['no', 'yes', 'no', 'yes', 'no', 'no']})

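The logical step seems to be to use groupby (keeping ['User', 'Product']) and create a dummy column to add to the groupby, then check whether each instance of User, Product, EventType1 also contains a row with Event = 2. Something like the 'Event_Dummy' column below: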

df = pd.DataFrame({'User':[1,1,1,2,2,2], 
       'Product':['A','A','A','B','B','B'], 
       'Updated_At':['2015-01-01', 
          '2015-02-01', 
          '2015-03-01', 
          '2015-04-01', 
          '2015-05-01', 
          '2015-06-01'], 
       'Event':[1,1,2,1,3,2], 
       'Event_Dummy': [1,2,2,3,3,3], 
       'Updated_Event':['no', 'yes', 'no', 'yes', 'no', 'no']})

The statement would then be something along the lines of:

Check whether df.groupby(['User', 'Product', 'Event_Dummy']) contains 2.
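One possible way to turn this idea into code is sketched below. It assumes the rows are already sorted by User, Product and Updated_At, and it assumes that Event_Dummy can be built as a running count of the Event = 1 rows (which reproduces the values [1,2,2,3,3,3] shown above). The name has_two is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'User': [1, 1, 1, 2, 2, 2],
    'Product': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Updated_At': ['2015-01-01', '2015-02-01', '2015-03-01',
                   '2015-04-01', '2015-05-01', '2015-06-01'],
    'Event': [1, 1, 2, 1, 3, 2],
})

# Assumption: rows are sorted by User, Product, Updated_At.
# Every Event == 1 row opens a new block, so a running count of those rows
# gives the Event_Dummy column from the question.
df['Event_Dummy'] = df['Event'].eq(1).cumsum()

# True for every row whose (User, Product, Event_Dummy) block contains a 2.
has_two = (
    df['Event'].eq(2)
    .groupby([df['User'], df['Product'], df['Event_Dummy']])
    .transform('any')
)

# Only the Event == 1 row that opens such a block is flagged.
df['Updated_Event'] = (df['Event'].eq(1) & has_two).map({True: 'yes', False: 'no'})
```

Because User and Product are part of the grouping keys, a block never spans two users or two products, even though the running count itself is computed over the whole frame.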

Please let me know if I can help clarify the question.

Source

2015-12-08 user3892921

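I think I don't understand. Do you want to create the column 'Updated_Event'? Or something else? I don't understand the second 'yes' in the 'Updated_Event' column. Is the first 'yes' there because it is the second occurrence, or not? Maybe [this](http://stackoverflow.com/help/mcve) helps. – jezrael

Sorry about that. Yes, I want to create the 'Updated_Event' column. Updated_Event should only evaluate to true if 'Event' = 1 and that event is followed at some point by 'Event' = 2 (before another Event = 1). The first 'yes' is because the event is followed by Event 2. The second 'yes' is because the event is followed by Event 2 (even though not immediately after Event = 1). – user3892921

I added a new column, Updated_Event_new, for easier comparison with the Updated_Event column: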

print(df)
    Event Product Updated_At Updated_Event User 
0  1  A 2015-01-01   no  1 
1  1  A 2015-02-01   yes  1 
2  2  A 2015-03-01   no  1 
3  1  B 2015-04-01   yes  2 
4  3  B 2015-05-01   no  2 
5  2  B 2015-06-01   no  2

# subset of all rows with 1 or 2 in column Event
# (.copy() so the later assignment does not write to a view of df)
df1 = df[(df['Event'] == 1) | (df['Event'] == 2)].copy() 
print(df1)
    Event Product Updated_At Updated_Event User 
0  1  A 2015-01-01   no  1 
1  1  A 2015-02-01   yes  1 
2  2  A 2015-03-01   no  1 
3  1  B 2015-04-01   yes  2 
5  2  B 2015-06-01   no  2

# select rows where Event is 1 and the next row's Event is 2, and
# create new column Updated_Event_new with value yes 
df1.loc[((df1['Event'] == 1) & (df1['Event'].shift(-1) == 2)) , 'Updated_Event_new'] = 'yes' 
print(df1)
    Event Product Updated_At Updated_Event User Updated_Event_new 
0  1  A 2015-01-01   no  1    NaN 
1  1  A 2015-02-01   yes  1    yes 
2  2  A 2015-03-01   no  1    NaN 
3  1  B 2015-04-01   yes  2    yes 
5  2  B 2015-06-01   no  2    NaN

# subset of all rows whose Event is neither 1 nor 2
df2 = df[~((df['Event'] == 1) | (df['Event'] == 2))] 
print(df2)
    Event Product Updated_At Updated_Event User 
4  3  B 2015-05-01   no  2

# concat both subsets, df1 and df2, back into df
df = pd.concat([df1,df2]) 

# sort index 
df = df.sort_index() 

# fill NaN in Updated_Event_new with the value no 
df['Updated_Event_new'] = df['Updated_Event_new'].fillna('no') 
print(df)
    Event Product Updated_At Updated_Event Updated_Event_new User 
0  1  A 2015-01-01   no    no  1 
1  1  A 2015-02-01   yes    yes  1 
2  2  A 2015-03-01   no    no  1 
3  1  B 2015-04-01   yes    yes  2 
4  3  B 2015-05-01   no    no  2 
5  2  B 2015-06-01   no    no  2
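The steps above compare each row with the next row of the filtered frame as a whole, so with other data an Event = 1 at the end of one User/Product group could be matched with an Event = 2 at the start of the next group. A minimal sketch of a grouped variant follows. It is a swapped-in variation, not the code from this answer: it assumes rows are sorted by Updated_At within each group, and the names events, next_event and flag are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'User': [1, 1, 1, 2, 2, 2],
    'Product': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Updated_At': ['2015-01-01', '2015-02-01', '2015-03-01',
                   '2015-04-01', '2015-05-01', '2015-06-01'],
    'Event': [1, 1, 2, 1, 3, 2],
})

# Keep only events 1 and 2, so any other event between them is skipped.
events = df[df['Event'].isin([1, 2])]

# Next Event value within the same (User, Product) group;
# the last row of each group gets NaN instead of a value from another group.
next_event = events.groupby(['User', 'Product'])['Event'].shift(-1)

# An Event == 1 row is flagged when the next remaining event in its group is 2.
flag = events['Event'].eq(1) & next_event.eq(2)

# Rows that were filtered out (events other than 1 or 2) are filled with False.
df['Updated_Event_new'] = (
    flag.reindex(df.index, fill_value=False).map({True: 'yes', False: 'no'})
)
```

Assigning through reindex on the original index avoids splitting the frame into two parts and concatenating them again.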

Source

2015-12-08 22:20:15 jezrael
