2017-09-22 47 views
-1

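How do I filter a pandas DataFrame based on an SQL condition in Python?

I want to convert my SQL code into a Python (pandas) filter function, but it is giving me a hard time. Any ideas on how to filter my data according to the SQL condition without looping over the records? Note the difference for Desc = 'Bla1', which checks Value instead of Active.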

If joe_doe: keep the records with Hello = 1; otherwise: keep the records with Hello = 0.

SQL

Hello =
    CASE
        WHEN
            (
                Desc = 'Bla1'
                AND Value = 'True'
            )
            OR
            (
                Desc IN ('Bla2', 'Bla3')
                AND Active = 'True'
            )
            AND Enabled = 'True'
        THEN 1
        ELSE 0
    END

Python (with pandas)

def get_it(joe_doe, df):

    sentences = {
        'Bla1': 'Value',
        'Bla2': 'Active',
        'Bla3': 'Active'
    }

    if joe_doe:
        df = df[HOW TO KEEP ALL RECORDS THAT HAVE Hello = 1?]
    else:
        df = df[HOW TO KEEP ALL RECORDS THAT HAVE Hello = 0?]
    return df

Input DataFrame

id | Desc | Active | Enabled | Value | [A LOT OF OTHER COLUMNS] 
1 | Bla2 | 1  | 0  | 1  | [A LOT OF OTHER COLUMNS] 
2 | Bla3 | 1  | 1  | 1  | [A LOT OF OTHER COLUMNS] 
3 | Bla3 | 1  | 1  | 0  | [A LOT OF OTHER COLUMNS] 
4 | Bla4 | 1  | 1  | 1  | [A LOT OF OTHER COLUMNS] 
5 | Bla6 | 1  | 1  | 0  | [A LOT OF OTHER COLUMNS] 
6 | Bla7 | 0  | 0  | 1  | [A LOT OF OTHER COLUMNS] 
7 | Bla1 | 0  | 1  | 1  | [A LOT OF OTHER COLUMNS] 
8 | Bla1 | 1  | 1  | 0  | [A LOT OF OTHER COLUMNS] 

Desired output DataFrame if joe_doe

id | Desc | Active | Enabled | Value | [A LOT OF OTHER COLUMNS] 
2 | Bla3 | 1  | 1  | 1  | [A LOT OF OTHER COLUMNS] 
3 | Bla3 | 1  | 1  | 0  | [A LOT OF OTHER COLUMNS] 
7 | Bla1 | 0  | 1  | 1  | [A LOT OF OTHER COLUMNS] 

Desired output DataFrame otherwise (else)

id | Desc | Active | Enabled | Value | [A LOT OF OTHER COLUMNS] 
1 | Bla2 | 1  | 0  | 1  | [A LOT OF OTHER COLUMNS] 
4 | Bla4 | 1  | 1  | 1  | [A LOT OF OTHER COLUMNS] 
5 | Bla6 | 1  | 1  | 0  | [A LOT OF OTHER COLUMNS] 
6 | Bla7 | 0  | 0  | 1  | [A LOT OF OTHER COLUMNS] 
8 | Bla1 | 1  | 1  | 0  | [A LOT OF OTHER COLUMNS] 
+0

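This question is confusing. Could you provide a sample of your df? Are you trying to imitate the CASE statement, or do you just need to know how to write the if statement? –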

+0

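I want to filter my 'df' in Python/pandas based on the SQL condition. In the 'if' branch I want to keep all records that satisfy the SQL condition ('THEN 1'). In the 'else' branch I want to keep all records that do not satisfy it ('ELSE 0'). – orangetacos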

+0

The 'sentences' dictionary contains all the SQL cases: 'Bla1' checks the 'Value' field, and the other two check the 'Active' field. – orangetacos
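
For what it's worth, here is one way the 'sentences' mapping described above could drive the filter without iterating over records: loop over the three dictionary entries only, and OR together one vectorized mask per entry. This is only a sketch. The helper name hello_mask and the four-row DataFrame are made up for illustration, and it assumes the flag columns hold 1/0 as in the sample input and that Enabled must be 1 in every case.

```python
import pandas as pd

# Mapping from the question: which flag column each Desc value checks.
sentences = {"Bla1": "Value", "Bla2": "Active", "Bla3": "Active"}


def hello_mask(df, sentences):
    """Return a boolean Series that is True where Hello would be 1.

    Hypothetical helper; assumes 1/0 flags and that Enabled must be 1
    for every case.
    """
    mask = pd.Series(False, index=df.index)
    for desc, column in sentences.items():
        # One vectorized comparison per dictionary entry, not per row.
        mask |= (df["Desc"] == desc) & (df[column] == 1)
    return mask & (df["Enabled"] == 1)


# Rows 1, 2, 7 and 8 of the sample input.
df = pd.DataFrame({
    "id": [1, 2, 7, 8],
    "Desc": ["Bla2", "Bla3", "Bla1", "Bla1"],
    "Active": [1, 1, 0, 1],
    "Enabled": [0, 1, 1, 1],
    "Value": [1, 1, 1, 0],
})

mask = hello_mask(df, sentences)
kept = df[mask]      # rows where Hello = 1
dropped = df[~mask]  # rows where Hello = 0
```

Adding another case then only means adding an entry to the dictionary.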

Answer

1

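Something like this should work. pandas accepts any number of boolean conditions when filtering a DataFrame: & and | combine conditions, and ~ negates one. I don't see the need for the dict you built; I think it is unnecessary in this case.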

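def get_it(joe_doe, df):
    # Each mask is a boolean Series; the flag columns hold 1/0 in the sample data.
    logic1 = (df.Desc == 'Bla1') & (df.Value == 1) & (df.Enabled == 1)
    logic2 = (df.Desc == 'Bla2') & (df.Active == 1) & (df.Enabled == 1)
    logic3 = (df.Desc == 'Bla3') & (df.Active == 1) & (df.Enabled == 1)

    if joe_doe:
        # Keep rows matching any case (Hello = 1).
        df = df[logic1 | logic2 | logic3]
    else:
        # Keep rows matching none of the cases (Hello = 0).
        df = df[~logic1 & ~logic2 & ~logic3]
    return df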
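
One detail worth checking against the SQL: AND binds tighter than OR, so as written the Enabled = 'True' test only guards the Bla2/Bla3 branch, whereas the masks in this answer require Enabled for Bla1 as well. The sketch below compares the two readings on the question's sample input. It assumes 1/0 flags; the names sql_literal and enabled_everywhere, and the extra row with id 9, are invented for illustration and do not appear in the question.

```python
import pandas as pd

# Sample input from the question, plus a hypothetical row 9
# (Bla1 with Value set but Enabled off) to expose the difference.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Desc": ["Bla2", "Bla3", "Bla3", "Bla4", "Bla6",
             "Bla7", "Bla1", "Bla1", "Bla1"],
    "Active": [1, 1, 1, 1, 1, 0, 0, 1, 0],
    "Enabled": [0, 1, 1, 1, 1, 0, 1, 1, 0],
    "Value": [1, 1, 0, 1, 0, 1, 1, 0, 1],
})

is_bla1 = (df["Desc"] == "Bla1") & (df["Value"] == 1)
is_bla23 = df["Desc"].isin(["Bla2", "Bla3"]) & (df["Active"] == 1)
enabled = df["Enabled"] == 1

# Literal reading of the SQL: A OR (B AND Enabled).
sql_literal = is_bla1 | (is_bla23 & enabled)

# Reading used by the answer: (A OR B) AND Enabled.
enabled_everywhere = (is_bla1 | is_bla23) & enabled

ids_literal = df.loc[sql_literal, "id"].tolist()
ids_everywhere = df.loc[enabled_everywhere, "id"].tolist()
```

On the eight rows from the question both masks select ids 2, 3 and 7, so the sample data cannot tell the two readings apart; only the made-up row 9 is treated differently. Which one is right depends on what the SQL was meant to express.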