2017-06-13 44 views
1

Filter a pandas DataFrame based on two possible values in a column

So I have a DataFrame that looks like this:

Created UserID Service 
1/1/2016 a CWS 
1/2/2016 a Other 
3/5/2016 a Drive 
2/7/2017 b Enhancement 
... ... ... 

I want to filter it to the rows whose "Service" column is either CWS or Drive. This is what I tried:

df=df[(df.Service=="CWS") or (df.Service=="Drive")] 

It doesn't work. Any ideas?

+4

Use '|' instead of 'or' – DyZ

+1

Use the '|' operator, like 'df = df[(df.Service == "CWS") | (df.Service == "Drive")]' – johnchase

Answers

3

You need to combine the comparisons with the bitwise | (or) operator:

df=df[(df.Service=="CWS") | (df.Service=="Drive")] 
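To see why the original attempt fails, here is a minimal sketch on a small frame built from the rows shown in the question (the variable names `error_message`, `mask` and `filtered` are just for illustration). Python's `or` tries to reduce each boolean Series to a single True/False, which pandas rejects with a ValueError, while `|` works element by element. The parentheses matter too, because `|` binds more tightly than `==`.

```python
import pandas as pd

# Sample rows taken from the question's table.
df = pd.DataFrame({
    "Created": ["1/1/2016", "1/2/2016", "3/5/2016", "2/7/2017"],
    "UserID": ["a", "a", "a", "b"],
    "Service": ["CWS", "Other", "Drive", "Enhancement"],
})

# `or` asks the left-hand boolean Series for one truth value,
# which pandas refuses to provide.
try:
    df[(df.Service == "CWS") or (df.Service == "Drive")]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# `|` combines the two boolean masks row by row.
mask = (df.Service == "CWS") | (df.Service == "Drive")
filtered = df[mask]
print(error_message)
print(filtered)
```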

Better still, use isin:

df=df[df.Service.isin(["CWS", "Drive"])] 
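As a small sketch of the isin approach on the question's sample rows: the list of wanted values can live in a variable (here a hypothetical `wanted`), and prefixing the mask with `~` gives the complementary filter.

```python
import pandas as pd

# Sample rows taken from the question's table.
df = pd.DataFrame({
    "Created": ["1/1/2016", "1/2/2016", "3/5/2016", "2/7/2017"],
    "UserID": ["a", "a", "a", "b"],
    "Service": ["CWS", "Other", "Drive", "Enhancement"],
})

wanted = ["CWS", "Drive"]  # illustrative name, not from the original post

# Rows whose Service is one of the wanted values.
keep = df[df.Service.isin(wanted)]

# `~` inverts the boolean mask: everything except the wanted values.
drop = df[~df.Service.isin(wanted)]

print(keep)
print(drop)
```

This scales to any number of values without chaining more `|` terms.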

Or use query:

df = df.query('Service=="CWS" | Service=="Drive"') 

Or query with a list:

df = df.query('Service== ["Other", "Drive"]') 

print (df) 
    Created UserID Service 
1 1/2/2016  a Other 
2 3/5/2016  a Drive 
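If the values to keep are held in a Python variable, query can reference it with the `@` prefix. A minimal sketch on the question's sample rows, assuming a list named `services` (a name chosen here for illustration):

```python
import pandas as pd

# Sample rows taken from the question's table.
df = pd.DataFrame({
    "Created": ["1/1/2016", "1/2/2016", "3/5/2016", "2/7/2017"],
    "UserID": ["a", "a", "a", "b"],
    "Service": ["CWS", "Other", "Drive", "Enhancement"],
})

services = ["CWS", "Drive"]  # illustrative name, not from the original post

# `@services` refers to the local Python variable inside the query string.
by_variable = df.query("Service in @services")

# The same filter with the list written inline.
by_literal = df.query('Service in ["CWS", "Drive"]')

print(by_variable)
```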
+2

You are a legend, thank you! –

+3

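@ScottBoston - sometimes yes, sometimes no... it depends... but I think there are bigger legends on SO ('unutbu', 'DSM', 'Jeff', 'EdChum'), though it seems to me they don't have much time... and I can't forget 'piRSquared', 'MaxU' and 'Psidom' - very good... and the later legends – jezrael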

+2

He has been tagged [**Legendary**](https://stackoverflow.com/help/badges/146/legendary)... he is a legend :-) – piRSquared

1

You can also use pandas.Series.str.match:

df[df.Service.str.match('CWS|Drive')] 

    Created UserID Service 
0 1/1/2016  a  CWS 
2 3/5/2016  a Drive 
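One caveat worth knowing: str.match only anchors the pattern at the start of each string, so 'CWS|Drive' also accepts longer values that merely begin with one of the alternatives. The sketch below uses a made-up value, "Driveway", which does not appear in the question's data, to show the difference and one way to require an exact match.

```python
import pandas as pd

# "Driveway" is a hypothetical value added only to show prefix matching.
s = pd.Series(["CWS", "Drive", "Driveway", "Other"])

# Matches from the start of the string, so "Driveway" passes too.
prefix = s[s.str.match("CWS|Drive")]

# Grouping the alternatives and adding `$` requires the whole value to match.
exact = s[s.str.match(r"(?:CWS|Drive)$")]

print(prefix.tolist())
print(exact.tolist())
```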

Other flavors
For fun!

numpy-fi

s = df.Service.values 
c1 = s == 'CWS' 
c2 = s == 'Drive' 
df[c1 | c2] 
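In the same NumPy spirit, the two comparisons can probably be folded into a single `np.isin` call. This is a sketch of an alternative, not part of the original answer, and it was not included in the timings; `to_numpy()` is used here in place of `.values` to get a plain NumPy array.

```python
import numpy as np
import pandas as pd

# Sample rows taken from the question's table.
df = pd.DataFrame({
    "Created": ["1/1/2016", "1/2/2016", "3/5/2016", "2/7/2017"],
    "UserID": ["a", "a", "a", "b"],
    "Service": ["CWS", "Other", "Drive", "Enhancement"],
})

s = df.Service.to_numpy()

# One boolean mask: True where the value is in the given list.
mask = np.isin(s, ["CWS", "Drive"])
result = df[mask]

print(result)
```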

Add numexpr

import numexpr as ne 

s = df.Service.values 
c1 = s == 'CWS' 
c2 = s == 'Drive' 
df[ne.evaluate('c1 | c2')] 

Timing
isin is the winner! str.match is the slowest :-(

import numpy as np 
import pandas as pd 

np.random.seed([3,1415]) 
df = pd.DataFrame(dict(
     Service=np.random.choice(['CWS', 'Drive', 'Other', 'Enhancement'], 100000))) 

%timeit df[(df.Service == "CWS") | (df.Service == "Drive")] 
%timeit df[df.Service.isin(["CWS", "Drive"])] 
%timeit df.query('Service == "CWS" | Service == "Drive"') 
%timeit df.query('Service == ["Other", "Drive"]') 
%timeit df.query('Service in ["Other", "Drive"]') 
%timeit df[df.Service.str.match('CWS|Drive')] 

100 loops, best of 3: 16.7 ms per loop 
100 loops, best of 3: 4.46 ms per loop 
100 loops, best of 3: 7.74 ms per loop 
100 loops, best of 3: 5.77 ms per loop 
100 loops, best of 3: 5.69 ms per loop 
10 loops, best of 3: 67.3 ms per loop 

%%timeit 
s = df.Service.values 
c1 = s == 'CWS' 
c2 = s == 'Drive' 
df[c1 | c2] 

100 loops, best of 3: 5.47 ms per loop 

%%timeit 
import numexpr as ne 

s = df.Service.values 
c1 = s == 'CWS' 
c2 = s == 'Drive' 
df[ne.evaluate('c1 | c2')] 

100 loops, best of 3: 5.65 ms per loop
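`%timeit` and `%%timeit` are IPython magics, so the lines above only run in IPython or Jupyter. For a plain Python script, a rough equivalent can be sketched with the standard `timeit` module. The `candidates` and `timings` names are illustrative, only two of the variants are included, and the numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Same test frame as in the timings above.
np.random.seed([3, 1415])
df = pd.DataFrame(dict(
    Service=np.random.choice(['CWS', 'Drive', 'Other', 'Enhancement'], 100000)))

# Each candidate is a zero-argument callable that performs one filter.
candidates = {
    "mask_or": lambda: df[(df.Service == "CWS") | (df.Service == "Drive")],
    "isin": lambda: df[df.Service.isin(["CWS", "Drive"])],
}

# Best of 3 repeats, 10 calls per repeat, reported per call.
timings = {
    name: min(timeit.repeat(fn, number=10, repeat=3)) / 10
    for name, fn in candidates.items()
}

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms per loop")
```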