Filter a pandas DataFrame based on two possible values in a column

So I have a DataFrame that looks like this:

Created UserID Service 
1/1/2016 a CWS 
1/2/2016 a Other 
3/5/2016 a Drive 
2/7/2017 b Enhancement 
... ... ...

I want to filter it so that only rows whose "Service" column is CWS or Drive remain. I do this:

df=df[(df.Service=="CWS") or (df.Service=="Drive")]

It doesn't work. Any ideas?

Source

2017-06-13 Josh Dautel

Use bitwise '|' instead of 'or' – DyZ

Use the '|' operator, like 'df = df[(df.Service == "CWS") | (df.Service == "Drive")]' – johnchase

You need an element-wise comparison with | (or):

df=df[(df.Service=="CWS") | (df.Service=="Drive")]
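For context, `or` fails because Python asks for the truth value of each boolean Series as a whole, and pandas refuses with a ValueError. A minimal sketch, assuming a small frame rebuilt from the sample rows in the question (the frame construction and the names `or_failed` and `filtered` are illustrative, not from the original post):

```python
import pandas as pd

# Sample rows copied from the question; the frame is rebuilt here for illustration.
df = pd.DataFrame({
    "Created": ["1/1/2016", "1/2/2016", "3/5/2016", "2/7/2017"],
    "UserID": ["a", "a", "a", "b"],
    "Service": ["CWS", "Other", "Drive", "Enhancement"],
})

# `or` needs a single True/False from each Series, which pandas treats as ambiguous.
try:
    df[(df.Service == "CWS") or (df.Service == "Drive")]
    or_failed = False
except ValueError:
    or_failed = True

# `|` combines the two masks element by element. The parentheses are required
# because `|` binds more tightly than `==`.
filtered = df[(df.Service == "CWS") | (df.Service == "Drive")]

print(or_failed)
print(filtered)
```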

Better is to use isin:

df = df[df.Service.isin(["CWS", "Drive"])]
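As a rough illustration of why `isin` is convenient, the sketch below builds the mask once and reuses it, both directly and inverted with `~`. The data and the names `wanted`, `mask`, `kept`, and `dropped` are assumptions made up for this example:

```python
import pandas as pd

# Illustrative data using the Service values from the question.
df = pd.DataFrame({"Service": ["CWS", "Other", "Drive", "Enhancement"]})

wanted = ["CWS", "Drive"]        # hypothetical name for the accepted values
mask = df.Service.isin(wanted)   # True where Service is one of the wanted values

kept = df[mask]                  # rows with CWS or Drive
dropped = df[~mask]              # `~` inverts the mask: every other row

# The isin mask matches the chained `|` comparison on this data.
same = mask.equals((df.Service == "CWS") | (df.Service == "Drive"))
print(kept, dropped, same, sep="\n")
```

Extending the filter to more values then only means adding entries to the list, rather than chaining more comparisons.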

Or use query:

df = df.query('Service=="CWS" | Service=="Drive"')

Or query with a list:

df = df.query('Service== ["Other", "Drive"]')

print (df) 
    Created UserID Service 
1 1/2/2016  a Other 
2 3/5/2016  a Drive
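A short sketch of the list forms of `query`, assuming the same sample rows as the question. Inside a query string, `==` against a list behaves like `in`, and a Python variable can be referenced with the `@` prefix. The variable name `wanted` is hypothetical:

```python
import pandas as pd

# Sample rows copied from the question; rebuilt here for illustration.
df = pd.DataFrame({
    "Created": ["1/1/2016", "1/2/2016", "3/5/2016", "2/7/2017"],
    "UserID": ["a", "a", "a", "b"],
    "Service": ["CWS", "Other", "Drive", "Enhancement"],
})

wanted = ["Other", "Drive"]  # hypothetical name for the accepted values

by_eq = df.query('Service == ["Other", "Drive"]')   # list on the right of ==
by_in = df.query('Service in ["Other", "Drive"]')   # explicit membership test
by_var = df.query("Service in @wanted")             # @ pulls in a local variable

print(by_var)
```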

Source

2017-06-13 17:51:55 jezrael

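You are a legend, thank you! –

@ScottBoston - Sometimes yes, sometimes no... it depends... but I think there are more legends on SO ('unutbu', 'DSM', 'Jeff', 'EdChum'), though in my opinion they don't have much time... and I can't forget 'piRSquared', 'MaxU' and 'Psidom' - very good... and later legends – jezrael

It's been flagged as [**Legendary**](https://stackoverflow.com/help/badges/146/legendary)... he is a legend :-) – piRSquared

You can also use pandas.Series.str.match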

df[df.Service.str.match('CWS|Drive')] 

    Created UserID Service 
0 1/1/2016  a  CWS 
2 3/5/2016  a Drive
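One caveat worth illustrating: `str.match` is anchored only at the start of each string, so the pattern above would also accept longer values that merely begin with CWS or Drive. The sketch below uses a made-up value, "Driveway", to show the difference, and anchors the end with `$` as one way to require an exact match (recent pandas versions also offer `Series.str.fullmatch`):

```python
import pandas as pd

# "Driveway" is an invented value, added only to show the prefix behaviour.
s = pd.Series(["CWS", "Drive", "Driveway", "Other"])

prefix = s.str.match("CWS|Drive")          # anchored at the start only
exact = s.str.match(r"(?:CWS|Drive)$")     # start anchor plus explicit end anchor

print(prefix.tolist())
print(exact.tolist())
```

With the sample data in the question the two patterns select the same rows, so this only matters when some values share a prefix.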

Other flavors
For fun!

numpy-fi

s = df.Service.values 
c1 = s == 'CWS' 
c2 = s == 'Drive' 
df[c1 | c2]

Adding numexpr

import numexpr as ne 

s = df.Service.values 
c1 = s == 'CWS' 
c2 = s == 'Drive' 
df[ne.evaluate('c1 | c2')]

Timing
isin is the winner! str.match is :-(

import numpy as np
import pandas as pd

np.random.seed([3,1415]) 
df = pd.DataFrame(dict(
     Service=np.random.choice(['CWS', 'Drive', 'Other', 'Enhancement'], 100000))) 

%timeit df[(df.Service == "CWS") | (df.Service == "Drive")] 
%timeit df[df.Service.isin(["CWS", "Drive"])] 
%timeit df.query('Service == "CWS" | Service == "Drive"') 
%timeit df.query('Service == ["Other", "Drive"]') 
%timeit df.query('Service in ["Other", "Drive"]') 
%timeit df[df.Service.str.match('CWS|Drive')] 

100 loops, best of 3: 16.7 ms per loop 
100 loops, best of 3: 4.46 ms per loop 
100 loops, best of 3: 7.74 ms per loop 
100 loops, best of 3: 5.77 ms per loop 
100 loops, best of 3: 5.69 ms per loop 
10 loops, best of 3: 67.3 ms per loop 

%%timeit 
s = df.Service.values 
c1 = s == 'CWS' 
c2 = s == 'Drive' 
df[c1 | c2] 

100 loops, best of 3: 5.47 ms per loop 

%%timeit 
import numexpr as ne 

s = df.Service.values 
c1 = s == 'CWS' 
c2 = s == 'Drive' 
df[ne.evaluate('c1 | c2')] 

100 loops, best of 3: 5.65 ms per loop
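The figures above date from 2017 and will vary with the pandas version and hardware. To re-run the comparison outside IPython, a rough sketch with the standard `timeit` module might look like this. The seed, the repeat counts, and the dictionary labels are arbitrary choices for this example, not part of the original answer:

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, not the one used above
df = pd.DataFrame({
    "Service": rng.choice(["CWS", "Drive", "Other", "Enhancement"], 100_000)
})

# Each candidate is a zero-argument callable returning the filtered frame.
candidates = {
    "or-mask": lambda: df[(df.Service == "CWS") | (df.Service == "Drive")],
    "isin": lambda: df[df.Service.isin(["CWS", "Drive"])],
    "query": lambda: df.query('Service in ["CWS", "Drive"]'),
}

for name, fn in candidates.items():
    # Best of 3 runs, 10 calls per run, reported per call.
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{name}: {best * 1e3:.2f} ms per call")
```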

Source

2017-06-13 18:09:40 piRSquared
