Drop all rows before the first occurrence of a value

I have a DF like this:

Year ID Count 
1997 1 0 
1998 2 0 
1999 3 1 
2000 4 0 
2001 5 1

and I want to drop all rows before the first occurrence of 1 in Count, which would give me:

Year ID Count 
1999 3 1 
2000 4 0 
2001 5 1

I can drop all rows AFTER the first occurrence like this:

df=df.loc[: df[(df['Count'] == 1)].index[0], :]

but I can't seem to work out the slicing logic to make it do the opposite.
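To make the question reproducible, here is a minimal sketch that rebuilds the sample frame (values copied from the table above) and runs the "drop everything AFTER" line. Note that .loc slicing is label-based and includes both endpoints, so the row holding the first 1 is kept.

```python
import pandas as pd

# Sample frame from the question (default RangeIndex 0..4)
df = pd.DataFrame({
    "Year": [1997, 1998, 1999, 2000, 2001],
    "ID": [1, 2, 3, 4, 5],
    "Count": [0, 0, 1, 0, 1],
})

# Label of the first row where Count == 1 (here: 2)
first = df[df["Count"] == 1].index[0]

# .loc slicing is inclusive of the end label, so this keeps rows 0..2,
# i.e. everything up to and including the first 1
up_to_first = df.loc[:first, :]
print(up_to_first)
```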

2016-08-01 Stefano Potter

Here is how I'd do it:

df[(df.Count == 1).idxmax():]

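df.Count == 1 returns a boolean Series. idxmax() returns the index label of the maximum value. I know the maximum will be True, and when there are several True values it returns the label of the first one. That's exactly what you want. For this frame that label is 2. Finally, I slice everything from 2 onward with df[2:]; with the default RangeIndex, label 2 is also position 2. In the answer above I put it all on one line.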
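A sketch of the idxmax() approach wrapped in a helper; drop_before_first is a hypothetical name, not a pandas function. One caveat worth guarding against: if the value never occurs, the mask is all False, its maximum is False, and idxmax() returns the first label, so the unguarded one-liner would return the whole frame instead of an empty one. Using .loc with the returned label also keeps the slice correct when the index is not 0..n-1 (assuming a unique, sorted index).

```python
import pandas as pd


def drop_before_first(df: pd.DataFrame, col: str, value) -> pd.DataFrame:
    """Hypothetical helper: keep rows from the first `col == value` onward."""
    mask = df[col] == value
    if not mask.any():
        # idxmax() on an all-False mask would return the first label,
        # so handle the "no match" case explicitly
        return df.iloc[0:0]
    # idxmax() returns the label of the first True; .loc treats it as a label
    return df.loc[mask.idxmax():]


df = pd.DataFrame({
    "Year": [1997, 1998, 1999, 2000, 2001],
    "ID": [1, 2, 3, 4, 5],
    "Count": [0, 0, 1, 0, 1],
})

print(drop_before_first(df, "Count", 1))
print(drop_before_first(df, "Count", 7))  # no match: empty frame
```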

2016-08-01 20:07:55 piRSquared

You can use the cumsum() method:

In [13]: df[(df.Count == 1).cumsum() > 0] 
Out[13]: 
    Year ID Count 
2 1999 3  1 
3 2000 4  0 
4 2001 5  1

Explanation:

In [14]: (df.Count == 1).cumsum() 
Out[14]: 
0 0 
1 0 
2 1 
3 1 
4 2 
Name: Count, dtype: int32
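The cumsum() mask has a useful side effect: when Count never equals 1, the running count stays at 0 everywhere and the filter simply returns an empty frame, with no special-casing. A minimal sketch; the second frame (all zeros in Count) is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [1997, 1998, 1999, 2000, 2001],
    "ID": [1, 2, 3, 4, 5],
    "Count": [0, 0, 1, 0, 1],
})

# Running count of 1s seen so far: it becomes > 0 at the first 1 and never drops back
seen = (df["Count"] == 1).cumsum()
kept = df[seen > 0]
print(kept)

# Illustrative frame with no 1 in Count: the mask is all False -> empty result
no_ones = df.assign(Count=0)
print(no_ones[(no_ones["Count"] == 1).cumsum() > 0])
```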

Timings against a 500K-row DF:

In [18]: df = pd.concat([df] * 10**5, ignore_index=True) 

In [19]: df.shape 
Out[19]: (500000, 3) 

In [20]: %timeit df[(df.Count == 1).idxmax():] 
100 loops, best of 3: 3.7 ms per loop 

In [21]: %timeit df[(df.Count == 1).cumsum() > 0] 
100 loops, best of 3: 16.4 ms per loop 

In [22]: %timeit df.loc[df[(df['Count'] == 1)].index[0]:, :] 
The slowest run took 4.01 times longer than the fastest. This could mean that an intermediate result is being cached. 
100 loops, best of 3: 7.02 ms per loop

Conclusion: @piRSquared's idxmax() solution is the clear winner...

2016-08-01 20:04:19 MaxU

Just slice the other way:

If idx is your index label, do:

df.loc[idx:]

instead of

df.loc[:idx]

In your case that means:

df.loc[df[(df['Count'] == 1)].index[0]:, :]
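Label-based slicing matters once the index is no longer the default 0..n-1. A minimal sketch, assuming Year is moved into the index (an illustrative change, not part of the question): the label returned for the first match (1999) is no longer the row's position, so it belongs in .loc rather than in plain []. It also combines with idxmax() from the first answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [1997, 1998, 1999, 2000, 2001],
    "ID": [1, 2, 3, 4, 5],
    "Count": [0, 0, 1, 0, 1],
}).set_index("Year")  # illustrative: labels are now years, not positions

idx = df[df["Count"] == 1].index[0]  # first matching label: 1999
after = df.loc[idx:]                 # rows from 1999 onward (inclusive)
before = df.loc[:idx]                # rows up to 1999 (also inclusive)

# Same result via idxmax(), since .loc treats its output as a label
same = df.loc[(df["Count"] == 1).idxmax():]
print(after)
```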

2016-08-01 20:14:08

Using np.where:

import numpy as np
df[np.where(df['Count'] == 1)[0][0]:]
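np.where(...)[0] holds integer positions, not labels, so pairing it with .iloc keeps the result correct for any index. A sketch on the sample frame with Year moved into the index (an illustrative change); like the unguarded idxmax() version, it needs a check, because [0][0] raises IndexError when there is no match.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Year": [1997, 1998, 1999, 2000, 2001],
    "ID": [1, 2, 3, 4, 5],
    "Count": [0, 0, 1, 0, 1],
}).set_index("Year")  # non-default index to show positions vs labels

# Integer positions of every row where Count == 1
positions = np.where(df["Count"].to_numpy() == 1)[0]

# Slice by position with .iloc; fall back to an empty frame if nothing matched
result = df.iloc[positions[0]:] if positions.size else df.iloc[0:0]
print(result)
```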

Timings

Timings were run on a larger version of the DataFrame:

df = pd.concat([df]*10**5, ignore_index=True)

Results:

%timeit df[np.where(df['Count']==1)[0][0]:] 
100 loops, best of 3: 2.74 ms per loop 

%timeit df[(df.Count == 1).idxmax():] 
100 loops, best of 3: 6.18 ms per loop 

%timeit df[(df.Count == 1).cumsum() > 0] 
10 loops, best of 3: 26.6 ms per loop 

%timeit df.loc[df[(df['Count'] == 1)].index[0]:, :] 
100 loops, best of 3: 11.2 ms per loop
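A self-contained script for rerunning the comparison outside IPython, using the standard timeit module instead of %timeit. Absolute numbers depend on hardware and on pandas/NumPy versions, so no expected timings are given. The script also checks that all four expressions select the same rows, which holds here because the concatenated frame keeps a default RangeIndex (ignore_index=True).

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({
    "Year": [1997, 1998, 1999, 2000, 2001],
    "ID": [1, 2, 3, 4, 5],
    "Count": [0, 0, 1, 0, 1],
})
# 500,000 rows with a fresh RangeIndex, as in the timings above
df = pd.concat([base] * 10**5, ignore_index=True)

candidates = {
    "np.where": lambda: df[np.where(df["Count"].to_numpy() == 1)[0][0]:],
    "idxmax": lambda: df[(df.Count == 1).idxmax():],
    "cumsum": lambda: df[(df.Count == 1).cumsum() > 0],
    "loc + index[0]": lambda: df.loc[df[df["Count"] == 1].index[0]:, :],
}

# Sanity check: every approach should return the same rows
reference = candidates["idxmax"]()
assert all(f().equals(reference) for f in candidates.values())

for name, f in candidates.items():
    # best of 3 repeats, 20 calls each, reported per call
    best = min(timeit.repeat(f, number=20, repeat=3)) / 20
    print(f"{name:>15}: {best * 1e3:.2f} ms per call")
```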

2016-08-01 20:24:03 root