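Most efficient way of excluding indexed rows in a pandas DataFrame

I'm relatively new to Python & pandas and am struggling with (hierarchical) indexes. I have the basics covered, but I'm lost with more advanced slicing and cross-sectioning.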

For example, given the following DataFrame

import pandas as pd 
import numpy as np 
data = pd.DataFrame(np.arange(9).reshape((3, 3)), 
    index=pd.Index(['Ohio', 'Colorado', 'New York'], name='state'), columns=pd.Index(['one', 'two', 'three'], name='number'))

I want to select everything except the row with index 'Colorado'. For a small dataset I could do:

data.ix[['Ohio','New York']]

But if the number of unique index values is large, that's impractical. Naively, I would expect a syntax like

data.ix[['state' != 'Colorado']]

But this only returns the first record, 'Ohio', and doesn't return 'New York'. This works, but is cumbersome:

filter = list(set(data.index.get_level_values(0).unique()) - set(['Colorado']))
data.loc[filter]

Surely there must be a more Pythonic, less verbose way of doing this?

Source

2014-02-08 dkapitan

This is a Python issue, not a pandas one: 'state' != 'Colorado' is True, so what pandas gets is data.ix[[True]].
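To make the point concrete, here is a minimal sketch that rebuilds the question's DataFrame and contrasts the two comparisons. The variable names `eager` and `mask` are mine, purely for illustration.

```python
import numpy as np
import pandas as pd

# Same DataFrame as in the question.
data = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=pd.Index(["Ohio", "Colorado", "New York"], name="state"),
    columns=pd.Index(["one", "two", "three"], name="number"),
)

# Two string literals: Python compares them on the spot, so the result
# is a plain bool that knows nothing about the DataFrame.
eager = "state" != "Colorado"
print(type(eager), eager)

# Comparing the index itself is elementwise: one boolean per row.
mask = data.index != "Colorado"
print(mask)
```

The second form is the one pandas can use to filter rows, since it carries one flag per index label.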

You could do

>>> data.loc[data.index != "Colorado"]
number    one  two  three
state
Ohio        0    1      2
New York    6    7      8

[2 rows x 3 columns]
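If there are several labels to exclude, the same boolean-mask idea should extend with `Index.isin`, and `DataFrame.drop` is another option. This is a sketch rather than part of the original answer; the list name `excluded` and the result names are my own.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=pd.Index(["Ohio", "Colorado", "New York"], name="state"),
    columns=pd.Index(["one", "two", "three"], name="number"),
)

excluded = ["Colorado", "Ohio"]  # hypothetical list of labels to leave out

# Boolean mask: True for every row whose label is NOT in `excluded`.
kept_by_mask = data.loc[~data.index.isin(excluded)]

# drop() removes rows by label and returns a new DataFrame.
kept_by_drop = data.drop(index=excluded)

print(kept_by_mask)
```

One difference worth knowing: by default `drop` raises a KeyError if a label is missing from the index, whereas the `isin` mask simply ignores labels that are not there.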

or use DataFrame.query:

>>> data.query("state != 'New York'")
number    one  two  three
state
Ohio        0    1      2
Colorado    3    4      5

[2 rows x 3 columns]

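if you don't like repeating data. (Quoting the expression passed to the .query() method is one of the only ways around the fact that Python would otherwise evaluate the comparison before pandas ever saw it.)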
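For a longer list of labels, `query` can also refer to a local Python variable with the `@` prefix and test membership with `not in`. A minimal sketch, assuming the labels live in a list I have called `excluded`:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=pd.Index(["Ohio", "Colorado", "New York"], name="state"),
    columns=pd.Index(["one", "two", "three"], name="number"),
)

excluded = ["Colorado"]  # hypothetical list of labels to leave out

# "state" is the index name; @excluded is looked up in the calling scope.
result = data.query("state not in @excluded")
print(result)
```

Because the whole expression is a string, pandas parses it itself and Python never gets the chance to evaluate the comparison early.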

Source

2014-02-08 19:40:25 DSM

Thanks: that clarifies a lot! – dkapitan

This is a robust solution that will also work with MultiIndex objects.

Single Index

excluded = ['Ohio'] 
indices = data.index.get_level_values('state').difference(excluded) 
indx = pd.IndexSlice[indices.values]

Output

In [77]: data.loc[indx]
Out[77]:
number    one  two  three
state
Colorado    3    4      5
New York    6    7      8
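One detail to be aware of with this approach: `Index.difference` sorts its result by default, so the surviving rows may come back in a different order than in the original frame. The sketch below (my own variable names) shows this, along with `sort=False`, which I believe is available from pandas 0.24 onward.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=pd.Index(["Ohio", "Colorado", "New York"], name="state"),
    columns=pd.Index(["one", "two", "three"], name="number"),
)

excluded = ["Colorado"]

# Default behaviour: the remaining labels are returned in sorted order.
sorted_labels = data.index.difference(excluded)

# sort=False keeps the labels in their original index order.
ordered_labels = data.index.difference(excluded, sort=False)

print(sorted_labels.tolist())
print(ordered_labels.tolist())
print(data.loc[ordered_labels])
```

If row order matters downstream, passing `sort=False` or using a boolean mask avoids the reordering.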

MultiIndex Extension

Here I extend to a MultiIndex example...

data = pd.DataFrame(
    np.arange(18).reshape(6, 3),
    index=pd.MultiIndex(
        levels=[['AU', 'UK'], ['Derby', 'Kensington', 'Newcastle', 'Sydney']],
        codes=[[0, 0, 0, 1, 1, 1], [0, 2, 3, 0, 1, 2]],  # this argument was called `labels` before pandas 0.24
        names=['country', 'town']),
    columns=pd.Index(['one', 'two', 'three'], name='number'))

Say we want to exclude both occurrences of 'Newcastle' from this new MultiIndex

excluded = ['Newcastle'] 
indices = data.index.get_level_values('town').difference(excluded) 
indx = pd.IndexSlice[:, indices.values]

This gives the expected result

In [115]: data.loc[indx, :]
Out[115]:
number              one  two  three
country town
AU      Derby         0    1      2
        Sydney        6    7      8
UK      Derby         9   10     11
        Kensington   12   13     14

Common Pitfalls

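Ensure that all levels of the index are sorted; you need data.sort_index(inplace=True)
Ensure that you include the null slice for the columns: data.loc[indx, :]
Sometimes indx = pd.IndexSlice[:, indices] is good enough, but I found that I often need to use indx = pd.IndexSlice[:, indices.values]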
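The pitfalls above can be put together in one runnable sketch. It deliberately builds the frame with rows in a shuffled order (my own test data, via `MultiIndex.from_tuples`) so that the `sort_index` step actually does something. The boolean mask at the end is an alternative I am adding, not part of the answer above; it does not depend on the index being sorted.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with deliberately unsorted rows.
index = pd.MultiIndex.from_tuples(
    [("UK", "Newcastle"), ("AU", "Sydney"), ("UK", "Derby"),
     ("AU", "Derby"), ("UK", "Kensington"), ("AU", "Newcastle")],
    names=["country", "town"],
)
frame = pd.DataFrame(
    np.arange(18).reshape(6, 3),
    index=index,
    columns=pd.Index(["one", "two", "three"], name="number"),
)

excluded = ["Newcastle"]
keep = frame.index.get_level_values("town").difference(excluded)

# Pitfall 1: sort all index levels before slicing.
frame_sorted = frame.sort_index()

# Pitfalls 2 and 3: include the null slice for the columns and pass .values.
idx = pd.IndexSlice
selected = frame_sorted.loc[idx[:, keep.values], :]
print(selected)

# Alternative: a boolean mask on one level, no sorting required.
mask = ~frame.index.get_level_values("town").isin(excluded)
selected_by_mask = frame.loc[mask]
```

The mask version keeps the rows in whatever order the frame already had, while the IndexSlice version returns them in sorted index order.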

Source

2017-08-03 15:03:07
