2017-02-10 233 views
0

Selecting rows in pandas

I have a pandas.DataFrame df:

Property Area dist 
A   50  2 
B   100 3 
C   20  10 
D   1  15 
E   20  16 
F   3  25 

I want the final data frame to have the following form:

Property Area dist 
A   50  2 
C   20  10 
F   3  25 

That is, I want to omit rows whose dist values are closer than 8 to each other.

+1

What have you tried? – haifzhan

+1

Do you mean "closer than 8 to each other"? – Zero

Answers

1

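I believe this code matches your problem statement. The basic idea is to collect the set of dist values to keep, and then apply those values to the data frame.

Code: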

# find the dist values to keep
to_keep = set()
min_value = None  # last dist value that was kept
min_dist = 8
for dist in sorted(df['dist']):
    # always keep the first value; comparing None with <= raises TypeError on Python 3
    if min_value is None or min_value <= dist - min_dist:
        min_value = dist
        to_keep.add(dist)

# build a new data frame with just the keep values
new_df = df.query('dist in @to_keep')
print(new_df)

Produces:

Area dist 
A 50  2 
C 20 10 
F  3 25 
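
The same greedy idea can also be wrapped in a function, which may be easier to reuse. This is only a sketch: the name thin_by_distance and its parameters column and min_gap are made up for illustration, and it assumes the index labels are unique and that the intent is to keep a row only when its dist is at least min_gap above the last kept dist. Unlike the set-based version, rows with a duplicated dist value are kept only once.

```python
import pandas as pd


def thin_by_distance(df, column="dist", min_gap=8):
    """Hypothetical helper: keep rows whose `column` value is at least
    `min_gap` larger than the previously kept value (greedy, ascending)."""
    kept_labels = []
    last_kept = None
    # Walk the values in ascending order, remembering the index label of each kept row
    for label, value in df[column].sort_values().items():
        if last_kept is None or value - last_kept >= min_gap:
            last_kept = value
            kept_labels.append(label)
    # Select by label so the original row order is preserved
    return df.loc[df.index.isin(kept_labels)]


df = pd.DataFrame(
    {"Area": [50, 100, 20, 1, 20, 3], "dist": [2, 3, 10, 15, 16, 25]},
    index=list("ABCDEF"),
)
result = thin_by_distance(df)
print(result)
```

On the sample data this selects rows A, C and F, the same rows as the output above.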

Sample data:

import numpy as np 
import pandas as pd 
props = np.array([ 
    ('Property', 'Area', 'dist'), 
    ('A',   50,  2), 
    ('B',   100,  3), 
    ('C',   20,  10), 
    ('D',   1,  15), 
    ('E',   20,  16), 
    ('F',   3,  25), 
    ]) 

df = pd.DataFrame(data=props[1:, 1:], 
        index=props[1:, 0], 
        columns=props[0, 1:]).apply(pd.to_numeric) 
+0

Thanks, everything works until the step new_df = df.query('dist in @to_keep'), where I get the error: ImportError: 'numexpr' not found. Cannot use engine='numexpr' for query/eval if 'numexpr' is not installed – Ssank

+0

I installed numexpr and everything works, thanks – Ssank
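
If installing numexpr is not an option, the selection step can probably be written without it. A small sketch, assuming to_keep holds the dist values chosen by the loop in the answer (hard-coded here so the snippet runs on its own): Series.isin builds a boolean mask directly, and query accepts engine='python' to skip numexpr.

```python
import pandas as pd

df = pd.DataFrame(
    {"Area": [50, 100, 20, 1, 20, 3], "dist": [2, 3, 10, 15, 16, 25]},
    index=list("ABCDEF"),
)
to_keep = {2, 10, 25}  # assumed result of the loop in the answer

# Option 1: plain boolean mask, no query/eval involved
by_mask = df[df["dist"].isin(to_keep)]

# Option 2: keep the query syntax but evaluate it with the pure-Python engine
by_query = df.query("dist in @to_keep", engine="python")

print(by_mask)
```

Both options should select the same rows; the mask version has the fewest moving parts.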