2016-10-19 41 views

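Set specific values in a mixed-dtype DataFrame to a fixed value?

I have a DataFrame with response and predictor variables in the columns and observations in the rows. Some values in the responses are below a given limit of detection (LOD). Since I plan to apply a rank transformation to the responses, I would like to set all of those values equal to the LOD. Say the DataFrame is: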

data.head() 

   age  response1  response2  response3 risk     sex smoking
0   33   0.272206   0.358059   0.585652   no  female     yes
1   38   0.425486   0.675391   0.721062  yes  female      no
2   20   0.910602   0.200606   0.664955  yes  female      no
3   38   0.966014   0.584317   0.923788  yes  female      no
4   27   0.756356   0.550512   0.106534   no  female     yes

I would like to do

responses = ['response1', 'response2', 'response3'] 
LOD = 0.2 

data[responses][data[responses] <= LOD] = LOD 

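which does not work for several reasons (for example, pandas does not know whether it should produce a view on the data or not, and I guess it won't).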

How do I set all values where

data[responses] <= LOD 

is true equal to LOD?
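To illustrate why the chained form has no effect, here is a minimal sketch on a small made-up frame (the name df and its values are assumptions for illustration, not the data above). Selecting with a list of columns returns a new object, so the assignment lands on that temporary object; depending on the pandas version a warning about setting values on a copy may be shown.

```python
import pandas as pd

# Hypothetical toy data, not the frame from the question.
df = pd.DataFrame({
    "response1": [0.10, 0.50, 0.90],
    "response2": [0.30, 0.05, 0.70],
    "sex": ["male", "female", "male"],
})
responses = ["response1", "response2"]
LOD = 0.2

# Selecting with a list of columns gives a new DataFrame, not a view.
subset = df[responses]
# This changes only the temporary object (pandas may warn about it).
subset[subset <= LOD] = LOD

# The original frame still holds values below the LOD.
still_below = bool((df[responses] < LOD).any().any())
print(still_below)
```

The modified values only reach the original frame if the result is assigned back, for example with df[responses] = subset.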


Minimal example:

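import numpy as np 
import pandas as pd 

from pandas import Series, DataFrame 

# Categorical predictors. rename_categories maps the integer categories to labels
# (assigning to .cat.categories directly fails on recent pandas versions).
x = Series(np.random.randint(0, 2, 50), dtype='category') 
x = x.cat.rename_categories(['no', 'yes']) 

y = Series(np.random.randint(0, 2, 50), dtype='category') 
y = y.cat.rename_categories(['no', 'yes']) 

z = Series(np.random.randint(0, 2, 50), dtype='category') 
z = z.cat.rename_categories(['male', 'female']) 

a = Series(np.random.randint(20, 60, 50), dtype='category') 

# Three numeric responses plus the categorical predictors
data = DataFrame({'risk': x, 'smoking': y, 'sex': z, 
    'response1': np.random.rand(50), 
    'response2': np.random.rand(50), 
    'response3': np.random.rand(50), 
    'age': a}) 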
Do 'data[data[responses] <= LOD] = 0.2' – EdChum

Answer

You can use DataFrame.mask:

import numpy as np 
import pandas as pd 

np.random.seed(123) 
# Categorical predictors. rename_categories maps the integer categories to labels
# (assigning to .cat.categories directly fails on recent pandas versions).
x = pd.Series(np.random.randint(0,2,10), dtype='category') 
x = x.cat.rename_categories(['no', 'yes']) 
y = pd.Series(np.random.randint(0,2,10), dtype='category') 
y = y.cat.rename_categories(['no', 'yes']) 
z = pd.Series(np.random.randint(0,2,10), dtype='category') 
z = z.cat.rename_categories(['male', 'female']) 

a = pd.Series(np.random.randint(20,60,10), dtype='category') 

data = pd.DataFrame({ 
    'risk': x, 
    'smoking': y, 
    'sex': z, 
    'response1': np.random.rand(10), 
    'response2': np.random.rand(10), 
    'response3': np.random.rand(10), 
    'age': a}) 
print (data) 
   age  response1  response2  response3 risk     sex smoking
0   24   0.722443   0.425830   0.866309   no    male     yes
1   23   0.322959   0.312261   0.250455  yes    male     yes
2   22   0.361789   0.426351   0.483034   no  female      no
3   40   0.228263   0.893389   0.985560   no  female     yes
4   59   0.293714   0.944160   0.519485   no  female      no
5   22   0.630976   0.501837   0.612895   no    male     yes
6   40   0.092105   0.623953   0.120629   no  female      no
7   27   0.433701   0.115618   0.826341  yes    male     yes
8   55   0.430863   0.317285   0.603060  yes    male     yes
9   48   0.493685   0.414826   0.545068   no    male      no
responses = ['response1', 'response2', 'response3'] 
LOD = 0.2 

print (data[responses] <= LOD) 
   response1  response2  response3
0      False      False      False
1      False      False      False
2      False      False      False
3      False      False      False
4      False      False      False
5      False      False      False
6       True      False       True
7      False       True      False
8      False      False      False
9      False      False      False

data[responses] = data[responses].mask(data[responses] <= LOD, LOD) 
print (data) 
   age  response1  response2  response3 risk     sex smoking
0   24   0.722443   0.425830   0.866309   no    male     yes
1   23   0.322959   0.312261   0.250455  yes    male     yes
2   22   0.361789   0.426351   0.483034   no  female      no
3   40   0.228263   0.893389   0.985560   no  female     yes
4   59   0.293714   0.944160   0.519485   no  female      no
5   22   0.630976   0.501837   0.612895   no    male     yes
6   40   0.200000   0.623953   0.200000   no  female      no
7   27   0.433701   0.200000   0.826341  yes    male     yes
8   55   0.430863   0.317285   0.603060  yes    male     yes
9   48   0.493685   0.414826   0.545068   no    male      no
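Since raising every value at or below the LOD up to the LOD is the same as clipping at a lower bound, DataFrame.clip and DataFrame.where should give the same result as mask here. The sketch below compares the three on a small made-up frame; the name demo and its random values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical demo data, not the frame from the answer above.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "response1": rng.random(10),
    "response2": rng.random(10),
    "sex": pd.Series(["male", "female"] * 5, dtype="category"),
})
responses = ["response1", "response2"]
LOD = 0.2

# mask: replace values where the condition is True.
masked = demo[responses].mask(demo[responses] <= LOD, LOD)
# clip: raise everything below the lower bound to the bound.
clipped = demo[responses].clip(lower=LOD)
# where: keep values where the condition is True, replace the rest.
# Note: unlike mask and clip, this would also replace NaN with LOD.
kept = demo[responses].where(demo[responses] > LOD, LOD)

print(masked.equals(clipped), masked.equals(kept))

# Assign the result back to the numeric columns only.
demo[responses] = clipped
```

Only the numeric response columns are touched in each variant, so the categorical columns are left as they are.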
Does it work for you? – jezrael

Thanks, it works perfectly! Learned another pandas feature today. .mask looks really powerful. –