2013-04-02
1

This is related to an existing question, which can be found here:

Replace a numerical value by NA based on conditions from other columns:

Here is the data:

library(data.table)

DT <- data.table(a = sample(c("C","M","Y","K"), 100, replace = TRUE), 
        b = sample(c("A","S"), 100, replace = TRUE), 
        f = round(rnorm(n = 100, mean = .90, sd = .08), digits = 2)); DT 

I would like an elegant and concise rewrite of the following functions:

`%between%` <- function(x, vals) { x >= vals[1] & x <= vals[2]} 
`%nbetween%` <- Negate(`%between%`) 

and of the following script, which replaces the values of f that meet certain conditions with NA:

DT[a == "C" & b %in% c("A", "S") & f %nbetween% c(.85, .95), f := NA] 
DT[a == "M" & b %in% c("A", "S") & f %nbetween% c(.85, .95), f := NA] 
DT[a == "Y" & b %in% c("A", "S") & f %nbetween% c(.80, .90), f := NA] 
DT[a == "K" & b %in% c("A", "S") & f %nbetween% c(.95, 1.10), f := NA] 
+1

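Those look both elegant and concise to me. Maybe you could expand your question to explain why this isn't good enough? You could set keys on your 'data.table' and use the 'J()' function to do your subsetting, which avoids vector scans. – Justin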
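The keyed approach mentioned in this comment could look roughly like the sketch below. It is only an illustration under assumptions: the seed and the name `sub` are not from the thread, and only the "C" range from the question is shown.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
DT <- data.table(a = sample(c("C", "M", "Y", "K"), 100, replace = TRUE),
                 b = sample(c("A", "S"), 100, replace = TRUE),
                 f = round(rnorm(n = 100, mean = 0.90, sd = 0.08), digits = 2))

# Key the table on a so that lookups use binary search.
setkey(DT, a)

# Keyed subset: all rows with a == "C", found without a vector scan.
sub <- DT[J("C")]  # `sub` is a hypothetical name used for illustration

# Update f by reference for the matched rows only.
DT[J("C"), f := ifelse(f >= 0.85 & f <= 0.95, f, NA_real_)]
```

The other three values of a would need their own lines with their own limits, so this addresses the speed of the subset rather than the repetition.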

+0

I am following what I was taught in school: don't retype, write a function. –

Answer

4

If you vectorize the function so that the lower and upper limits can differ per row, you can make it a bit more elegant:

`%between%` <- function(x, vals, vals2) x >= vals & x <= vals2 
`%nbetween%` <- Negate(`%between%`) 

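# Build a lookup table with the allowed range for each value of a. 
ranges <- data.table(a = c('C','M','Y','K'), low = c(0.85,0.85,0.80,0.95), high = c(0.95,0.95,0.90,1.10)) 
# Set the keys for an easy merge. 
setkeyv(ranges, 'a') 
setkeyv(DT, 'a') 
# Merge and filter: f becomes NA wherever it falls outside [low, high]. 
DT <- merge(DT, ranges, all.x = TRUE)[b %in% c('A','S') & `%nbetween%`(f, low, high), f := NA] 
# A nice suggestion from the comments: a keyed join instead of merge(). 
# This is an alternative to the merge() line above, not an additional step. 
DT <- DT[ranges][b %in% c('A','S') & `%nbetween%`(f, low, high), f := NA] 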

#  a b f low high 
# 1: C S 0.88 0.85 0.95 
# 2: C S NA 0.85 0.95 
# 3: C S 0.92 0.85 0.95 
# 4: C A 0.94 0.85 0.95 
# 5: C S NA 0.85 0.95 
# 6: C S 0.90 0.85 0.95 
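
If the extra low and high columns are not wanted in DT, an update join is one possible variation. The sketch below is not the answer's code: it assumes a data.table version that supports the `on=` argument (1.9.6 or later), uses a seed that is not in the question, and inlines the range test instead of calling `%nbetween%`.

```r
library(data.table)

set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible
DT <- data.table(
  a = sample(c("C", "M", "Y", "K"), 100, replace = TRUE),
  b = sample(c("A", "S"), 100, replace = TRUE),
  f = round(rnorm(n = 100, mean = 0.90, sd = 0.08), digits = 2)
)

ranges <- data.table(
  a    = c("C", "M", "Y", "K"),
  low  = c(0.85, 0.85, 0.80, 0.95),
  high = c(0.95, 0.95, 0.90, 1.10)
)

# Join ranges onto DT by a and overwrite f in place.
# i.low and i.high refer to the columns of `ranges`; b and f are DT's own.
DT[ranges, on = "a",
   f := ifelse(b %in% c("A", "S") & (f < i.low | f > i.high), NA_real_, f)]
```

Here DT keeps its original three columns and row order, and no keys need to be set.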
+2

Maybe the last line should be (by reference rather than by copy): 'DT[ranges][`%nbetween%`(f, low, high), f := NA]' –

+1

Thank you very much, nograpes, I like it. –

+1

Thank you, Blue Magister, it works nicely. –