Generating random missing values in R

I have a data frame like this, for example:

df<-data.frame(time1=rbinom(100,1,0.3), 
       time2=rbinom(100,1,0.4), 
       time3=rbinom(100,1,0.5), 
       time4=rbinom(100,1,0.6))

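How can I generate random missing values, with up to 20% of the values missing in each variable? That is, in this case each column should have no more than 20 missing values in total, and the missing entries should fall on randomly chosen subjects (rows).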


2014-01-06 David Z

You can do it like this:

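insert_nas <- function(x) { 
    len <- length(x) 
    # number of NAs: a random integer from 1 to 20% of the length
    n <- sample(1:floor(0.2*len), 1) 
    # rows to blank out, drawn without replacement
    i <- sample(1:len, n) 
    x[i] <- NA 
    x 
} 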

df2 <- sapply(df, insert_nas) 
df2

This gives you at most 20% missing values in each column, which you can check with:

colSums(is.na(df2))/nrow(df2) 

time1 time2 time3 time4 
0.09 0.16 0.19 0.14
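One detail worth knowing is that sapply() simplifies its result to a matrix. As a result, df2 above is a matrix rather than a data frame. The sketch below assumes you want to keep the data-frame class. It assigns the result back with df2[] <- lapply(...), which replaces the columns in place. The set.seed() value is arbitrary and only makes the run reproducible.

```r
# Example data from the question; the seed is arbitrary
set.seed(1)
df <- data.frame(time1 = rbinom(100, 1, 0.3),
                 time2 = rbinom(100, 1, 0.4),
                 time3 = rbinom(100, 1, 0.5),
                 time4 = rbinom(100, 1, 0.6))

# Same function as in the answer above
insert_nas <- function(x) {
  len <- length(x)
  n <- sample(1:floor(0.2 * len), 1)  # number of NAs: 1 .. 20% of len
  i <- sample(1:len, n)               # rows to blank out
  x[i] <- NA
  x
}

# sapply() would return a matrix; df2[] <- lapply(...) keeps a data.frame
df2 <- df
df2[] <- lapply(df2, insert_nas)

class(df2)                        # "data.frame"
colSums(is.na(df2)) / nrow(df2)   # share of NAs per column, each at most 0.2
```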


2014-01-06 14:29:22

I'd rather use 'n <- sample.int(floor(0.2 * len), 1); i <- sample(seq_along(x), n)'. – Roland
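Folded into the function, Roland's suggestion might look like the sketch below. The names insert_nas_roland and max_frac are hypothetical and added only for illustration. The usual motivation for sample.int() and seq_along() is the 1:n pitfall: 1:0 is c(1, 0) rather than an empty vector. Because of that, the 1:n form can silently produce 0 or 1 NAs for vectors shorter than 5 elements, whereas sample.int(0, 1) stops with an error.

```r
# Hypothetical variant of insert_nas using the calls suggested by Roland
insert_nas_roland <- function(x, max_frac = 0.2) {
  # number of NAs, uniform on 1 .. floor(max_frac * length(x));
  # errors if that upper bound is 0 (vectors shorter than 5 at the default)
  n <- sample.int(floor(max_frac * length(x)), 1)
  # positions to blank out, drawn without replacement
  i <- sample(seq_along(x), n)
  x[i] <- NA
  x
}

set.seed(42)  # arbitrary seed for a reproducible example
x <- rbinom(100, 1, 0.5)
y <- insert_nas_roland(x)
sum(is.na(y))  # somewhere between 1 and 20
```

To apply it to every column of the question's df while keeping a data frame, df[] <- lapply(df, insert_nas_roland) should work.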

Is something like this what you mean?

nomissing <- sample(1:20,1) 
testnos <- rbinom(100 - nomissing,1,0.3) 
testnas <- rep(NA,nomissing) 
testmix <- sample(x = c(testnos,testnas),100)

Output:

> testmix 
 [1] 1 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 
[37] 1 0 0 0 1 1 0 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0 1 1 1 NA 0 1 0 0 
[73] 0 0 1 1 0 0 1 0 0 1 1 0 0 NA 1 0 0 0 0 0 1 0 NA NA 1 0 0 0
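The snippet above builds a single column. Below is a minimal sketch of the same idea applied to all four columns. make_col_with_nas is a hypothetical helper, and the probabilities are the ones used in the question. Note that this approach draws fresh data rather than blanking values in an existing df. If the original values must be kept, the other answers fit better.

```r
# Hypothetical helper generalizing the single-column snippet above
make_col_with_nas <- function(p, n_rows = 100, max_na = 20) {
  n_missing <- sample(1:max_na, 1)               # 1 .. max_na missing values
  observed  <- rbinom(n_rows - n_missing, 1, p)  # the non-missing draws
  # shuffle so the NAs end up in random rows
  sample(c(observed, rep(NA, n_missing)))
}

set.seed(7)  # arbitrary seed for reproducibility
probs <- c(time1 = 0.3, time2 = 0.4, time3 = 0.5, time4 = 0.6)
df_na <- as.data.frame(lapply(probs, make_col_with_nas))

colSums(is.na(df_na))  # one count per column, each in 1..20
```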


2014-01-06 14:25:37 TheComeOnMan

"Up to 20" means that each variable can have anywhere from 1 to 20 missing values, and the number of rows is still 100. –

Edited. – TheComeOnMan

Here is one way:

as.data.frame(lapply(df, function(x) 
       "is.na<-"(x, sample(seq(x), floor(length(x) * runif(1, 0, .2))))))


2014-01-06 14:31:28
