Imputing values in R and Stata

I have a data set that I call sam.data, as follows:
dput(sam.data)
structure(list(idn = c(1L, 2L, 3L, 4L, 5L, 6L, 66L, 62L, 7L,
81L, 68L, 72L), n1 = c(1L, 2L, 3L, 4L, 5L, 6L, 6L, 6L, 7L, 7L,
7L, 7L), x = c(9.95228, 11.4186, 10.3735, 10.5453, 10.7364, 9.85219,
9.73307, 9.86304, 9.74097, 9.57359, 9.70899, 9.75185)), .Names = c("idn",
"n1", "x"), row.names = c(NA, 12L), class = "data.frame")
sam.data
idn n1 x
1 1 1 9.95228
2 2 2 11.41860
3 3 3 10.37350
4 4 4 10.54530
5 5 5 10.73640
6 6 6 9.85219
7 66 6 9.73307
8 62 6 9.86304
9 7 7 9.74097
10 81 7 9.57359
11 68 7 9.70899
12 72 7 9.75185
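For rows where idn is not equal to n1, I want to create a new variable y that takes the value of x corresponding to n1 (that is, x from the row whose idn equals that n1); otherwise y should be missing. I was able to put together a solution in R, but I would prefer a more elegant one. I am also looking for a solution in Stata.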
My solution in R:
library(plyr)
sam.data2<-ddply(sam.data,.(n1),transform, y=x[which.min(idn)])
sam.data2
idn n1 x y
1 1 1 9.95228 9.95228
2 2 2 11.41860 11.41860
3 3 3 10.37350 10.37350
4 4 4 10.54530 10.54530
5 5 5 10.73640 10.73640
6 6 6 9.85219 9.85219
7 66 6 9.73307 9.85219
8 62 6 9.86304 9.85219
9 7 7 9.74097 9.74097
10 81 7 9.57359 9.74097
11 68 7 9.70899 9.74097
12 72 7 9.75185 9.74097
Expected output:
idn n1 x y
1 1 1 9.95228
2 2 2 11.41860
3 3 3 10.37350
4 4 4 10.54530
5 5 5 10.73640
6 6 6 9.85219
7 66 6 9.73307 9.85219
8 62 6 9.86304 9.85219
9 7 7 9.74097
10 81 7 9.57359 9.74097
11 68 7 9.70899 9.74097
12 72 7 9.75185 9.74097
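The expected output above can be reproduced in base R with a lookup by position. This is only a minimal sketch, assuming that idn values are unique and that every n1 value also appears somewhere in idn (both hold for the sample data, but neither is stated in the question). The helper name `ref` is made up for illustration.

```r
# Rebuild the example data from the dput() output above
sam.data <- data.frame(
  idn = c(1L, 2L, 3L, 4L, 5L, 6L, 66L, 62L, 7L, 81L, 68L, 72L),
  n1  = c(1L, 2L, 3L, 4L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 7L),
  x   = c(9.95228, 11.4186, 10.3735, 10.5453, 10.7364, 9.85219,
          9.73307, 9.86304, 9.74097, 9.57359, 9.70899, 9.75185)
)

# For each row, find the position of the row whose idn equals this row's n1.
# Assumption: idn is unique, so match() returns the only matching row.
ref <- match(sam.data$n1, sam.data$idn)

# Take x from the referenced row only where idn differs from n1;
# NA_real_ keeps y a numeric column.
sam.data$y <- ifelse(sam.data$idn != sam.data$n1, sam.data$x[ref], NA_real_)

sam.data
```

Unlike the `which.min(idn)` approach, this looks up the row by `idn == n1` directly, so it does not rely on the reference row having the smallest idn in its group.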
Thanks for looking into the R solution. I would prefer to use NA, because I want the column to be numeric. – Metrics