Here is a shortcut for excluding factors, or anything else you want to leave out:
library(caret)  # provides preProcess()

set.seed(1)
N <- 20
dat <- data.frame(
  x = factor(sample(LETTERS[1:5], N, replace = TRUE)),
  y = rnorm(N, 5, 12),
  z = rnorm(N, -5, 17) + runif(N, 2, 12)
)
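#' Wraps preProcess() so that columns of the excluded classes (factors by
#' default) are left untouched; extra arguments go on to preProcess().
ppWrapper <- function(x, excludeClasses = c("factor"), ...) {
  # TRUE for every column that is an instance of one of the excluded classes
  whichToExclude <- sapply(x, function(y) {
    any(sapply(excludeClasses, function(excludeClass) is(y, excludeClass)))
  })
  # Fit the preprocessing on the remaining columns and apply it to them
  processedMat <- predict(preProcess(x[!whichToExclude], ...),
                          newdata = x[!whichToExclude])
  x[!whichToExclude] <- processedMat
  x
}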
> ppWrapper(dat)
   x          y           z
1  C  1.6173595 -0.44054795
2  A -0.2933705 -1.98856921
3  C  1.2177384  0.65420288
4  D -0.8710374  0.62409408
5  D -0.4504202 -0.34048640
6  D -0.6943283  0.24236671
7  E  0.7778192  0.91606677
8  D  0.2184563 -0.44935163
9  C -0.3611408  0.26075970
10 B -0.7066441 -0.23046073
11 D -1.5154339 -0.75549761
12 D  0.4504825  0.38552988
13 B  1.5692675  0.04093040
14 C  0.4127541  0.13161807
15 D  0.5426321  1.09527418
16 B -2.1040322 -0.04544407
17 C  0.6928574  1.12090541
18 B  0.3580960  1.91446230
19 E  0.3619967 -0.89018040
20 A -1.2230522 -2.24567237
You can pass anything you like into ppWrapper, and it will be passed along to preProcess.
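As a rough illustration of that pass-through, the sketch below calls the wrapper with `method = "range"`, which preProcess uses to rescale each numeric column to [0, 1]. The data and the wrapper are repeated from above so the block runs on its own; the name `scaled` is only for illustration, and the block assumes the caret package is installed.

```r
library(caret)  # assumption: caret is installed; it provides preProcess()

set.seed(1)
N <- 20
dat <- data.frame(
  x = factor(sample(LETTERS[1:5], N, replace = TRUE)),
  y = rnorm(N, 5, 12),
  z = rnorm(N, -5, 17) + runif(N, 2, 12)
)

# Same wrapper as above, repeated so this block is self-contained.
ppWrapper <- function(x, excludeClasses = c("factor"), ...) {
  whichToExclude <- sapply(x, function(y) {
    any(sapply(excludeClasses, function(excludeClass) is(y, excludeClass)))
  })
  processedMat <- predict(preProcess(x[!whichToExclude], ...),
                          newdata = x[!whichToExclude])
  x[!whichToExclude] <- processedMat
  x
}

# `method` is not an argument of ppWrapper, so it travels through `...`
# to preProcess(); "range" rescales every numeric column to [0, 1].
scaled <- ppWrapper(dat, method = "range")

str(scaled)              # x is still a factor; y and z are numeric
sapply(scaled[-1], range)  # min and max of the rescaled columns
```

The factor column comes back unchanged, while `y` and `z` are transformed by whatever preProcess options were supplied. To leave other column types alone as well, add their class names to `excludeClasses`.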
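I think we really only need two of the pclass variables ("pclass1st, pclass2nd" or "pclass2nd, pclass3rd" or "pclass3rd, pclass1st"), just as with the sex variable, where we kept only sexmale and dropped sexfemale. Correct me if that is not enough. – Sandeep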
@topepo, I think the answer below ignores the to-do list. I would suggest adding a warning for those who are not paying attention. –