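Optimization: replacing values in a data frame with multiple conditions
I have a data frame similar to the sample below. Based on the information in its two columns, I want to classify the items by size and colour: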
df <- structure(list(Ball = structure(c(5L, 3L, 2L, 4L, 1L, 3L), .Label = c("blue", "blue is my favourite", "red", "red ", "red ball"), class = "factor"), size = c(1.2, 2, 3, 10, 12, 100)), .Names = c("Ball", "size"), class = "data.frame", row.names = c(NA, -6L))
The output should look like this:
structure(list(Ball = structure(c(5L, 3L, 2L, 4L, 1L, 3L), .Label = c("blue", "blue is my favourite", "red", "red ", "red ball"), class = "factor"), size = c(1.2, 2, 3, 10, 12, 100), Class = c("small red ball", "small red ball", "small blue ball", "medium red ball", "medium blue ball", "big red ball")), row.names = c(NA, -6L), .Names = c("Ball", "size", "Class"), class = "data.frame")
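I have working code, but it is long and messy, and I am sure there is a more concise way to get the output I need.
So what did I do?
I started by selecting the items of the first class and assigning a new name to the selected df$Class values: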
df["Class"] <- NA #add new column
df[grepl("red", df$Ball) & df$size <10, ]$Class <- "small red ball"
Because my grepl selection is sometimes empty, I added an if (length() > 0) condition:
if (length(df[grepl("red", df$Ball) & df$size <10, ]$Class) > 0) {df[grepl("red", df$Ball) & df$size <10, ]$Class <- "small red ball"}
Finally, I combined all my selections in a loop:
df["Class"] <- NA #add new column
z <- c("red", "blue")
for (i in z){
if (length(df[grepl(i, df$Ball) & df$size <10, ]$Class) > 0) {df[grepl(i, df$Ball) & df$size <10, ]$Class <- paste("small", i, "ball", sep=" ")}
if (length(df[grepl(i, df$Ball) & df$size >=10 & df$size <100, ]$Class) > 0) {df[grepl(i, df$Ball) & df$size >=10 & df$size <100, ]$Class <- paste("medium", i, "ball", sep=" ")}
if (length(df[grepl(i, df$Ball) & df$size >=100, ]$Class) > 0) {df[grepl(i, df$Ball) & df$size >=100, ]$Class <- paste("big", i, "ball", sep=" ")}
}
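It works for two colours and three size categories, but my original data frame is much larger. So, because the code looks very messy, my question is: how can I simplify it?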
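A minimal base R sketch of one way to shorten the loop, assuming the size limits from the code above (below 10, 10 up to but not including 100, 100 and above). The names `colours`, `size_class` and `colour` are made up for this illustration. The size is binned once with cut(), and the colour is looked up with the same grepl() call as in the loop, so the two parts only need to be pasted together:

```r
# Sample data from the question, with Ball stored as character for simplicity
df <- data.frame(
  Ball = c("red ball", "red", "blue is my favourite", "red ", "blue", "red"),
  size = c(1.2, 2, 3, 10, 12, 100),
  stringsAsFactors = FALSE
)

colours <- c("red", "blue")  # hypothetical name; extend for more colours

# Bin the sizes in one step: [-Inf,10) small, [10,100) medium, [100,Inf) big
size_class <- cut(df$size,
                  breaks = c(-Inf, 10, 100, Inf),
                  labels = c("small", "medium", "big"),
                  right = FALSE)

# Find the colour of each row; rows matching no colour stay NA
colour <- rep(NA_character_, nrow(df))
for (i in colours) colour[grepl(i, df$Ball)] <- i

# Combine both parts, keeping NA where no colour was found
df$Class <- ifelse(is.na(colour), NA_character_,
                   paste(size_class, colour, "ball"))
df
```

If a Ball value contained several colours, the last one in `colours` would win in this sketch, so the order of that vector matters.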
I don't see the point of the 'stringr' package here. I guess base R works: 'paste(as.character(cut(df$size, c(1,10,100,Inf), c("small","medium","large"))), sub(" [^(red|blue)].*", "", df$Ball), 'Ball')' – Onyambu
@Onyambu Sure, 'sub' works, but if there is no match it returns the whole string, whereas 'str_extract' returns NA. A workaround is 'regexpr/regmatches' – akrun
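To illustrate the difference, here is a small sketch with a made-up vector `balls` that contains one value ("green") matching neither colour. The pattern and the variable names are assumptions for this example only:

```r
# Hypothetical input; "green" matches neither colour
balls <- c("red ball", "red", "blue is my favourite", "red ", "blue", "green")

# sub() returns its input unchanged when the pattern does not match,
# so "green" comes back as "green" rather than NA
via_sub <- sub(".*(red|blue).*", "\\1", balls)

# regexpr() gives -1 for non-matches and regmatches() returns only the
# matched pieces, so fill a vector of NAs at the matching positions
m <- regexpr("red|blue", balls)
colour <- rep(NA_character_, length(balls))
colour[m != -1] <- regmatches(balls, m)

via_sub
colour
```

With this input, the last element of `via_sub` is "green", while the last element of `colour` is NA.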
It should be 'c(1,9,99,Inf)' for 'small: x < 10', 'medium: 10 <= x < 100', 'large: x >= 100', right? – Iris
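The boundary behaviour can be checked directly. A short sketch, assuming the sizes from the question; the variable names are invented for this example. By default cut() uses intervals that are closed on the right, so 10 and 100 land in the lower class. Shifting the breaks to 9 and 99 fixes that for whole numbers only, while right = FALSE closes the intervals on the left instead:

```r
size <- c(1.2, 2, 3, 10, 12, 100)
labs <- c("small", "medium", "large")

# Default right = TRUE: (1,10], (10,100], (100,Inf]
# 10 is classed as small and 100 as medium
default_cut <- cut(size, c(1, 10, 100, Inf), labs)

# Shifted breaks: (1,9], (9,99], (99,Inf]
# correct for these sizes, but 9.5 would be classed as medium
shifted_cut <- cut(size, c(1, 9, 99, Inf), labs)
shifted_9_5 <- cut(9.5, c(1, 9, 99, Inf), labs)

# right = FALSE: [0,10), [10,100), [100,Inf)
# matches small: x < 10, medium: 10 <= x < 100, large: x >= 100
left_closed <- cut(size, c(0, 10, 100, Inf), labs, right = FALSE)

data.frame(size, default_cut, shifted_cut, left_closed)
```

The lower break of 0 in the last call is an assumption that sizes are never negative; -Inf would be the safer choice otherwise.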