2017-02-13 23 views
4

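Aggregate a field if a string contains specific text

I've read many posts on this topic, so apologies if this is a duplicate, but I can't figure out my problem. I have a data frame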

df <- data.frame(name = c('bike+ride','shoe+store','ride','mountian%20bike','ride+along'), 
      count = c(2,5,8,7,6)) 

and I want to sum count for each entry in group whenever name contains that string:

group <- data.frame(group = c('ride','bike')) 

so that the final result looks like this:

Group Count 
bike  9 
ride  16 

Can anyone help me?

Answers

3

A base R idea:

sapply(sapply(as.character(group$group), function(i) grep(i, df$name)), function(i) sum(df$count[i])) 


#or make it a function 

aggr1 <- function(var1, grp, cnt){ 
    m1 <- sapply(as.character(grp), function(i) grep(i, var1)) 
    final_d <- sapply(m1, function(i) sum(cnt[i])) 
    return(data.frame(Group = names(final_d), 
        Count = as.integer(final_d), stringsAsFactors = FALSE) 
     ) 
} 

aggr1(df$name, group$group, df$count) 

# Group Count 
#1 ride 16 
#2 bike  9 
+0

Thanks for your help. Any idea how to handle the case where name contains characters such as "+" or "%20"? – Davis

+0

In your example the names do contain those characters, and it works as expected. – Sotos
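To illustrate the point in the comment above: "+" only acts as a regex quantifier when it appears in the pattern, not in the text being searched. The following is a minimal sketch, assuming a hypothetical pattern that itself contains "+" (this pattern is not part of the original question); fixed = TRUE makes grepl() treat the pattern as a literal string.

```r
name <- c("bike+ride", "shoe+store", "ride", "mountian%20bike", "ride+along")

# Plain pattern: special characters in the searched text do not matter
plain <- grepl("ride", name)
# TRUE FALSE TRUE FALSE TRUE

# Hypothetical pattern containing "+": as a regex, "e+" means "one or more e",
# so the literal "+" in the text is never matched
as_regex <- grepl("bike+ride", name)
# FALSE FALSE FALSE FALSE FALSE

# fixed = TRUE matches the pattern literally
as_literal <- grepl("bike+ride", name, fixed = TRUE)
# TRUE FALSE FALSE FALSE FALSE
```

So the special characters only need attention if they end up in group$group rather than in df$name.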

1

One way is

do.call(rbind, sapply(group$group, FUN = function(x, df) { 
    out <- df[grepl(pattern = x, x = df$name), ] 
    data.frame(group = x, count = sum(out$count)) 
}, df = df, simplify = FALSE)) 

    group count 
1 ride 16 
2 bike  9 

Or in two steps:

# make a data.frame which locates where each group level is located 
grp <- as.data.frame(sapply(group$group, FUN = function(x) grepl(pattern = x, x = df$name))) 
names(grp) <- group$group 

# based on above location (TRUE/FALSE), sum accordingly 
data.frame(count = apply(grp, MARGIN = 2, FUN = function(x, df) { 
    sum(df[x, "count"]) 
}, df = df)) 

    count 
ride 16 
bike  9 
+0

If I may suggest some speed improvements: 'data.frame(group = group, count = sapply(group$group, FUN = function(x) sum(df[grepl(pattern = x, x = df$name, fixed = TRUE), "count"])))' (no need for 'do.call' or for creating a 'data.frame' on each iteration, plus adding 'fixed = TRUE', etc.) –
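The suggestion in the comment above can be sketched as a self-contained script. This is one possible reading of it, not the commenter's exact code: it uses vapply() instead of sapply() for a type-stable result, and the name res is only illustrative.

```r
df <- data.frame(name = c("bike+ride", "shoe+store", "ride", "mountian%20bike", "ride+along"),
                 count = c(2, 5, 8, 7, 6))
group <- data.frame(group = c("ride", "bike"))

# One literal (fixed = TRUE) match per group, summed directly;
# no per-iteration data.frame and no do.call(rbind, ...)
grp <- as.character(group$group)
res <- data.frame(
  group = grp,
  count = unname(vapply(grp, function(x) {
    sum(df$count[grepl(x, df$name, fixed = TRUE)])
  }, numeric(1))),
  stringsAsFactors = FALSE
)

print(res)
```

Note that fixed = TRUE is only appropriate when the group values are meant as literal substrings rather than regular expressions.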

0

One approach using the tidyverse (purrr, dplyr, and tidyr):

library(tidyverse) # for dplyr, purrr and tidyr 
library(lazyeval) # for interp(), which library(tidyverse) does not attach 

groups <- c('ride','bike') 

map_df(groups, ~setNames(summarize_(df, interp(~sum(df$count[grepl(var, name)], na.rm = TRUE), var = .x)), .x)) %>% 
     gather(group, count, na.rm = TRUE) 
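
Because summarize_() and interp() belong to the older lazyeval-based interface, the same idea can also be sketched without them. This is an alternative under stated assumptions, not the answer's original method: it assumes dplyr and purrr are installed, builds one small tibble per group, and lets map_dfr() row-bind them; the name result is only illustrative.

```r
library(dplyr)  # tibble() (re-exported) and row-binding used by map_dfr()
library(purrr)  # map_dfr()

df <- data.frame(name = c("bike+ride", "shoe+store", "ride", "mountian%20bike", "ride+along"),
                 count = c(2, 5, 8, 7, 6))
groups <- c("ride", "bike")

# One row per group: sum the counts of all names containing that group string
result <- map_dfr(groups, function(g) {
  tibble(group = g,
         count = sum(df$count[grepl(g, df$name, fixed = TRUE)]))
})

print(result)
```

This returns the data directly in long format, so no gather() step is needed.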
+0

When additional 'group$group' levels are involved. –

+0

I agree. I've revised the answer to handle additional levels. –
