2016-01-13 28 views
0

I would like to know how I can use a loop to compute the following apply call for each group:

apply(table(data$people,data$event),2,function(x) mean(x[x>0])) 

That is, I want to compute the expression above separately for each level of Colour.

people <-c("R1","R2","R2","R3","R3","R4","R4","R4","R4","R3","R3","R3","R3","R2","R2","R2","R5","R6") 
event<-c("a","b","b","M","s","f","y","b","a","a","a","a","s","c","c","b","m","a") 
Colour<-c("red","blue","green","pink","red","blue","grean","red","red","black","pink","blue","blue","green","blue","green","green","red") 

data<-data.frame(people,event,Colour) 
+1

Since this question has nothing to do with algorithm design, please leave off the 'algorithm' tag. – Gregor

+1

What is your desired output? It isn't very clear what you are trying to do. –

+0

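Let me try to put words in your mouth, and you tell me whether I have it right: for each 'Colour', you want to count the number of times each of the 'people' appears at each 'event', and summarize that as the mean over all people who *attended* the event (so the mean is taken over non-zero attendance counts only). Is that right? – Gregor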
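To make that reading concrete, here is a small sketch on the question's data for a single colour, assuming the interpretation above is the intended one. The names `green`, `counts` and `green_means` are introduced here for illustration only.

```r
people <- c("R1","R2","R2","R3","R3","R4","R4","R4","R4","R3","R3","R3","R3","R2","R2","R2","R5","R6")
event  <- c("a","b","b","M","s","f","y","b","a","a","a","a","s","c","c","b","m","a")
Colour <- c("red","blue","green","pink","red","blue","grean","red","red","black","pink","blue","blue","green","blue","green","green","red")
data <- data.frame(people, event, Colour)

# Keep only the rows for one colour (illustrative subset)
green <- data[data$Colour == "green", ]

# Rows = people, columns = events, cells = number of attendances.
# as.character() keeps only the people and events present in this subset.
counts <- table(as.character(green$people), as.character(green$event))

# Per event: mean attendance count over people who attended at least once
green_means <- apply(counts, 2, function(x) mean(x[x > 0]))
green_means
```

In the green rows, R2 attends event b twice and event c once, and R5 attends event m once, so the per-event means are 2, 1 and 1.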

Answers

0

To apply your calculation to each group, let's first wrap it in a function:

your_function = function(data) { 
    apply(table(data$people,data$event),2,function(x) mean(x[x>0])) 
} 

Then we can split your data by Colour and apply the function to each sub-data frame:

dat_split = split(data, f = data$Colour) 
results = lapply(dat_split, your_function) 

results 
# $black 
# a b c f m M s y 
# 1 NaN NaN NaN NaN NaN NaN NaN 
# 
# $blue 
# a b c f m M s y 
# 1 1 1 1 NaN NaN 1 NaN 
# 
# $grean 
# a b c f m M s y 
# NaN NaN NaN NaN NaN NaN NaN 1 
# ... 
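The output above lists every event under every colour, which relies on `people` and `event` being factors so that unused levels survive the split. Since R 4.0.0, `data.frame()` no longer converts strings to factors by default, so on a current R installation each element of `results` would contain only the events present in that colour. A minimal sketch that restores the behaviour shown, assuming explicit factor conversion is acceptable for your data:

```r
people <- c("R1","R2","R2","R3","R3","R4","R4","R4","R4","R3","R3","R3","R3","R2","R2","R2","R5","R6")
event  <- c("a","b","b","M","s","f","y","b","a","a","a","a","s","c","c","b","m","a")
Colour <- c("red","blue","green","pink","red","blue","grean","red","red","black","pink","blue","blue","green","blue","green","green","red")

# Assumption: force factor columns so every sub-data frame keeps all levels
data <- data.frame(people, event, Colour, stringsAsFactors = TRUE)

your_function <- function(data) {
  apply(table(data$people, data$event), 2, function(x) mean(x[x > 0]))
}

results <- lapply(split(data, f = data$Colour), your_function)

# Each element now has one entry per event level; events absent from a colour are NaN
results$green
```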

Personally, I don't find this very friendly. data.table and dplyr make working with subsets of a data frame easy. I would use dplyr from the start, like this:

library(dplyr) 
data %>% group_by(people, Colour, event) %>% 
    summarize(n = n()) %>% 
    group_by(Colour, event) %>% 
    summarize(mean = mean(n)) %>% 
    tidyr::spread(key = event, value = mean) 

# Source: local data frame [6 x 9] 
# 
# Colour  a  b  c  f  m  M  s  y 
# (fctr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) 
# 1 black  1 NA NA NA NA NA NA NA 
# 2 blue  1  1  1  1 NA NA  1 NA 
# 3 grean NA NA NA NA NA NA NA  1 
# ... 
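In current tidyr releases `spread()` is superseded by `pivot_wider()`, and `dplyr::count()` can stand in for the first `group_by()`/`summarize()` pair. A sketch of the same pipeline in that style, assuming dplyr 1.0.0 or later and tidyr 1.0.0 or later are installed; the name `wide` is illustrative:

```r
library(dplyr)
library(tidyr)

people <- c("R1","R2","R2","R3","R3","R4","R4","R4","R4","R3","R3","R3","R3","R2","R2","R2","R5","R6")
event  <- c("a","b","b","M","s","f","y","b","a","a","a","a","s","c","c","b","m","a")
Colour <- c("red","blue","green","pink","red","blue","grean","red","red","black","pink","blue","blue","green","blue","green","green","red")
data <- data.frame(people, event, Colour)

wide <- data %>%
  count(people, Colour, event) %>%               # attendances per person, colour and event
  group_by(Colour, event) %>%
  summarise(mean = mean(n), .groups = "drop") %>% # mean over people who attended
  pivot_wider(names_from = event, values_from = mean)

wide
```

Only people who actually attended appear in the counted rows, so the zero counts are excluded without an explicit `x > 0` filter; combinations that never occur show up as `NA` rather than `NaN`.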
+2

If you use 'sapply' instead of 'lapply' for the first version of 'results', you will get a nicer-looking table. – alistaire
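The `sapply` suggestion can be sketched as follows. `sapply` only simplifies to a matrix when every group returns a vector of the same length, so this sketch assumes the columns are factors (forced here with `stringsAsFactors = TRUE`); the name `results_matrix` is illustrative.

```r
people <- c("R1","R2","R2","R3","R3","R4","R4","R4","R4","R3","R3","R3","R3","R2","R2","R2","R5","R6")
event  <- c("a","b","b","M","s","f","y","b","a","a","a","a","s","c","c","b","m","a")
Colour <- c("red","blue","green","pink","red","blue","grean","red","red","black","pink","blue","blue","green","blue","green","green","red")
data <- data.frame(people, event, Colour, stringsAsFactors = TRUE)

your_function <- function(data) {
  apply(table(data$people, data$event), 2, function(x) mean(x[x > 0]))
}

# One column per colour, one row per event
results_matrix <- sapply(split(data, f = data$Colour), your_function)
results_matrix

# Transpose if you prefer one row per colour
t(results_matrix)
```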

+0

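@Gregor, another issue: when I apply the first solution to my dataset it works without error, but the second solution gives this error: Error: All columns must be named. Any ideas about this? – shoorideh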

+0

I haven't seen that error before. Do all of your columns have names? – Gregor
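One way to check for unnamed columns is sketched below. The cause of the error in the asker's dataset is not known, so this only illustrates the check itself; the data frame `df` is a made-up example with a deliberately blanked column name, and the `V` prefix for replacement names is an arbitrary choice.

```r
# Hypothetical data frame built only to illustrate the check
df <- data.frame(people = c("R1", "R2"), event = c("a", "b"))
names(df)[2] <- ""   # simulate a column that lost its name

# Positions of columns whose name is missing or empty
bad <- which(is.na(names(df)) | names(df) == "")
bad

# Give those columns placeholder names before passing the data to dplyr
names(df)[bad] <- paste0("V", bad)
names(df)
```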