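Include an extra column with aggregate()
I have written some code that looks at a data frame and creates a new data frame from the minimum value in one column, split by a factor. However, I can't figure out how to include another column in the result (without using it as a grouping factor).
For example, the code below creates a data frame with the columns "State", "Institution", "Rating1", "Rating2" and "Junkdata". It then finds the lowest Rating1 for each state listed and creates a data frame with one minimum rating per state. But say I also want to include the "Institution" column. How would I do that? I have also messed around with some plyr solutions, but no dice.
Below is the code I use when I don't specify the "Institution" column. Suffice it to say I have tried putting it everywhere I could think of, without success.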
##create the data frame
State <- c("AZ","AZ","AZ","CA","CA","CA","CA","CA","NY","NY","NY","NY","SD","SD")
Institution <- c("Institution 1","Institution 2","Institution 3","Institution 4","Institution 5","Institution 6","Institution 7","Institution 8","Institution 9","Institution 10","Institution 11","Institution 12","Institution 13","Institution 14")
Rating1 <- c(3.4, 5.6,2.2,6.3,8.3,2.1,3.3,9.7,7.7,5.4,9.9,3.2,6.1,5.2)
Rating2 <- c(8.4,3.4,6.5,2.5,7.5,4.2,5.6,8.3,4.9,3.3,1.1,8,7.7,3.3)
Junkdata <- c("junk","more junk","superfluous","junk","more junk","superfluous","junk","more junk","superfluous","junk","more junk","superfluous","junk","more junk")
data.df <- data.frame(State, Institution, Rating1, Rating2, Junkdata)
## Use aggregate to find the minimum of Rating 1 for each State
new.df <- aggregate(Rating1 ~ State, data = data.df, FUN = min)
Using plyr:
library(plyr)
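## Summarise the original data (not new.df) and refer to the column by name,
## so that min() is evaluated within each State rather than over the whole column
new.df.2 <- ddply(data.df, .(State), summarise, min = min(Rating1))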
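One possible way to get "Institution" into the result, sketched here in base R only: compute the per-state minimum with `aggregate`, then `merge` it back onto the original data so the matching rows bring their other columns along. This is a sketch under assumptions rather than the definitive answer. The names `min.df` and `with.inst` are made up for illustration, and the data frame is rebuilt (with `paste` and `rep` as shorthand for the same values) so the block runs on its own.

```r
# Rebuild the question's data so this sketch is self-contained
data.df <- data.frame(
  State = c("AZ","AZ","AZ","CA","CA","CA","CA","CA","NY","NY","NY","NY","SD","SD"),
  Institution = paste("Institution", 1:14),
  Rating1 = c(3.4, 5.6, 2.2, 6.3, 8.3, 2.1, 3.3, 9.7, 7.7, 5.4, 9.9, 3.2, 6.1, 5.2),
  Rating2 = c(8.4, 3.4, 6.5, 2.5, 7.5, 4.2, 5.6, 8.3, 4.9, 3.3, 1.1, 8, 7.7, 3.3),
  Junkdata = rep(c("junk", "more junk", "superfluous"), length.out = 14)
)

# Step 1: minimum Rating1 per State (one row per State)
min.df <- aggregate(Rating1 ~ State, data = data.df, FUN = min)

# Step 2: join back on the shared columns (State and Rating1) to pick up
# Institution from the rows that actually hold each minimum
with.inst <- merge(min.df, data.df[, c("State", "Institution", "Rating1")])

print(with.inst)
```

Note that if two institutions in the same state tie for the minimum, the merge returns both rows, which may or may not be what you want.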
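I think you are using the wrong tool(s). What you are describing is subsetting the data: `data.df[!!ave(data.df$Rating1, data.df$State, FUN = function(x) x == min(x)), 1:3]`. `aggregate` and `summarise` perform calculations on subsets of the data. The results are the same here because the min is the same whether you filter or aggregate/summarise, which is a little confusing. – rawr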
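That works too! I don't really understand what the double exclamation mark is doing, though. As for using mean, sum, etc., those don't make sense for what I'm trying to do. – Thoughtcraft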
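I use `ave` to return `c(0,1,0)` and so on, and `!!c(0,1,0)` just turns that into a logical vector. It is equivalent to `as.logical(c(0,1,0))`, but the other way is quicker. I picked that up from @akrun – rawr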
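To make the `ave` and `!!` trick from the comments concrete, here is a small runnable sketch. It assumes a trimmed copy of the question's data (only the first three columns), and the names `is.min` and `min.rows` are made up for illustration.

```r
# Trimmed copy of the question's data: State, Institution, Rating1 only
data.df <- data.frame(
  State = c("AZ","AZ","AZ","CA","CA","CA","CA","CA","NY","NY","NY","NY","SD","SD"),
  Institution = paste("Institution", 1:14),
  Rating1 = c(3.4, 5.6, 2.2, 6.3, 8.3, 2.1, 3.3, 9.7, 7.7, 5.4, 9.9, 3.2, 6.1, 5.2)
)

# ave() applies FUN within each State and writes the results back into a
# numeric vector, so the TRUE/FALSE values come out as 1/0
is.min <- ave(data.df$Rating1, data.df$State, FUN = function(x) x == min(x))

# A single ! negates (and coerces to logical); the second ! negates back,
# so !! is a terse way to turn 0/1 into FALSE/TRUE
print(!!c(0, 1, 0))

# Use the logical vector to keep only the rows holding each State's minimum
min.rows <- data.df[!!is.min, 1:3]
print(min.rows)
```

Because this filters rows instead of summarising them, every column of the matching rows is available, which is why "Institution" comes along without any extra work.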