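I'd like to understand how to accomplish "group by" and "count" functionality. I've looked at several posts and haven't found what I'm after; if there's an already-posted answer, I'd appreciate a link. Is there an equivalent of SELECT ... COUNT(*) ... GROUP BY ...?
For example, I'm looking for outliers in my data; I want to know which places received the most "bad" ratings: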
place = rep(c('AL','AK','AZ','AR','CA','CO','CT','DE','FL','GA','HI'), times=4)
measure = rep(c('meas1','meas2','meas3','meas4'), each=11)
set.seed(200)
rating = sample(c('good','bad'), size = 44, prob=c(2,1), replace=TRUE)
df = data.frame(place, measure, rating)
> df
place measure rating
1 AL meas1 good
2 AK meas1 good
3 AZ meas1 good
4 AR meas1 bad
5 CA meas1 bad
6 CO meas1 bad
7 CT meas1 bad
8 DE meas1 good
9 FL meas1 good
10 GA meas1 good
....(etc).....
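I'd like to understand how to do this using the tidyverse. This approach using sqldf gives me what I want, that is, it tells me which places got the most "bad" ratings, ordered by their "badness":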
library(sqldf)
sqldf("SELECT place, rating, COUNT(*) AS Count FROM df GROUP BY place, rating ORDER BY rating, Count DESC")
place rating Count
1 CA bad 3
2 AK bad 2
3 AR bad 1
4 CO bad 1
5 CT bad 1
6 DE bad 1
7 FL bad 1
8 GA bad 1
9 AL good 4
10 AZ good 4
11 HI good 4
....(etc)....
Is there a way to get a similar result in the tidyverse?
Try 'df %>% count(place, rating) %>% arrange(rating, desc(n))'
Can you explain that? It certainly does what I was hoping for. – cumin
Try '?count', '?arrange', and '?desc'. Reading the manual may help you learn a thing or two.
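To expand a little on the one-liner suggested in the comments, here is a minimal sketch of how it seems to map onto the SQL query, assuming dplyr is installed and reusing the example data from the question. The names `short` and `long` are only illustrative. As far as the dplyr documentation describes it, `count(place, rating)` is roughly shorthand for `group_by(place, rating)` followed by `summarise(n = n())`, so the two pipelines should give the same table; note that the count column is called `n` here rather than `Count`.

```r
library(dplyr)

# Rebuild the example data from the question.
place   <- rep(c('AL','AK','AZ','AR','CA','CO','CT','DE','FL','GA','HI'), times = 4)
measure <- rep(c('meas1','meas2','meas3','meas4'), each = 11)
set.seed(200)
rating  <- sample(c('good','bad'), size = 44, prob = c(2, 1), replace = TRUE)
df <- data.frame(place, measure, rating)

# Short form: count() groups by the listed columns and tallies the rows
# of each group into a column named n (SELECT ... COUNT(*) ... GROUP BY).
# arrange() then plays the role of ORDER BY rating, Count DESC.
short <- df %>%
  count(place, rating) %>%
  arrange(rating, desc(n))

# Long form: the same steps written out explicitly.
long <- df %>%
  group_by(place, rating) %>%              # GROUP BY place, rating
  summarise(n = n(), .groups = "drop") %>% # COUNT(*) AS n
  arrange(rating, desc(n))                 # ORDER BY rating, n DESC

print(short)
```

If you want the column to be named as in the sqldf output, `count(place, rating, name = "Count")` should do it, in which case the sort becomes `arrange(rating, desc(Count))`.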