R: Group by multiple columns and count

I have the following data frame, df:

LeftOrRight SpeedCategory NumThruLanes 
R   25to45   3    
L   45to62   2   
R   Gt62   1

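I want to group by SpeedCategory and then, for each of the other columns in turn, get the frequency of each unique code within each speed category, like this: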

               25to45 45to62 Gt62
LeftOrRight  L      0      1    0
             R      1      0    1
NumThruLanes 1      0      0    1
             2      0      1    0
             3      1      0    0

The closest I have been able to get is this:

for (col in df) {
  tbl <- table(col, df$SpeedCategory)
  print(tbl)
}

Which prints the following (first for LeftOrRight, then for NumThruLanes):

col 25to45 45to62 Gt62 
    L  0  1 0 
    R  1  0 1 

col 25to45 45to62 Gt62 
    1  0  0 1 
    2  0  1 0 
    3  1  0 0

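I'm sure I can accomplish my goal with aggregate() or perhaps group_by from dplyr, but I'm new to R and can't figure out the syntax. In pandas I would use a MultiIndex, but I don't know what the R equivalent is, so it's hard to Google.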

I'd like to do it all in a single pass or in a loop, because I have a dozen or so columns to go through.

Source

2016-12-23 ale19

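The tables package makes it easy to format tables in a specific way. The syntax takes some getting used to, but for this problem it is quite straightforward: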

exd <- read.table(text = "LeftOrRight SpeedCategory NumThruLanes 
R   25to45   3    
L   45to62   2   
R   Gt62   1", header = TRUE)  

## to get counts by default, every column must be categorical (a factor)
exd[] <- lapply(exd, factor)

library(tables) 
tabular(LeftOrRight + NumThruLanes ~ SpeedCategory, data = exd) 

##                SpeedCategory
##                25to45        45to62 Gt62
## LeftOrRight  L 0             1      0
##              R 1             0      1
## NumThruLanes 1 0             0      1
##              2 0             1      0
##              3 1             0      0

If you have many columns to go through, you can construct the formula programmatically, e.g.,

tabular(as.formula(paste(paste(names(exd)[-2], collapse = " + "), 
         names(exd)[2], sep = " ~ ")), 
     data = exd)
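
To make the formula construction concrete, here is a small base R sketch that builds the same formula on its own, without calling tabular(). It picks the grouping column by name rather than by position, which is an assumption about what is more convenient with many columns; the names group_col, other_cols and f are illustrative and not part of the original answer.

```r
# Column names as they appear in the example data frame
cols <- c("LeftOrRight", "SpeedCategory", "NumThruLanes")

# Grouping column selected by name instead of by index
group_col  <- "SpeedCategory"
other_cols <- setdiff(cols, group_col)

# Paste the row variables together with " + ", then add "~ <group column>"
f <- as.formula(paste(paste(other_cols, collapse = " + "),
                      group_col, sep = " ~ "))
print(f)
```

The resulting f can then be passed as the first argument to tabular(), as in tabular(f, data = exd).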

As a bonus, there are html and latex methods, which make it easy to mark up your table for inclusion in an article or report.

Source

2016-12-23 20:36:26 Ista

This is exactly what I needed, thanks! In the end I had to convert all of the columns to factors with lapply(df, factor), and after that it worked well. – ale19
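
The conversion mentioned in this comment might look roughly like the sketch below. The data frame is rebuilt inline from the question's example so that the snippet runs on its own; assigning into df[] is one common way to keep the result a data frame rather than a plain list.

```r
# Example data from the question, rebuilt inline
df <- data.frame(
  LeftOrRight   = c("R", "L", "R"),
  SpeedCategory = c("25to45", "45to62", "Gt62"),
  NumThruLanes  = c(3, 2, 1)
)

# Convert every column to a factor; df[] keeps the data.frame structure
df[] <- lapply(df, factor)

str(df)
```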

This won't do everything in one pass, but it may point you in the right direction:

library(reshape2) 

dcast(df, LeftOrRight ~ SpeedCategory, fun.aggregate = length) 
dcast(df, NumThruLanes ~ SpeedCategory, fun.aggregate = length)

Source

2016-12-23 19:31:52 manotheshark

To do this with dcast from the reshape2 package, you can do the following:

library("reshape2") 

DF=read.table(text="LeftOrRight SpeedCategory NumThruLanes 
R   25to45   3    
L   45to62   2   
R   Gt62   1",header=TRUE,stringsAsFactors=FALSE) 

LR_Stat = dcast(DF,LeftOrRight ~ SpeedCategory,length,fill=0) 
LR_Stat 
# LeftOrRight 25to45 45to62 Gt62 
#1   L  0  1 0 
#2   R  1  0 1 

Lanes_Stat = dcast(DF,NumThruLanes ~ SpeedCategory,length,fill=0) 
Lanes_Stat 
# NumThruLanes 25to45 45to62 Gt62 
#1   1  0  0 1 
#2   2  0  1 0 
#3   3  1  0 0

Note that in your expected output, the LeftOrRight table should have a 1 under 45to62, as LR_Stat shows.

Source

2016-12-23 19:32:19 OdeToMyFiddle

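Fixed, thanks! This works, but I have a lot of columns to go through. Is there a way to do this without explicitly naming the columns? I tried looping and appending each object to an empty data frame, but that didn't seem to work... – ale19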
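
One possible way to avoid naming each column, sketched here under the assumption that reshape2 is installed, is to build the dcast formula from the column names and collect the results in a list. The names group_col, other_cols and stat_list are illustrative, and value.var is set explicitly only to silence dcast's guess; with length as the aggregate, any column would give the same counts.

```r
library(reshape2)

# Example data from the question, rebuilt inline
DF <- data.frame(
  LeftOrRight   = c("R", "L", "R"),
  SpeedCategory = c("25to45", "45to62", "Gt62"),
  NumThruLanes  = c(3, 2, 1),
  stringsAsFactors = FALSE
)

group_col  <- "SpeedCategory"
other_cols <- setdiff(names(DF), group_col)

# One count table per remaining column, built from a pasted formula
stat_list <- lapply(other_cols, function(col) {
  dcast(DF, as.formula(paste(col, "~", group_col)),
        fun.aggregate = length, value.var = group_col, fill = 0)
})
names(stat_list) <- other_cols

stat_list
```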

You can use lapply() instead of a for loop to do it all:

tab_list <- lapply(df[, -2], function(col) table(col, df$SpeedCategory)) 
tab_list 
## $LeftOrRight 
##  
## col 25to45 45to62 Gt62 
## L  0  1 0 
## R  1  0 1 
## 
## $NumThruLanes 
##  
## col 25to45 45to62 Gt62 
## 1  0  0 1 
## 2  0  1 0 
## 3  1  0 0

You can then combine the tables into a single one using rbind() together with do.call():

do.call(rbind, tab_list) 
## 25to45 45to62 Gt62 
## L  0  1 0 
## R  1  0 1 
## 1  0  0 1 
## 2  0  1 0 
## 3  1  0 0

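It is also possible to get a column in the output table that indicates which column of the original data frame each row came from. To do this, you need to lapply() a slightly more complicated function over the column names: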

tab_list <- lapply(names(df)[-2], function(col) { 
    tab <- table(df[, col], df[, "SpeedCategory"]) 
    name_col <- c(col, rep("", nrow(tab) - 1)) 
    mat <- cbind(name_col, rownames(tab), tab) 
    as.data.frame(mat) 
    }) 
do.call(rbind, tab_list) 
##  name_col V2 25to45 45to62 Gt62 
## L LeftOrRight L  0  1 0 
## R    R  1  0 1 
## 1 NumThruLanes 1  0  0 1 
## 2    2  0  1 0 
## 3    3  1  0 0

Source

2016-12-23 19:34:09 Stibu

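This looks promising. Is there a way to preserve the column name for each section of rows (LeftOrRight, NumThruLanes, etc.) in do.call() (other than manually adding a column), so that it looks more like my desired output? – ale19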
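
One lightweight variation, offered only as a sketch, is to fold the source column name into the row names before binding, since rbind() on tables keeps each table's own row names. The "." separator and the names other_cols, tab_list and combined are arbitrary choices made for this illustration.

```r
# Example data from the question, rebuilt inline
df <- data.frame(
  LeftOrRight   = c("R", "L", "R"),
  SpeedCategory = c("25to45", "45to62", "Gt62"),
  NumThruLanes  = c(3, 2, 1)
)

other_cols <- setdiff(names(df), "SpeedCategory")

tab_list <- lapply(other_cols, function(col) {
  tab <- table(df[[col]], df$SpeedCategory)
  # Prefix each row name with the column it came from
  rownames(tab) <- paste(col, rownames(tab), sep = ".")
  tab
})

# Stack the per-column tables into one matrix of counts
combined <- do.call(rbind, tab_list)
combined
```

The result is a plain matrix whose row names look like LeftOrRight.L or NumThruLanes.3, so no extra label column is needed.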
