2017-04-04

How to subtract successive grouped data in R

I am a computer science student and a new R user.

Below is my data frame.

set.seed(1234) 
df <- data.frame(
        sex = rep(c('M','F'), 10), 
        profession = rep(c('Doctor','Lawyer'), each = 5), 
        pariticpant = rep(1:10, 2), 
        x = runif(20, 1, 10), 
        y = runif(20, 1, 10)) 

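[Image of the data frame]

I want to find, for each participant, the difference in x and y between days. This would create a data frame with 10 rows.

dday would replace day, because the values would be differences between days.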

dday sex profession participant dx dy 
0-1 M Doctor  1   5.22 1.26 
. 
. 
. 

Can R do this?

+1

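What are you trying to do? What is the desired output (use actual numbers, and use `set.seed()` so that the random numbers are [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example))? Where does day come from? It is not in the example `df`. – MrFlick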

+0

@MrFlick The post has been edited. –

Answers

1

It seems the day column is missing from the data.frame, although it appears in the picture.

library(dplyr) 

set.seed(1234) 
df <- data.frame(day = rep(c(0, 1), each = 10), 
      sex = rep(c('M', 'F'), 10), 
      profession = rep(c('Doctor', 'Lawyer'), each = 5), 
      pariticpant = rep(1:10, 2), 
      x = runif(20, 1, 10), 
      y = runif(20, 1, 10)) 

df %>% 
    group_by(pariticpant) %>% 
    mutate(day = paste0(lag(day), "-", day), dx = x - lag(x), dy = y - lag(y)) %>% 
    select(-x, -y) %>% 
    filter(!is.na(dx)) 

Source: local data frame [10 x 8] 
Groups: pariticpant [10] 

     day    sex profession pariticpant         dx         dy
   <chr> <fctr>     <fctr>       <int>      <dbl>      <dbl>
1    0-1      M     Doctor           1  5.2189909  1.2553112
2    0-1      F     Doctor           2 -0.6959211 -0.3375603
3    0-1      M     Doctor           3 -2.9388703  1.3106358
4    0-1      F     Doctor           4  2.7004864  4.2057986
5    0-1      M     Doctor           5 -5.1173959 -0.3393300
6    0-1      F     Lawyer           6  1.7728652 -0.4583513
7    0-1      M     Lawyer           7  2.4905478 -2.9200456
8    0-1      F     Lawyer           8  0.3084325 -5.9026351
9    0-1      M     Lawyer           9 -4.3142487  1.4472483
10   0-1      F     Lawyer          10 -2.5382271  6.8542387
+0

Thank you! Would you mind explaining the code, mainly the part from `df` and `groupby`? –

+1

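Maybe I can give you a few hints. The mutate command just builds the new columns dx and dy, and the lag command (from dplyr) just shifts the vector x. For example, `x <- c(1, 2, 3, 4); lag(x)` gives `[1] NA 1 2 3`, so `x - lag(x)` is nothing more than subtracting each element of x from the one that follows it. Does that point you in the right direction? – Umberto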
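To make the shifting in the comment above concrete, here is a minimal sketch in base R. It assumes that `c(NA, head(x, -1))` is an acceptable stand-in for `dplyr::lag(x)` with its default shift of one; the name `lagged` is only illustrative. Without dplyr loaded, a bare `lag()` resolves to `stats::lag()`, which is meant for time series and does not shift the values this way.

```r
x <- c(1, 2, 3, 4)

# Base-R stand-in for dplyr::lag(x): shift right by one and pad with NA
lagged <- c(NA, head(x, -1))
lagged        # NA 1 2 3

# Each element minus the one before it; the first has no predecessor
x - lagged    # NA 1 1 1

# diff() gives the same differences without the leading NA
diff(x)       # 1 1 1
```

Inside `group_by(pariticpant)`, the same subtraction is done separately within each participant's rows, which is why the first row of every group yields NA and is then filtered out.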

0

With the day column included, you can also do it simply like this:

set.seed(1)

df <- data.frame(
  day = rep(c(0, 1), c(10, 10)),
  sex = rep(c('M', 'F'), 10),
  profession = rep(c('Doctor', 'Lawyer'), each = 5),
  participant = rep(1:10, 2),
  x = runif(20, 1, 10),
  y = runif(20, 1, 10))

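Now we need to group by sex, profession and participant, and then write a function that returns two columns with the differences in x and y. Remember that a function in R returns the last value it evaluates (in this case the final data frame).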

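library(plyr)  # provides ddply()

ddply(df, c("sex", "profession", "participant"),
      function(dat) {
        # dat holds the rows of one group: day 0 first, then day 1
        ddx = 2*dat$x[[1]] - dat$x[[2]]
        ddy = 2*dat$y[[1]] - dat$y[[2]]
        # the last evaluated value is what the function returns
        data.frame(dx = ddx, dy = ddy)
      })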

Output (not reordered):

   sex profession participant         dx         dy
1    F     Doctor           2  3.9572263 -0.9337529
2    F     Doctor           4 -0.6294785  3.6342897
3    F     Lawyer           6  1.6292118 -1.7344123
4    F     Lawyer           8  0.7850676  1.2878669
5    F     Lawyer          10  2.1418901  0.3098424
6    M     Doctor           1 -3.1910030  1.8730386
7    M     Doctor           3 -4.1488559  5.5640663
8    M     Doctor           5  0.9190749 -0.2446371
9    M     Lawyer           7 -3.2924210  5.1612642
10   M     Lawyer           9  0.0743912 -5.4104425
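Note that the function passed to `ddply` computes `2*x[1] - x[2]` rather than a plain difference. If the day-1 minus day-0 difference from the question is what is wanted, one possible sketch in base R follows. It is not the answer's method: it swaps `ddply` for `merge`, and the names `keys`, `d0`, `d1` and `res` are illustrative assumptions. No output is shown here because it depends on the random draw.

```r
# Sketch only: keys, d0, d1 and res are illustrative names, not from the answer.
set.seed(1)
df <- data.frame(
  day = rep(c(0, 1), c(10, 10)),
  sex = rep(c("M", "F"), 10),
  profession = rep(c("Doctor", "Lawyer"), each = 5),
  participant = rep(1:10, 2),
  x = runif(20, 1, 10),
  y = runif(20, 1, 10))

keys <- c("sex", "profession", "participant")

# Split the rows by day, keeping only the grouping keys and the measurements
d0 <- df[df$day == 0, c(keys, "x", "y")]
d1 <- df[df$day == 1, c(keys, "x", "y")]

# Pair each participant's day-0 row with the day-1 row;
# the duplicated columns become x0/x1 and y0/y1
res <- merge(d0, d1, by = keys, suffixes = c("0", "1"))

# Day 1 minus day 0
res$dx <- res$x1 - res$x0
res$dy <- res$y1 - res$y0

res[, c(keys, "dx", "dy")]
```

With `ddply`, the equivalent change would be to compute `dat$x[[2]] - dat$x[[1]]` inside the function, assuming each group's rows are ordered with day 0 first.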

Hope this helps. I went with the ddply function because I find it easy to write and to understand.