2016-04-25

How to make a table showing its changes over time in R

My data looks like this:

ID Joint_time leave_time group 
1 201501  201603  2 
2 201508  201601  2 
3 201503  201601  2 
4 201512  201601  3 
5 201511  201602  2 
6 201503  .   1 
7 201503  .   1 
8 201506  201602  3 
9 201507  .   1 
10 201503  .   1 
11 201601  201602  2 
12 201601  .   1 
13 201601  201603  2 
14 201601  201602  3 
15 201601  201602  3 
16 201602  .   1 
17 201602  .   1 
18 201602  201603  3 
19 201602  .   1 
20 201602  .   1 
21 201602  .   1 
22 201603  .   1 
23 201603  .   1 
24 201603  .   1 
25 201603  .   1 
26 201603  .   1 
27 201603  .   1 
28 201603  .   1 

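I want to know how the total number of customers changes at the end of each month, and I want to show how many customers left and how many joined. I only know how to use table(), but it does not seem able to handle a table this complex. My data is as follows: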

ID<-c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28) 
Joint_time<-c("201501","201508","201503","201512","201511","201503","201503","201506","201507","201503","201601","201601","201601","201601","201601","201602","201602","201602","201602","201602","201602","201603","201603","201603","201603","201603","201603","201603") 
leave_time<-c("201603","201601","201601","201601","201602",".",".","201602",".",".","201602",".","201603","201602","201602",".",".","201603",".",".",".",".",".",".",".",".",".",".") 
group<-c(2,2,2,3,2,1,1,3,1,1,2,1,2,3,3,1,1,3,1,1,1,1,1,1,1,1,1,1) 
question_table<-data.frame(ID,Joint_time,leave_time,group) 

I want to build a table like this:

           201601 201602 201603 
Total number in month beginning     10  12  13 
Joint this month         5  6  7 
Group 2 who joint during 2015 leave this month  2  1  1 
Group 2 who joint during 2016 leave this month  0  1  1 
Group 3 who joint during 2015 leave this month  1  1  0 
Group 3 who joint during 2016 leave this month  0  2  1 
Total number in month end       12  13  17 

Answer

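I'll help with each part of the output you need separately, because I'm not convinced that putting all of this data into a single data frame in this format is a good idea. If you really do need that format, I can edit the answer.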

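To count how many people left each month, broken down by group and by the year they joined, you can combine the dplyr and tidyr packages as follows: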

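library(dplyr) 
library(tidyr) 
question_table %>% 
    # keep only customers who have left ('.' marks "still active")
    filter(leave_time != '.') %>% 
    # the joining year is the first four characters of Joint_time
    mutate(Joint_year = substr(Joint_time, 1, 4)) %>% 
    group_by(group, leave_time, Joint_year) %>% 
    # count leavers per group, leaving month and joining year
    summarise(left = n()) %>% 
    # one column per leaving month; missing combinations become 0
    spread(leave_time, left, fill = 0) 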

This returns the following output:

Source: local data frame [4 x 5] 
Groups: group [2] 

    group Joint_year 201601 201602 201603 
    (dbl)  (chr) (dbl) (dbl) (dbl) 
1  2  2015  2  1  1 
2  2  2016  0  1  1 
3  3  2015  1  1  0 
4  3  2016  0  2  1 

To get the number of people who joined in each month of 2016, you can do this:

question_table %>% 
    filter(Joint_time %in% c('201601', '201602', '201603')) %>% 
    group_by(Joint_time) %>% 
    summarise(joined = n()) %>% 
    spread(Joint_time, joined, fill = 0) 

Source: local data frame [1 x 3] 

    201601 201602 201603 
    (dbl) (dbl) (dbl) 
1  5  6  7 

In this case, it is probably better to skip the final spread and keep the data in long format. That is up to you, though.
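
As a rough sketch of what keeping the long format could look like (an illustration, not part of the original answer), dplyr's `count()` returns one row per month instead of one column per month. The data is rebuilt here so the snippet runs on its own; `stringsAsFactors = FALSE` and the name `joined_long` are my own additions.

```r
library(dplyr)

# Rebuild the question's data; stringsAsFactors = FALSE keeps the dates as character
ID <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28)
Joint_time <- c("201501","201508","201503","201512","201511","201503","201503","201506","201507","201503","201601","201601","201601","201601","201601","201602","201602","201602","201602","201602","201602","201603","201603","201603","201603","201603","201603","201603")
leave_time <- c("201603","201601","201601","201601","201602",".",".","201602",".",".","201602",".","201603","201602","201602",".",".","201603",".",".",".",".",".",".",".",".",".",".")
group <- c(2,2,2,3,2,1,1,3,1,1,2,1,2,3,3,1,1,3,1,1,1,1,1,1,1,1,1,1)
question_table <- data.frame(ID, Joint_time, leave_time, group,
                             stringsAsFactors = FALSE)

# One row per 2016 month with the number of customers who joined (long format)
joined_long <- question_table %>%
    filter(substr(Joint_time, 1, 4) == "2016") %>%
    count(Joint_time, name = "joined")
joined_long
```

A long table like this can be filtered, joined to other monthly counts, or plotted directly, and it can still be spread into wide form at the very end if a report needs it.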

As for the last part, getting the total number of customers at the beginning of each period, you can do something like this:

question_table$Joint_time <- as.character(question_table$Joint_time) 
question_table$leave_time <- as.character(question_table$leave_time) 

df <- data.frame(numberBeginning = sapply(
    sort(unique(question_table$leave_time[question_table$leave_time != '.'])), 
    function(x) nrow(filter(question_table, Joint_time < x, leave_time == '.' | leave_time >= x)))) 

If you want this last result in wide format, it takes a little more work:

df$period <- row.names(df) 
row.names(df) <- NULL 
df <- spread(df, period, numberBeginning) 

    201601 201602 201603 
1  10  12  13 

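The code above can be modified slightly to get the last piece of information, the number of customers at the end of each period, as follows: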

df <- data.frame(numberEnding = sapply(
    sort(unique(question_table$leave_time[question_table$leave_time != '.'])), 
    function(x) nrow(filter(question_table, Joint_time <= x, leave_time == '.' | leave_time > x)))) 
df$period <- row.names(df) 
row.names(df) <- NULL 
df <- spread(df, period, numberEnding) 
df 
    201601 201602 201603 
1  12  13  17 

Thank you very much. How can I combine all four 'df's vertically into one table? –

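Sure, using 'rbind' may help. It's just that I don't like keeping different kinds of data in the same data frame. You would need to save the results under four different names rather than reusing 'df' as above. Also, if the answer meets your needs, perhaps you could upvote and accept it. – Gopala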
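
A minimal sketch of the `rbind` idea, under my own assumptions rather than the answer's exact code: each piece is computed in base R as a named vector with one entry per month, so that `rbind` can stack the pieces into a single matrix. The names `months`, `join_year`, `left_by` and `summary_table`, and the row labels, are illustrative.

```r
# Rebuild the question's data; stringsAsFactors = FALSE keeps the dates as character
ID <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28)
Joint_time <- c("201501","201508","201503","201512","201511","201503","201503","201506","201507","201503","201601","201601","201601","201601","201601","201602","201602","201602","201602","201602","201602","201603","201603","201603","201603","201603","201603","201603")
leave_time <- c("201603","201601","201601","201601","201602",".",".","201602",".",".","201602",".","201603","201602","201602",".",".","201603",".",".",".",".",".",".",".",".",".",".")
group <- c(2,2,2,3,2,1,1,3,1,1,2,1,2,3,3,1,1,3,1,1,1,1,1,1,1,1,1,1)
question_table <- data.frame(ID, Joint_time, leave_time, group,
                             stringsAsFactors = FALSE)

months <- c("201601", "201602", "201603")
join_year <- substr(question_table$Joint_time, 1, 4)

# Present at the start of month m: joined earlier and has not left before m
beginning <- sapply(months, function(m) with(question_table,
    sum(Joint_time < m & (leave_time == "." | leave_time >= m))))
# Joined during month m
joined <- sapply(months, function(m) sum(question_table$Joint_time == m))
# Members of group g who joined in year yr and left during month m
left_by <- function(g, yr) sapply(months, function(m)
    sum(question_table$group == g & join_year == yr &
        question_table$leave_time == m))
# Present at the end of month m: joined by m and still there after m
ending <- sapply(months, function(m) with(question_table,
    sum(Joint_time <= m & (leave_time == "." | leave_time > m))))

# Stack the pieces; the argument names become the row labels
summary_table <- rbind(
    "Total number at month beginning" = beginning,
    "Joined this month" = joined,
    "Group 2, joined 2015, left this month" = left_by(2, "2015"),
    "Group 2, joined 2016, left this month" = left_by(2, "2016"),
    "Group 3, joined 2015, left this month" = left_by(3, "2015"),
    "Group 3, joined 2016, left this month" = left_by(3, "2016"),
    "Total number at month end" = ending)
summary_table
```

Because every row is a plain count per month, the table can be checked against itself: beginning plus joined minus the four "left" rows should equal the end-of-month total in each column.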

After running the 'df <- data.frame()' code, I get "'>=' not meaningful for factors". How can I fix this? Thanks. –
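
This message most likely appears because, in R versions before 4.0.0, `data.frame()` converts character columns to factors by default, and `>=` is not defined for unordered factors. The two `as.character()` lines in the answer are meant to handle this and must be run before the `sapply()` call. A small sketch using three rows from the question's data (the names `demo` and `still_valid` are illustrative, and `stringsAsFactors = TRUE` is set only to reproduce the old default):

```r
# Force factor columns to mimic the default behaviour of R < 4.0.0
demo <- data.frame(
    Joint_time = c("201512", "201601", "201602"),
    leave_time = c("201601", ".", "201603"),
    stringsAsFactors = TRUE)

# Both date columns are factors here, which is what triggers the message
is.factor(demo$leave_time)

# Fix: convert the columns back to character before comparing them
demo$Joint_time <- as.character(demo$Joint_time)
demo$leave_time <- as.character(demo$leave_time)

# String comparison now works for the customers who have a leave date
still_valid <- demo$leave_time[demo$leave_time != "."] >= "201601"
still_valid
```

Alternatively, passing `stringsAsFactors = FALSE` to `data.frame()` when building `question_table` avoids the conversion in the first place.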