2017-06-04
mydata <- structure(list(id = 1:10, cafe = c(0, 1, 0, 0, 1, 1, 0, 0, 1, 
1), playground = c(1, 1, 1, 1, 1, 1, 0, 1, 1, 0), classroom = c(0, 
0, 0, 0, 0, 1, 1, 1, 1, 1), gender = structure(c(2L, 2L, 2L, 
2L, 2L, 2L, 1L, 2L, 1L, 2L), .Label = c("Female", "Male"), class = "factor"), 
    job = structure(c(2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L), .Label = c("Student", 
    "Teacher"), class = "factor")), .Names = c("id", "cafe", 
"playground", "classroom", "gender", "job"), row.names = c(NA, 
-10L), class = "data.frame") 

> mydata
   id cafe playground classroom gender     job
1   1    0          1         0   Male Teacher
2   2    1          1         0   Male Student
3   3    0          1         0   Male Teacher
4   4    0          1         0   Male Student
5   5    1          1         0   Male Teacher
6   6    1          1         1   Male Teacher
7   7    0          0         1 Female Teacher
8   8    0          1         1   Male Teacher
9   9    1          1         1 Female Teacher
10 10    1          0         1   Male Student

I want to convert these binary categorical variables to long format. The long-format data set should look like this:

id   response gender     job
 1 playground   Male Teacher
 2       cafe   Male Student
 2 playground   Male Student
 3 playground   Male Teacher
... 

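Essentially, the response column should name each of the cafe, playground, and classroom columns that has a value of 1 for that id. I have looked at several examples here and here, but they do not deal with binary data columns.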

Answers

We can do this with the tidyverse:

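library(tidyverse) 
mydata %>% 
    # stack the three binary columns into response/value pairs
    gather(response, value, cafe:classroom) %>% 
    # keep only the places that were actually selected
    filter(value == 1) %>% 
    # drop the helper column and put the columns in the desired order
    select(id, response, gender, job) 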
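
In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). The following is a minimal sketch of the same pipeline under that assumption; the data frame is rebuilt compactly from the question so the block runs on its own, and the name `long` is only illustrative.

```r
library(tidyr)
library(dplyr)

# Same data as in the question, rebuilt with data.frame()
mydata <- data.frame(
  id = 1:10,
  cafe = c(0, 1, 0, 0, 1, 1, 0, 0, 1, 1),
  playground = c(1, 1, 1, 1, 1, 1, 0, 1, 1, 0),
  classroom = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  gender = factor(c("Male", "Male", "Male", "Male", "Male",
                    "Male", "Female", "Male", "Female", "Male")),
  job = factor(c("Teacher", "Student", "Teacher", "Student", "Teacher",
                 "Teacher", "Teacher", "Teacher", "Teacher", "Student"))
)

long <- mydata %>%
  # one row per (id, place) pair; the place name goes into `response`
  pivot_longer(cafe:classroom, names_to = "response", values_to = "value") %>%
  # keep only the places flagged with 1
  filter(value == 1) %>%
  select(id, response, gender, job)

print(long)
```

Unlike gather(), pivot_longer() emits the rows of each id together, so the result already comes out ordered by id.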

This can be done with the melt(data, ...) function from the reshape package.

library(reshape) 

First, we specify the variables that we want to keep as columns.

id <- c("id", "gender", "job") 

Then we convert the data from wide to long format and keep only the rows whose value is 1.

df <- melt(mydata, id=id) 
# column 5 is the melted value; assign the result so the filter is kept
df <- df[df[,5]==1,-5] 

Next, we sort the data by id.

df <- df[order(df[,"id"]),] 

Finally, we rename the melted column to response and reorder the columns.

colnames(df)[4] <- "response" 
df <- df[,c(1,4,2,3)] 

## id   response gender     job
##  1 playground   Male Teacher
##  2       cafe   Male Student
##  2 playground   Male Student
##  3 playground   Male Teacher
## ...
## ...
##  9  classroom Female Teacher
## 10       cafe   Male Student
## 10  classroom   Male Student
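
For comparison, the same steps (go long, keep the 1s, sort by id) can be sketched in base R without any package. This is not part of either answer; it assumes the data from the question, rebuilt here so the block is self-contained, and the names `places`, `hits`, and `long` are illustrative.

```r
# Same data as in the question, rebuilt with data.frame()
mydata <- data.frame(
  id = 1:10,
  cafe = c(0, 1, 0, 0, 1, 1, 0, 0, 1, 1),
  playground = c(1, 1, 1, 1, 1, 1, 0, 1, 1, 0),
  classroom = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  gender = factor(c("Male", "Male", "Male", "Male", "Male",
                    "Male", "Female", "Male", "Female", "Male")),
  job = factor(c("Teacher", "Student", "Teacher", "Student", "Teacher",
                 "Teacher", "Teacher", "Teacher", "Teacher", "Student"))
)

places <- c("cafe", "playground", "classroom")

# Row/column positions of every cell equal to 1 in the binary columns
hits <- which(mydata[places] == 1, arr.ind = TRUE)

# Build one output row per hit, looking up the place name by column position
long <- data.frame(
  id = mydata$id[hits[, "row"]],
  response = places[hits[, "col"]],
  gender = mydata$gender[hits[, "row"]],
  job = mydata$job[hits[, "row"]]
)

# order() keeps ties in their existing order, so places stay in column order within each id
long <- long[order(long$id), ]
rownames(long) <- NULL

print(long)
```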