2017-05-04
library(dplyr)  # devel version, soon-to-be released 0.6
library(tidyr)

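Below is a simple dataset. The Q1Sat-Q3Sat variables are satisfaction levels, and the Q1Used-Q3Used variables indicate whether the respondent used the item they rated. The Sat and Used questions were asked together in the survey. The real dataset contains at least 50 Sat variables and 50 Used variables. The question is how to do conditional filtering with the scoped filters in the devel version of dplyr.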

Q1Sat<-c("Neutral","Neutral","VSat","Sat","Neutral","Sat","VDis","Sat","Sat","VSat") 
Q2Sat<-c("Neutral","VSat","Dis","Dis","VDis","Sat","Sat","VSat","Neutral","Dis") 
Q3Sat<-c("Sat","Sat","Diss","Neutral","VSat","VDis","Sat","Sat","Sat","Neutral") 
Q3Used<-c("Yes","No","Yes","Yes","Yes","Yes","Yes","Yes","Yes","No") 
Q2Used<-c("Yes","Yes","Yes","Yes","No","No","Yes","Yes","Yes","Yes") 
Q1Used<-c("Yes","Yes","Yes","No","No","Yes","Yes","Yes","No","Yes") 
House<-c("Yes","No","Unsure","Yes","Yes","No","Unsure","Unsure","Yes","Yes") 

Test<-data_frame(Q1Sat,Q2Sat,Q3Sat,Q1Used,Q2Used,Q3Used,House) 

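I would like to reorganize the data into a table of percentages using the code below. However, I need to filter the Q1Used-Q3Used variables to include only "Yes", and the House variable to include only "Yes". As mentioned, Q1Sat is tied to Q1Used, so Q1Sat should be included only where Q1Used is "Yes" and House is "Yes"; I need to do the same for Q2Sat and Q3Sat. I tried the scoped filters in the devel version of dplyr, but I don't know how to use them with several variables at once: Q1Used:Q3Used as well as House.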

So how would I add the filter House == "Yes" to the scoped filter in the code below?

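Test %>%
  # keep rows where every Used variable is "Yes"
  filter_at(vars(Q1Used:Q3Used), all_vars(. == "Yes")) %>%
  select(Q1Sat:Q3Sat) %>%
  gather() %>%
  count(key, value) %>%
  group_by(key) %>%  # percentages within each question
  mutate(perc = round(n / sum(n), 2)) %>%
  ungroup() %>%
  select(-n) %>%
  spread(value, perc)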
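
For the literal question of combining the scoped filter with a condition on House, one option is simply to chain an ordinary `filter()` after `filter_at()`. The following is a minimal sketch, assuming the Test data from above (rebuilt here with `tibble()` so the block runs on its own) and assuming "Yes" is the value to keep in every case. The name `kept` is only illustrative. Note that this keeps rows where all three Used variables are "Yes" at the same time, which is stricter than the per-question pairing described above.

```r
library(dplyr)

# Same values as the Test data in the question
Test <- tibble(
  Q1Sat  = c("Neutral","Neutral","VSat","Sat","Neutral","Sat","VDis","Sat","Sat","VSat"),
  Q2Sat  = c("Neutral","VSat","Dis","Dis","VDis","Sat","Sat","VSat","Neutral","Dis"),
  Q3Sat  = c("Sat","Sat","Diss","Neutral","VSat","VDis","Sat","Sat","Sat","Neutral"),
  Q1Used = c("Yes","Yes","Yes","No","No","Yes","Yes","Yes","No","Yes"),
  Q2Used = c("Yes","Yes","Yes","Yes","No","No","Yes","Yes","Yes","Yes"),
  Q3Used = c("Yes","No","Yes","Yes","Yes","Yes","Yes","Yes","Yes","No"),
  House  = c("Yes","No","Unsure","Yes","Yes","No","Unsure","Unsure","Yes","Yes")
)

kept <- Test %>%
  # scoped filter: every Used variable must be "Yes"
  filter_at(vars(Q1Used:Q3Used), all_vars(. == "Yes")) %>%
  # ordinary filter on the single House variable
  filter(House == "Yes")

kept
```

With this data only one respondent passes both conditions, which is why a row-wise filter is probably too blunt when each Sat variable should depend only on its own Used variable.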

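If you `select` only the 'Sat' variables, how would you `filter` on the 'Used' variables? Also, given your condition ('the Q1Used-Q3Used variables contain only "Yes" and the House variable contains only "No"'), there will be 0 rows after filtering, because no row satisfies the condition. – akrun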


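I suppose I should include the "Used" variables in the select, then... That is part of the problem too; I was just hoping to find a simpler way to write the code above using pipes and the tidyverse. As for no rows meeting the condition, I changed the "House" value from "No" to "Yes". That doesn't really matter, though; the point is to learn how to use scoped filters jointly across different kinds of variables... – Mike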


I edited the code... it should work better now? – Mike

Answer

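A solution. The general idea is that, instead of filtering, we recode the values we don't need (the answer describes this as recoding to NA; the code below does it with a 0/1 indicator).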

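# Stack the Sat columns (keeping House) and the Used columns separately
sat <- Test %>%
  select(Q1Sat:Q3Sat, House) %>%
  gather(key_sat, Sat, -House)
used <- Test %>%
  select(Q1Used:Q3Used) %>%
  gather(key_used, Used)

# gather() stacks columns in order, so row i of `used` lines up with row i of `sat`
cbind(used, sat) %>%
  group_by(key_sat) %>%
  mutate(
    # 1 if the response counts (item used and House is "Yes"), otherwise 0
    value = ((Used != "No") & (House == "Yes")) * 1,
    base = sum(value)  # number of eligible responses per question
  ) %>%
  group_by(key_sat, Sat) %>%
  summarise(perc = sum(value) / base[1]) %>%
  spread(Sat, perc)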
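
The approach above relies on the stacked Sat and Used rows lining up by position. As an alternative sketch, the pairing can be made explicit by reshaping so that each row holds one question's Sat and Used values side by side. This assumes a later tidyr release (1.0 or newer) that provides `pivot_longer()` and `pivot_wider()`, which did not exist when this thread was written, and it assumes the column names follow the `Q<number>Sat` / `Q<number>Used` pattern. The names `long` and `perc_table` are illustrative only.

```r
library(dplyr)
library(tidyr)

# Same values as the Test data in the question
Test <- tibble(
  Q1Sat  = c("Neutral","Neutral","VSat","Sat","Neutral","Sat","VDis","Sat","Sat","VSat"),
  Q2Sat  = c("Neutral","VSat","Dis","Dis","VDis","Sat","Sat","VSat","Neutral","Dis"),
  Q3Sat  = c("Sat","Sat","Diss","Neutral","VSat","VDis","Sat","Sat","Sat","Neutral"),
  Q1Used = c("Yes","Yes","Yes","No","No","Yes","Yes","Yes","No","Yes"),
  Q2Used = c("Yes","Yes","Yes","Yes","No","No","Yes","Yes","Yes","Yes"),
  Q3Used = c("Yes","No","Yes","Yes","Yes","Yes","Yes","Yes","Yes","No"),
  House  = c("Yes","No","Unsure","Yes","Yes","No","Unsure","Unsure","Yes","Yes")
)

# One row per respondent and question, with columns question, Sat, Used
long <- Test %>%
  pivot_longer(
    cols = -House,
    names_to = c("question", ".value"),
    names_pattern = "^(Q[0-9]+)(Sat|Used)$"
  )

perc_table <- long %>%
  # a rating counts only if that same item was used and House is "Yes"
  filter(Used == "Yes", House == "Yes") %>%
  count(question, Sat) %>%
  group_by(question) %>%
  mutate(perc = round(n / sum(n), 2)) %>%  # share within each question
  ungroup() %>%
  select(-n) %>%
  pivot_wider(names_from = Sat, values_from = perc,
              values_fill = list(perc = 0))

perc_table
```

Because the filter is applied after the reshape, each Sat value is checked only against its own Used value, and the same code should carry over to 50 or more question pairs as long as the naming pattern holds.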