2016-02-10

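Order a list depending on a vector

To create some plots, I have summarised my data using the following approach, which includes all the information I need.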

# Load Data 
RawDataSet <- read.csv("http://pastebin.com/raw/VP6cF31A", sep=";") 
# Load packages 
library(plyr) 
library(dplyr) 
library(tidyr) 
library(ggplot2) 
library(reshape2) 

# summarising the data 
new.df <- RawDataSet %>% 
    group_by(UserEmail,location,context) %>% 
    tally() %>% 
    mutate(n2 = n * c(1,-1)[(location=="NOT_WITHIN")+1L]) %>% 
    group_by(UserEmail,location) %>% 
    mutate(p = c(1,-1)[(location=="NOT_WITHIN")+1L] * n/sum(n)) 

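Through some other analysis, I identified different user groups. Since I want to plot my data, it would be great if the plot showed the data in the right order. The order is based on UserEmail and is defined as follows: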

order <- c("28","27","25","23","22","21","20","16","12","10","9","8","5","4","2","1","29","19","17","15","14","13","7","3","30","26","24","18","11","6") 

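When I ask for the type of my new.df with typeof(new.df), it says it is a list. I have tried some approaches such as order_by or with_order, but so far I have not managed to order my new.df according to my order vector. Of course, the ordering could also be done in the summarising step. Is there a way to do this?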

Just 'dplyr::arrange'. 'typeof' of a data.frame is a list (which it technically is); 'class' tells you whether it is actually a 'data.frame'. – alistaire
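
To make the comment above concrete, here is a minimal sketch of the arrange route. The toy data frame and the name user_order are made up for illustration and are not from the question; ungroup() is included on the assumption that grouping could otherwise influence the sort in some older dplyr releases.

```r
library(dplyr)

# Hypothetical toy data standing in for new.df
toy <- data.frame(
  UserEmail = c(1L, 2L, 2L, 3L),
  n         = c(10L, 20L, 30L, 40L)
)

# Hypothetical desired order of the users
user_order <- c(3L, 1L, 2L)

# Sort rows by the position of each UserEmail in user_order
sorted_toy <- toy %>%
  ungroup() %>%
  arrange(match(UserEmail, user_order))

typeof(sorted_toy)  # the storage type, "list" for any data frame
class(sorted_toy)   # the class vector, which includes "data.frame"
```

typeof reports how the object is stored, while class reports what it behaves as, which is why the question saw "list".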

Answers

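I would not myself create a vector named order, out of respect for the R function of that name. Use match to build an index that serves as the basis for ordering with order (used as a function):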

sorted.df <- new.df[order(match(
    new.df$UserEmail,
    as.integer(c("28","27","25","23","22","21","20","16","12","10","9","8","5","4","2","1",
                 "29","19","17","15","14","13","7","3",
                 "30","26","24","18","11","6"))
)), ]
head(sorted.df) 
#--------------- 
Source: local data frame [6 x 6] 
Groups: UserEmail, location [4] 

  UserEmail   location   context     n    n2          p
      (int)     (fctr)    (fctr) (int) (dbl)      (dbl)
1        28 NOT_WITHIN Clicked A    16   -16 -0.8421053
2        28 NOT_WITHIN Clicked B     3    -3 -0.1578947
3        28     WITHIN Clicked A     2     2  1.0000000
4        27 NOT_WITHIN Clicked A     4    -4 -0.8000000
5        27 NOT_WITHIN Clicked B     1    -1 -0.2000000
6        27     WITHIN Clicked A     1     1  1.0000000
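
For anyone wondering why the match/order combination works, here is a small base R sketch. The vectors user_ids and user_order are hypothetical stand-ins for new.df$UserEmail and the desired order.

```r
# Hypothetical column values and desired order (not the real data)
user_ids   <- c(5L, 2L, 9L, 2L)
user_order <- c(9L, 5L, 2L)

# match() gives, for each id, its position in user_order
pos <- match(user_ids, user_order)

# order() gives the row permutation that sorts those positions
perm <- order(pos)

# Indexing with the permutation puts the ids in the desired order
sorted_ids <- user_ids[perm]
```

Here pos is 2 3 1 3, perm is 3 1 2 4, and sorted_ids is 9 5 2 2. Ids that do not appear in user_order get NA from match, and order puts those last by default.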

(I did not load plyr or reshape2, because at least one of these packages has the bad habit of interacting poorly with dplyr functions.)
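
If plyr has to be loaded as well, one way to sidestep name clashes is to qualify the dplyr calls explicitly. plyr exports functions with the same names as several dplyr verbs (summarise, mutate and arrange among them), so whichever package is attached last wins for unqualified calls. A rough sketch on made-up toy data:

```r
library(dplyr)

# Hypothetical toy data, not the data from the question
toy <- data.frame(user = c(1, 1, 2), n = c(3, 4, 5))

# Namespace-qualified calls always use the dplyr versions,
# regardless of which other packages are attached
per_user <- toy %>%
  dplyr::group_by(user) %>%
  dplyr::summarise(total = sum(n))
```

This yields one row per user regardless of the order in which the packages were attached.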

Thanks :) Works like a charm. Unfortunately I ran into another problem that is related to this one, but it is about ggplot: http://stackoverflow.com/questions/35324848/reorder-data-in-ggplot-after-successfully-reorder-underlying-data – schlomm