2015-05-29 53 views

Rewriting code with the %>% operator

I'm trying to rewrite this code with the %>% operator (to learn this approach):

library(arules) 
data(AdultUCI) #https://archive.ics.uci.edu/ml/datasets/Census+Income 

AdultUCI[["capital-gain"]] <- ordered(cut(AdultUCI[["capital-gain"]],
    c(-Inf, 0, median(AdultUCI[["capital-gain"]][AdultUCI[["capital-gain"]] > 0]), Inf)),
    labels = c("None", "Low", "High"))

Is this possible? Here is my attempt:

AdultUCI[["capital-gain"]] <- ordered %>% cut %>% AdultUCI[["capital-gain"]], 
          + c(-Inf, 0, median(AdultUCI[["capital-gain"]][AdultUCI[["capital-gain"]] > 0]), 
          + Inf),labels = c("None", "Low", "High") 

Please make your code reproducible (http://stackoverflow.com/a/28481250/2725969). – BrodieG


In general, you can almost always replace nested function calls with the pipe operator. What have you tried? Did it not work? What went wrong? – Molx


@Molx I'm having trouble with this long operation. Is the order correct? Is it mostly %>%? – Kulis

Answer


This should work:

library(dplyr) 

# reproducible data
AdultUCI <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data", header = FALSE)
colnames(AdultUCI)[11] <- "capital-gain"  # capital-gain is the 11th column of adult.data

# original code
originalOrdered <-
    ordered(cut(AdultUCI[["capital-gain"]],
                c(-Inf, 0,
                  median(AdultUCI[["capital-gain"]][AdultUCI[["capital-gain"]] > 0]), Inf),
                labels = c("None", "Low", "High")),
            levels = c("None", "Low", "High"))

# using dplyr
newOrdered <-
    AdultUCI %>%
    select(x = `capital-gain`) %>%
    mutate(capitalGainOrdered =
               ordered(cut(x, c(-Inf, 0, median(x[x > 0]), Inf),
                           labels = c("None", "Low", "High")),
                       levels = c("None", "Low", "High"))) %>%
    .$capitalGainOrdered

# test if same
identical(originalOrdered, newOrdered)
#[1] TRUE

str(newOrdered)
# Ord.factor w/ 3 levels "None"<"Low"<"High": 2 1 1 1 1 1 1 1 3 2 ...
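
If the goal is only to unnest ordered(cut(...)), a more literal translation that needs just magrittr might look like the sketch below. It is not from the answer above: the vector x is made-up toy data standing in for AdultUCI[["capital-gain"]], and breaks is a helper name introduced here. The breaks are computed first because they depend on the data itself, and the vector is then piped through cut() and ordered() in the order the calls are applied.

```r
library(magrittr)

# Toy stand-in for AdultUCI[["capital-gain"]] (hypothetical values)
x <- c(0, 0, 100, 5000, 0, 20000, 300)

# Break points: zero, and the median of the strictly positive values
breaks <- c(-Inf, 0, median(x[x > 0]), Inf)

# Nested form, as in the question
nested <- ordered(cut(x, breaks), labels = c("None", "Low", "High"))

# Piped form: innermost call first, outermost call last
piped <- x %>%
    cut(breaks) %>%
    ordered(labels = c("None", "Low", "High"))

identical(nested, piped)
```

The pipe passes its left-hand side as the first argument of the next call, so x %>% cut(breaks) is cut(x, breaks). Note that passing labels to ordered() this way assumes all three intervals actually occur in the data, as they do in this toy vector.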