2016-10-02 123 views
1

Speeding up a nested ifelse statement - R

For example, the data in my code looks like this:

    time_elapsed                     network_name             daypart       day
 1:         4705                          Laff TV 2016-09-09 03:11:35    Friday
 2:         1800                              CNN 2016-09-10 08:00:00  Saturday
 3:           23                             INSP 2016-09-02 18:00:00    Friday
 4:          148                              NBC 2016-09-02 16:01:26    Friday
 5:          957                  History Channel 2016-09-07 14:44:03 Wednesday
 6:         1138         Nickelodeon/Nick-at-Nite 2016-09-09 16:00:00    Friday
 7:          120                       Starz Edge 2016-09-07 15:28:59 Wednesday
 8:          268            Starz Encore Westerns 2016-09-07 17:13:05 Wednesday
 9:            6                              CBS 2016-09-10 04:00:00  Saturday
10:           69                      Independent 2016-09-07 12:48:11 Wednesday
11:         4151                              NBC 2016-09-09 04:32:37    Friday
12:          570 PBS: Public Broadcasting Service 2016-09-07 16:17:58 Wednesday
13:         1421                            NBCSN 2016-09-03 15:22:23  Saturday
14:          466          Estrella TV (Broadcast) 2016-09-04 19:00:00    Sunday

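(It is usually more than 2 million rows.)

I wrote the nested ifelse statement below a few months ago, when I was running my whole script over only a few million rows. Now that I am running it at a much larger scale, I would really like to find a way to make it faster.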

# wday() and hour() are provided by both lubridate and data.table
# wday(): 1 = Sunday, 7 = Saturday; hour(): 0-23
targets_random$daypart <- 
  ifelse((wday(targets_random$daypart) == 1 | 
          wday(targets_random$daypart) == 7), "W: Weekend", 
  ifelse(hour(targets_random$daypart) <= 2, "LP: Late Prime", 
  ifelse((hour(targets_random$daypart) >= 3 & 
          hour(targets_random$daypart) <= 5), "O: Overnight", 
  ifelse((hour(targets_random$daypart) >= 6 & 
          hour(targets_random$daypart) <= 9), "EM: Early Morning", 
  ifelse((hour(targets_random$daypart) >= 10 & 
          hour(targets_random$daypart) <= 16), "D: Day", 
  ifelse((hour(targets_random$daypart) >= 17 & 
          hour(targets_random$daypart) <= 20), "F: Fringe", 
  ifelse(hour(targets_random$daypart) >= 21, "P: Prime", NA))))))) 

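I tried a data.table solution, but it was only very slightly faster, and it turned my data.table into a list. For the life of me, I cannot figure out why. Undoing that added enough time that the savings were not worth it.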
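For reference, a common data.table idiom for this kind of recoding is to add the column by reference with `:=`, which leaves the object a data.table. The sketch below is only an assumption about what such an attempt could look like, not the code described above. The sample rows, the helper columns `wd` and `hr`, and the output column `daypart_label` are hypothetical, and `fcase()` needs data.table 1.13.0 or later.

```r
library(data.table)

# Hypothetical sample rows; only the daypart column matters here.
dt <- data.table(
  daypart = as.POSIXct(
    c("2016-09-09 03:11:35",   # Friday
      "2016-09-10 08:00:00",   # Saturday
      "2016-09-02 18:00:00",   # Friday
      "2016-09-07 14:44:03"),  # Wednesday
    tz = "UTC"
  )
)

# Compute weekday and hour once instead of once per ifelse() branch.
# data.table::wday() returns 1 for Sunday through 7 for Saturday.
dt[, c("wd", "hr") := .(wday(daypart), hour(daypart))]

# fcase() returns the value of the first condition that is TRUE,
# so each later branch only needs an upper bound.
dt[, daypart_label := fcase(
  wd %in% c(1L, 7L), "W: Weekend",
  hr <= 2L,          "LP: Late Prime",
  hr <= 5L,          "O: Overnight",
  hr <= 9L,          "EM: Early Morning",
  hr <= 16L,         "D: Day",
  hr <= 20L,         "F: Fringe",
  hr >= 21L,         "P: Prime"
)]

# Drop the helper columns again.
dt[, c("wd", "hr") := NULL]
```

Because every step uses `:=`, `dt` is modified in place and `is.data.table(dt)` stays `TRUE`. Whether this avoids the list conversion mentioned above depends on what the original attempt actually did.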

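Any suggestions would be greatly appreciated. What I have works, and if I have to stick with it, that will be fine. The whole script currently takes about 3.5 hours to finish. The biggest parts are the SQL queries and writing the result files, but it would be great to cut the time down wherever I can!

(A small aside: it used to take nearly 8 hours. Then I replaced a ton of parts with data.table syntax, and now I am a fan!)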

+0

You might be able to use parLapply to process multiple rows at once – Rilcon42

+0

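See `?cut`. It seems you could use something like `cut(targets_random$daypart$hour, c(-Inf, 3, 6, 10, 17, 21, Inf), include.lowest = TRUE, right = FALSE)`, but with the "labels" argument changed to `c("LP: Late Prime", "O: Overnight", etc...)`, and afterwards replace with `"W: Weekend"` wherever `(targets_random$daypart$wday + 1) %in% c(1, 7)` –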
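The `cut` suggestion in the comment above could be sketched as follows, using base R only. This is a minimal illustration under a few assumptions: the sample rows are hypothetical, the timestamps are taken to be in UTC, and the result goes into a new, hypothetical column `daypart_label` rather than overwriting `daypart`.

```r
# Hypothetical sample data; the column name mirrors the question.
targets_random <- data.frame(
  daypart = as.POSIXct(
    c("2016-09-09 03:11:35",   # Friday
      "2016-09-10 08:00:00",   # Saturday
      "2016-09-02 18:00:00",   # Friday
      "2016-09-07 14:44:03",   # Wednesday
      "2016-09-04 19:00:00",   # Sunday
      "2016-09-08 23:15:00",   # Thursday
      "2016-09-05 00:30:00"),  # Monday
    tz = "UTC"
  )
)

# POSIXlt exposes $hour (0-23) and $wday (0 = Sunday ... 6 = Saturday).
lt <- as.POSIXlt(targets_random$daypart)

labels <- c("LP: Late Prime", "O: Overnight", "EM: Early Morning",
            "D: Day", "F: Fringe", "P: Prime")

# right = FALSE makes the bins [-Inf,3), [3,6), [6,10), [10,17), [17,21), [21,Inf).
daypart_label <- as.character(
  cut(lt$hour, breaks = c(-Inf, 3, 6, 10, 17, 21, Inf),
      labels = labels, right = FALSE)
)

# Weekend rows override the hour-based label.
daypart_label[lt$wday %in% c(0, 6)] <- "W: Weekend"

targets_random$daypart_label <- daypart_label
```

Here `cut` bins the hours in a single vectorized pass, and the weekend override is one logical-index assignment, so no `ifelse()` call is evaluated over the full column.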

Answers

0

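Consider building a separate, static data frame that holds every possible weekday/hour combination together with its result. In SQL practice this would be called a lookup table. Then simply merge it with the full data table.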

# DF (N=168) 7 X 24 
# hour() returns 0-23, so the grid must cover 0:23 (with 1:24, hour 0 would never match) 
daytimes <- expand.grid(wday=c(1:7), 
                        hour=c(0:23))  
daytimes$result <- 
    ifelse((daytimes$wday == 1|daytimes$wday == 7), "W: Weekend", 
     ifelse(daytimes$hour <= 2, "LP: Late Prime", 
      ifelse((daytimes$hour >= 3 & daytimes$hour <= 5), "O: Overnight", 
        ifelse((daytimes$hour >= 6 & daytimes$hour <= 9), "EM: Early Morning", 
          ifelse((daytimes$hour >= 10 & daytimes$hour <= 16), "D: Day", 
            ifelse((daytimes$hour >= 17 & daytimes$hour <= 20), "F: Fringe", 
             ifelse(daytimes$hour >= 21, "P: Prime", NA))))))) 
# CREATE MERGE FIELDS 
targets_random$wday <- wday(targets_random$daypart) 
targets_random$hour <- hour(targets_random$daypart) 

# MERGE WITH NEW COLUMN: result 
# (by default merge() returns the rows sorted by the key columns) 
targets_random <- merge(targets_random, daytimes, by=c("wday", "hour"))   
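
The same lookup idea can also be sketched without a merge, by indexing a 24-element label vector with the hour. This is a variation under stated assumptions, not part of the answer above: `hour_labels` and `classify_daypart()` are hypothetical names, and the example timestamps are assumed to be in UTC.

```r
# One label per hour 0-23: hours 0-2, 3-5, 6-9, 10-16, 17-20, 21-23.
hour_labels <- rep(
  c("LP: Late Prime", "O: Overnight", "EM: Early Morning",
    "D: Day", "F: Fringe", "P: Prime"),
  times = c(3, 3, 4, 7, 4, 3)
)

# Hypothetical helper: maps a date-time vector to daypart labels.
classify_daypart <- function(x) {
  lt <- as.POSIXlt(x)                 # $hour is 0-23, $wday is 0 (Sunday) to 6 (Saturday)
  out <- hour_labels[lt$hour + 1L]    # R vectors are 1-based, so shift the hour by one
  out[lt$wday %in% c(0L, 6L)] <- "W: Weekend"
  out
}

example <- as.POSIXct(
  c("2016-09-09 03:11:35",   # Friday
    "2016-09-10 08:00:00",   # Saturday
    "2016-09-05 00:30:00"),  # Monday
  tz = "UTC"
)
example_labels <- classify_daypart(example)
```

Unlike `merge()`, plain vector indexing keeps the rows in their original order and adds no extra key columns.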
+0

Awesome, I am going to try this! – Camille