2017-08-11

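Creating an indicator variable for x occurring in the future of y

I am trying to create a dummy variable that flags rows where x occurred within 1 year of row A. I think this is probably a common problem, and similar questions have already been posted (the closest one I found is "this is the most similar"). Unfortunately, the zoo package is not a good fit because it does not handle irregularly spaced dates well (I do not want to aggregate rows, and my data is too large for that). I have been trying, unsuccessfully, to work out a data.table way of doing this, although I would prefer an approach closer to what I already have experience with.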

dates <- rep(as.Date(c('2015-01-01', '2015-02-02', '2015-03-03', '2016-02-02'), '%Y-%m-%d'), 3) 

names <- c(rep('John', 4), rep('Phil', 4), rep('Ty', 4)) 

df <- data.frame(Name = names, Date = dates, 
      did_y = c(0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0), 
      did_x = c(1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1)) 

Name  Date        did_y  did_x
John  2015-01-01  0      1
John  2015-02-02  1      0
John  2015-03-03  1      0
John  2016-02-02  0      0
Phil  2015-01-01  0      0
Phil  2015-02-02  0      1
Phil  2015-03-03  0      1
Phil  2016-02-02  0      0
Ty    2015-01-01  1      0
Ty    2015-02-02  1      0
Ty    2015-03-03  0      0
Ty    2016-02-02  0      1

What I want is:

dffinal <- data.frame(Name = names, Date = dates, 
        did_y = c(0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0), 
        did_x = c(1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1), 
        did_x_within_year = c(1, 1, 1, NA, 1, 1, 1, 1, 0, 1, 1, 1), 
        did_x_next_year = c(0, 0, 0, NA, 1, 1, 0, NA, 0, 1, 1, NA)) 

Name  Date        did_y  did_x  did_x_within_year  did_x_next_year
John  2015-01-01  0      1      1                  0
John  2015-02-02  1      0      1                  0
John  2015-03-03  1      0      1                  0
John  2016-02-02  0      0      NA                 NA
Phil  2015-01-01  0      0      1                  1
Phil  2015-02-02  0      1      1                  1
Phil  2015-03-03  0      1      1                  0
Phil  2016-02-02  0      0      1                  NA
Ty    2015-01-01  1      0      0                  0
Ty    2015-02-02  1      0      1                  1
Ty    2015-03-03  0      0      1                  1
Ty    2016-02-02  0      1      1                  NA

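So I want two columns: one that flags whether x occurred within 1 year of row A (either before or after), and another that flags whether it occurred within the following 1 year.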

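I experimented with RcppRoll, but it seems to only look backward in time: if something happened within the previous year the row gets flagged, but not if it happens within the following year.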

df$did_x_next_year <- roll_max(df$did_x, 365, fill = NA) 
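
A further difficulty with this kind of call, as far as I understand it, is that the window size passed to roll_max counts rows rather than days, and the rows here are unevenly spaced. A minimal base R check on one person's dates from the example (a sketch only, nothing beyond the dates above is assumed):

```r
# One person's dates from the example data
dates <- as.Date(c('2015-01-01', '2015-02-02', '2015-03-03', '2016-02-02'))

# Days between consecutive rows: a "1 row" step is anywhere from 29 to 336 days
gaps <- as.numeric(diff(dates))
gaps
# 32 29 336

# A window of 365 rows is also longer than the 4 rows available per person
length(dates) < 365
# TRUE
```

So any row-count window would need to be replaced by a comparison on the dates themselves.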

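EDIT: Attempted solution based on other questions

I tried to implement this solution (1B); unfortunately, nothing actually changes in my data frame / data table. Even when I leave the function exactly as in the example and apply it to my data, the column is not updated.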

library(zoo) 
library(data.table) 
df$Year <- lubridate::year(df$Date) 
df$Month <- lubridate::month(df$Date) 
df$did_x_next_year <- df$did_x 

DT <- as.data.table(df) 

k <- 12 # prior 12 months 

# takes zoo object x, subsets it to the window of the prior k months, and returns the max
Max2 <- function(x) { 
    w <- window(x, start = end(x) - k/12, end = end(x) - 1/12) 
    if (length(w) == 0 || all(is.na(w))) NA_real_ else max(w, na.rm = TRUE) 
} 

nms <- names(DT)[7] 

setkey(DT, Name, Year, Month) # sort 

# create zoo object from arguments and run rollapplyr using Max2 
roll2 <- function(x, year, month) { 
    z <- zoo(x, as.yearmon(year + (month - 1)/12)) 
    coredata(rollapplyr(z, k+1, Max2, coredata = FALSE, partial = TRUE)) 
} 

DT <- DT[, nms := lapply(.SD, roll2, Year, Month), .SDcols = nms, by = "Name"] 

Does row A mean row 1? –


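Well, I am grouping the data by the Name column, and I am looking to roll a time window over each row so that the calculation looks both forward and backward from that row's date. – vino88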


So you want a rolling mean or an interpolation? –

Answer

After a suggestion from a friend, I came up with the following:

library(dplyr) # provides %>%, filter, select, rename, group_by, mutate

# Filtering to the obs I care about 
dfadd <- df %>% filter(did_x == 1) %>% select(Name, Date) %>% rename(x_date = Date) 

# Converting to character since in dcast it screws up the dates 
dfadd$x_date <- as.character(dfadd$x_date) 

# Merging data 
df <- plyr::join(df, dfadd, by = 'Name') 

# Creating new column used for dcasting 
df <- df %>% group_by(Name, Date) %>% mutate(x_date_index = seq(from = 1, to = n())) 
df$x_date_index <- paste0('x_date_',df$x_date_index) 

#casting the data wide 
df <- reshape2::dcast(df, 
        Name + Date + did_y + did_x ~ x_date_index, 
        value.var = "x_date", 
        fill = NA) 

# Converting to back to date 
df$x_date_1 <- as.Date(df$x_date_1) 
df$x_date_2 <- as.Date(df$x_date_2) 

# Creating dummy variables 
# abs() so that x events before the row's date are also limited to the window
df$did_x_within_year <- 0 
df$did_x_within_year <- ifelse(abs(as.numeric(df$x_date_1 - df$Date)) <= 366, 1, 
df$did_x_within_year) 

df$did_x_next_year <- 0 
df$did_x_next_year <- ifelse(((df$x_date_1 > df$Date) & (df$x_date_1 - df$Date<= 365)), 
         1, df$did_x_next_year) 

# Can extend to account for x_date_2, x_date_3, etc 

# Changing the last entry to NA as desired 
df <- df %>% group_by(Name) %>% mutate(did_x_next_year = c(did_x_next_year[-n()], NA)) 
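
The reshape above needs one x_date_* column per event. As a rough alternative sketch (not the method above), the same date comparison can be done per Name with base R's outer(), which covers any number of x events at once. The helper name flag_x and its window argument are hypothetical, and the 365-day cutoff is an assumption about what "1 year" means. Unlike the desired output in the question, this sketch returns 0 rather than NA for each person's last row and for John's 2016 row.

```r
# Rebuild the example data from the question (did_y is not needed here)
dates <- rep(as.Date(c('2015-01-01', '2015-02-02', '2015-03-03', '2016-02-02')), 3)
df <- data.frame(Name = rep(c('John', 'Phil', 'Ty'), each = 4),
                 Date = dates,
                 did_x = c(1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1))

# Hypothetical helper: for one person's rows, flag whether any x event
# falls within `window` days of each row (either side, or strictly after)
flag_x <- function(date, did_x, window = 365) {
  d <- as.numeric(date)
  x <- d[did_x == 1]
  # gap[i, j] = days from row i to x event j (positive means the event is later)
  gap <- outer(d, x, function(row_day, x_day) x_day - row_day)
  data.frame(within = as.integer(rowSums(abs(gap) <= window) > 0),
             nxt    = as.integer(rowSums(gap > 0 & gap <= window) > 0))
}

# Apply the helper to each person's block of rows
df$did_x_within_year <- NA_integer_
df$did_x_next_year <- NA_integer_
for (idx in split(seq_len(nrow(df)), df$Name)) {
  flags <- flag_x(df$Date[idx], df$did_x[idx])
  df$did_x_within_year[idx] <- flags$within
  df$did_x_next_year[idx] <- flags$nxt
}
```

The gap matrix has one cell per (row, x event) pair within a person, so for very large groups a join-based approach would likely use less memory.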