2017-07-19 50 views
0

Creating columns with certain conditions in R

I have this data:

OPENING CLOSE 
2007  2008 
2009  2010  
2004  NA 

and I want to create these columns:

OPENING CLOSE Y2004 Y2005 Y2006 Y2007 Y2008 Y2009 Y2010
   2007  2008                       1     1
   2005  2008           1     1     1     1
   2004    NA     1     1     1     1     1     1     1

These columns can be created step by step with if statements, but I would like to use a loop or the lapply function instead.

In addition, I want to create further columns (S~) based on a certain condition.

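If a column (Y2007) is 1 and the column from 3 years ago is also 1 (Y2005, that is, two columns to the left), the new column (S2007) is 1; otherwise it is 0.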

OPENING CLOSE Y2004 Y2005 Y2006 Y2007 Y2008 Y2009 Y2010 | S2007 S2008 S2009
   2007  2008                       1     1             |     0     0     0
   2005  2008           1     1     1     1             |     1     1     0
   2004    NA     1     1     1     1     1     1     1 |     1     1     1

How can I write a script for this in R?

Answers

1

A solution using the tidyverse. dt3 is the first desired output and dt5 is the second. No loops are needed here.

# Create example data frame 
dt <- read.table(text = "OPENING CLOSE 
2007  2008 
       2005  2008  
       2004  NA ", 
       header = TRUE, stringsAsFactors = FALSE) 

# Load package 
library(tidyverse) 

dt2 <- dt %>% 
    mutate(ID = 1:n(), EndYear = ifelse(is.na(CLOSE), 2010, CLOSE)) %>% 
    # Create year range list 
    mutate(YearRange = map2(OPENING, EndYear, `:`)) %>% 
    # Unnest the list column 
    unnest(YearRange) %>% 
    mutate(YearRange = paste0("Y", YearRange)) %>% 
    mutate(Value = 1) %>% 
    # Spread based on YearRange and Value 
    spread(YearRange, Value) 

# Desired output 1 
dt3 <- dt2 %>% 
    arrange(ID) %>% 
    select(-ID, -EndYear) 

dt4 <- dt2 %>% 
    gather(YearRange, Value, Y2004:Y2010) %>% 
    arrange(ID) %>% 
    group_by(ID) %>% 
    # Set the lag here: the question's "3 years ago" (Y2005 for Y2007) is 2 columns back 
    mutate(Value2 = lag(Value, 2)) %>% 
    # Evaluate the condition between one year and the lagged year 
    mutate(Value3 = ifelse(Value %in% 1 & Value2 %in% 1, 1, 0)) %>% 
    mutate(YearRange = sub("Y", "S", YearRange)) %>% 
    select(ID, YearRange, Value3) %>% 
    # Filter for S2007 to S2009 
    filter(YearRange %in% paste0("S", 2007:2009)) %>% 
    spread(YearRange, Value3) 

# Desired output 2 
dt5 <- dt2 %>% 
    left_join(dt4, by = "ID") %>% 
    arrange(ID) %>% 
    select(-ID, -EndYear) 

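This is one of the best answers I have seen! Thank you. One more thing: I would like to add an extra filter to the code (filter(YearRange %in% ~)). I want to keep only the subset Y2007~Y2009, so I wrote some code for that, but it does not work. How can I modify the code so that it filters in both ways?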
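The comment does not include the code that failed, so the following is only a sketch of one possible reading: keep the Y2007 to Y2009 columns alongside S2007 to S2009 in the final wide result. It assumes dplyr 1.0 or later (for `all_of`). The small `dt5` built here is a hand-typed stand-in for the answer's output, and `keep_years` and `dt6` are hypothetical names.

```r
library(dplyr)

# Stand-in for the answer's dt5 (values copied from the question's table)
dt5 <- data.frame(
  OPENING = c(2007, 2005, 2004),
  CLOSE   = c(2008, 2008, NA),
  Y2004 = c(NA, NA, 1), Y2005 = c(NA, 1, 1), Y2006 = c(NA, 1, 1),
  Y2007 = c(1, 1, 1),   Y2008 = c(1, 1, 1),
  Y2009 = c(NA, NA, 1), Y2010 = c(NA, NA, 1),
  S2007 = c(0, 1, 1),   S2008 = c(0, 1, 1),   S2009 = c(0, 0, 1)
)

# Years to keep for both the Y and the S columns (hypothetical name)
keep_years <- 2007:2009

# Select columns by name instead of filtering rows of the long table
dt6 <- dt5 %>%
  select(OPENING, CLOSE,
         all_of(paste0("Y", keep_years)),
         all_of(paste0("S", keep_years)))

names(dt6)
```

Selecting columns on the wide result sidesteps the long-format `filter()` step, where the year labels have already been renamed from Y to S.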

0

A base R version:

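# dat is the data frame from the question 
dat <- read.table(text = "OPENING CLOSE 
2007 2008 
2009 2010 
2004 NA", header = TRUE) 

# All years spanned by the data, from the earliest to the latest 
rng <- range(unlist(dat), na.rm=TRUE) 
rng <- rng[1]:rng[2] 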

dat[paste0("Y",rng)] <- t(mapply(
    function(op,cl,rn) rn >= op & (rn <= cl | is.na(cl)), 
    dat[["OPENING"]], 
    dat[["CLOSE"]], 
    list(rng) 
)) 

#   OPENING CLOSE Y2004 Y2005 Y2006 Y2007 Y2008 Y2009 Y2010 
# 1    2007  2008 FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE 
# 2    2009  2010 FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE 
# 3    2004    NA  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE 
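
The base R version above produces TRUE/FALSE values and stops at the Y columns. Since the question asked for a loop or lapply, here is a minimal sketch that builds the Y columns as 1/0 and then derives the S columns. It is not part of the original answer. The names `years`, `s_years` and `lag_cols` are illustrative, and the data uses the 2005 to 2008 row from the question's desired-output table so that the S values can be compared against it.

```r
# Rows taken from the question's desired-output table
dat <- data.frame(OPENING = c(2007, 2005, 2004),
                  CLOSE   = c(2008, 2008, NA))

years <- 2004:2010

# One 1/0 vector per year: open by that year and not yet closed
# (an NA in CLOSE is treated as "still open")
y_cols <- lapply(years, function(yr) {
  as.integer(dat$OPENING <= yr & (is.na(dat$CLOSE) | yr <= dat$CLOSE))
})
dat[paste0("Y", years)] <- y_cols

# "3 years ago" in the question means Y2005 for Y2007, i.e. 2 columns back
lag_cols <- 2
s_years  <- 2007:2009

# S is 1 only when both the year itself and the lagged year are 1
s_cols <- lapply(s_years, function(yr) {
  as.integer(dat[[paste0("Y", yr)]] == 1 &
             dat[[paste0("Y", yr - lag_cols)]] == 1)
})
dat[paste0("S", s_years)] <- s_cols

dat[paste0("S", s_years)]
#   S2007 S2008 S2009
# 1     0     0     0
# 2     1     1     0
# 3     1     1     1
```

Changing `lag_cols` or `s_years` adjusts the window without touching the rest of the code.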