2017-10-19

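Finding months and years within strings

I have a string column whose entries have month names, and sometimes years, scattered throughout them: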

df <- data.frame(STRINGS = c("January 2017 Blah Blah", 
         "February Blah Blah", 
         "2016 Yeah Yeah", 
         "March Bleck", 
         "Stuff")) 

> df
                 STRINGS
1 January 2017 Blah Blah
2     February Blah Blah
3         2016 Yeah Yeah
4            March Bleck
5                  Stuff

All years fall in the range 2015 to 2017.

I would like output like the following:

                 STRINGS    MONTH YEAR
1 January 2017 Blah Blah  January 2017
2     February Blah Blah February   NA
3         2016 Yeah Yeah       NA 2016
4            March Bleck    March   NA
5                  Stuff       NA   NA

What is the simplest way to do this?

To start, I have

months <- c("January", "February", "March", "April", "May", "June", 
       "July", "August", "September", "October", "November", "December") 
years <- c(2015, 2016, 2017) 

Answer


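A solution using dplyr, rebus, and stringr. Note that it assumes each row contains only one matching month and one matching year; str_extract returns only the first match in each string.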

library(dplyr) 
library(rebus) 
library(stringr) 

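# str_extract() returns the first match in each string, or NA if there is none
df2 <- df %>% 
    mutate(STRINGS = as.character(STRINGS)) %>%         # factor -> character
    mutate(MONTH = str_extract(STRINGS, or1(months)),   # or1() builds an alternation regex from a vector
           YEAR = str_extract(STRINGS, or1(years)))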
df2 
                 STRINGS    MONTH YEAR
1 January 2017 Blah Blah  January 2017
2     February Blah Blah February <NA>
3         2016 Yeah Yeah     <NA> 2016
4            March Bleck    March <NA>
5                  Stuff     <NA> <NA>
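
If you would rather not add three packages, the same idea can be sketched in base R. This is only an illustration, not part of the answer above: `extract_first` is a hypothetical helper name, the pattern is a plain `|` alternation built with `paste`, and, like `str_extract`, it keeps only the first match in each string. It also assumes STRINGS is stored as character rather than factor.

```r
# Sample data from the question; keep STRINGS as character, not factor.
df <- data.frame(STRINGS = c("January 2017 Blah Blah",
                             "February Blah Blah",
                             "2016 Yeah Yeah",
                             "March Bleck",
                             "Stuff"),
                 stringsAsFactors = FALSE)

months <- month.name   # built-in vector "January", ..., "December"
years  <- 2015:2017

# Hypothetical helper: first match of any element of `choices` in each
# element of `x`, or NA where nothing matches.
extract_first <- function(x, choices) {
  pattern <- paste(choices, collapse = "|")   # e.g. "2015|2016|2017"
  hit <- regexpr(pattern, x)                  # -1 where there is no match
  out <- rep(NA_character_, length(x))
  out[hit != -1] <- regmatches(x, hit)        # fill in only the matched rows
  out
}

df$MONTH <- extract_first(df$STRINGS, months)
df$YEAR  <- as.integer(extract_first(df$STRINGS, years))
df
```

Converting YEAR with `as.integer` is optional; the stringr version above leaves it as character, which is why its missing values print as `<NA>`.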