2016-02-29

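Extracting years from string and text data

I need to extract the start year and end year from a character vector containing values like these: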

yr <- c("June 2013 – Present (2 years 9 months)", "January 2012 – June 2013 (1 year 6 months)", "2006 – Present (10 years)", "2002 – 2006 (4 years)")


yr 
June 2013 – Present (2 years 9 months) 
January 2012 – June 2013 (1 year 6 months) 
2006 – Present (10 years) 
2002 – 2006 (4 years) 

I'm expecting output like this. Does anyone have any suggestions?

start_yr  end_yr 

2013   2016 
2012   2013 
2006   2016 
2002   2006 

Use gsub to replace "Present" with 2016, then extract the four-digit numbers. Try it. – rawr

Answers

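# Replace "Present" with 2016 (the current year when this was written)
x <- gsub("present", "2016", yr, ignore.case = TRUE)
# Extract every four-digit number from each string (returns a list)
x <- regmatches(x, gregexpr("\\d{4}", x))
start_yr <- sapply(x, "[[", 1)
end_yr <- sapply(x, "[[", 2)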

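This saves the start year and the end year as two separate variables. If you want them in a single object, just edit the code so that it assigns to Y$start_yr and Y$end_yr instead.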
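A minimal sketch of the single-object version, assuming the input from the question. The name Y follows the answer's suggestion; the as.integer conversion is an added step (not in the answer) so the years come out as numbers rather than strings. The en dash in the input is written as \u2013 only to keep the source file ASCII.

```r
# Same input as in the question (\u2013 is the en dash used there)
yr <- c("June 2013 \u2013 Present (2 years 9 months)",
        "January 2012 \u2013 June 2013 (1 year 6 months)",
        "2006 \u2013 Present (10 years)",
        "2002 \u2013 2006 (4 years)")

# Replace "Present" with 2016, then pull out all four-digit numbers
x <- gsub("present", "2016", yr, ignore.case = TRUE)
x <- regmatches(x, gregexpr("\\d{4}", x))

# Store both columns in one data frame instead of two loose vectors
Y <- data.frame(start_yr = as.integer(sapply(x, "[[", 1)),
                end_yr   = as.integer(sapply(x, "[[", 2)))
Y
```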


I'm getting character(0) creeping in, along with this error: "Error in FUN(X[[i]], ...) : subscript out of bounds". Any suggestions for removing those rows? – user3570187
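The "subscript out of bounds" error appears when an element contains fewer than two four-digit numbers: regmatches then returns character(0) (or a single match) for it, and "[[" with index 2 has nothing to return. A hedged sketch, assuming such rows should become NA and then be dropped; the helper nth_year and the extra test strings are hypothetical:

```r
# Hypothetical input: the last two elements contain no four-digit year
yr <- c("June 2013 \u2013 Present (2 years 9 months)",
        "2002 \u2013 2006 (4 years)",
        "",
        "Unknown")

x <- gsub("present", "2016", yr, ignore.case = TRUE)
x <- regmatches(x, gregexpr("\\d{4}", x))

# Hypothetical helper: the n-th match as an integer, or NA if there are fewer than n
nth_year <- function(m, n) if (length(m) >= n) as.integer(m[[n]]) else NA_integer_

start_yr <- vapply(x, nth_year, integer(1), n = 1)
end_yr   <- vapply(x, nth_year, integer(1), n = 2)

# Drop the rows where no start year was found
keep <- !is.na(start_yr)
Y <- data.frame(start_yr, end_yr)[keep, ]
Y
```

vapply is used instead of sapply so the result is always an integer vector, even when some elements yield NA.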


Another solution is to use stringr:

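library(stringr)
# Replace "Present" with 2016 (the replacement must be a character string)
x <- str_replace(yr, "Present", "2016")
# simplify = TRUE returns a character matrix with one column per match
DF <- as.data.frame(str_extract_all(x, "\\d{4}", simplify = TRUE))
names(DF) <- c("start_yr", "end_yr")
DF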

and you get:

 start_yr end_yr 
1  2013 2016 
2  2012 2013 
3  2006 2016 
4  2002 2006
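Both answers hard-code 2016 because that was the current year when the question was asked. A sketch that substitutes the year the code actually runs in, using base R's format(Sys.Date(), "%Y"), and, as a swapped-in alternative to the approaches above, uses sub() with two capture groups instead of extracting all matches. It assumes every string contains at least two four-digit years after substitution:

```r
# Same input as in the question (\u2013 is the en dash used there)
yr <- c("June 2013 \u2013 Present (2 years 9 months)",
        "January 2012 \u2013 June 2013 (1 year 6 months)",
        "2006 \u2013 Present (10 years)",
        "2002 \u2013 2006 (4 years)")

# Use the year the code runs in instead of a hard-coded 2016
this_year <- format(Sys.Date(), "%Y")
x <- gsub("present", this_year, yr, ignore.case = TRUE)

# Capture the first and second four-digit numbers in each string
pat <- "^\\D*(\\d{4})\\D+(\\d{4}).*$"
DF <- data.frame(start_yr = as.integer(sub(pat, "\\1", x, perl = TRUE)),
                 end_yr   = as.integer(sub(pat, "\\2", x, perl = TRUE)))
DF
```

A string that does not match the pattern is returned unchanged by sub(), so as.integer() turns it into NA with a warning rather than raising an error.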