2017-04-08

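Converting date ranges to Date type in R

This vector of date ranges is stored in my data frame as class 'character'. The format depends on whether the date range crosses into a different month: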

dput(pollingdata$dates) 
c("Nov. 1-7", "Nov. 1-7", "Oct. 24-Nov. 6", "Oct. 4-Nov. 6", 
"Oct. 30-Nov. 6", "Oct. 25-31", "Oct. 7-27", "Oct. 21-Nov. 3", 
"Oct. 20-24", "Jul. 19", "Oct. 29-Nov. 4", "Oct. 28-Nov. 3", 
"Oct. 27-Nov. 2", "Oct. 20-28", "Sep. 30-Oct. 20", "Oct. 15-19", 
"Oct. 26-Nov. 1", "Oct. 25-31", "Oct. 24-30", "Oct. 18-26", 
"Oct. 10-14", "Oct. 4-9", "Sep. 23-Oct. 6", "Sep. 16-29", "Sep. 2-22", 
"Oct. 21-Nov. 2", "Oct. 17-25", "Sep. 30-Oct. 13", "Sep. 27-Oct. 3", 
"Sep. 21-26", "Sep. 14-20", "Aug. 26-Sep. 15", "Sep. 7-13", 
"Aug. 19-Sep. 8", "Aug. 31-Sep. 6", "Aug. 12-Sep. 1", "Aug. 9-Sep. 1", 
"Aug. 24-30", "Aug. 5-25", "Aug. 17-23", "Jul. 29-Aug. 18", 
"Aug. 10-16", "Jan. 12") 

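I would like to convert this vector into two separate columns in my data frame: (1) the start date and (2) the end date of each range. Both columns should be stored as class 'Date', which would make the data much easier to use in my project. Does anyone know a simple way to do this? I have been struggling with it.

Thanks in advance,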

Answers

2

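We can split the vector on "-" into a list, fix the elements whose last piece is only a day number by paste-ing the month substring onto it, pad the elements that have fewer than 2 pieces with NA (using `length<-`), and convert the result to a data.frame (with do.call(rbind.data.frame, ...)):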

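# v1 is the character vector of date ranges from the question
v1 <- pollingdata$dates
# Split on "-"; if the last piece starts with a digit (day only),
# prepend the month abbreviation taken from the first piece
lst <- lapply(strsplit(v1, "-"), function(x) {
     n <- length(x)
     if (grepl("^[0-9]+", x[n])) x[n] <- paste(substr(x[1], 1, 4), x[n])
     x})
# Pad single-date entries with NA so every element has 2 pieces, then bind the rows
d1 <- do.call(rbind.data.frame, lapply(lst, `length<-`, max(lengths(lst))))
colnames(d1) <- c("Start_Date", "End_Date")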

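Per the OP's post, the columns need to be converted to Date class, but the Date class follows the format %Y-%m-%d. The vector contains no year, and it is not clear whether we may paste in the current year and convert to Date class. If that is permissible, then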

d1[] <- lapply(d1, function(x) as.Date(paste(x, 2017), "%b. %d %Y")) 
head(d1) 
# Start_Date End_Date 
#1 2017-11-01 2017-11-07 
#2 2017-11-01 2017-11-07 
#3 2017-10-24 2017-11-06 
#4 2017-10-04 2017-11-06 
#5 2017-10-30 2017-11-06 
#6 2017-10-25 2017-10-31 
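
To put the two columns back into the original data frame and confirm that they are of class Date, something like the following could be used. This is only a minimal sketch: the three-row `pollingdata` and its `pollster` column are made up for illustration, the year 2017 is an assumption carried over from the code above, and the time locale is set to "C" so that `%b` matches English month abbreviations.

```r
# Toy stand-in for the OP's data frame; the pollster column is hypothetical
pollingdata <- data.frame(pollster = c("A", "B", "C"),
                          dates = c("Nov. 1-7", "Oct. 24-Nov. 6", "Oct. 25-31"),
                          stringsAsFactors = FALSE)

# Start/end strings as they look after the splitting step above
d1 <- data.frame(Start_Date = c("Nov. 1", "Oct. 24", "Oct. 25"),
                 End_Date = c("Nov. 7", "Nov. 6", "Oct. 31"),
                 stringsAsFactors = FALSE)

# %b is locale-dependent; "C" gives English month abbreviations
invisible(Sys.setlocale("LC_TIME", "C"))

# Assumption: every poll was taken in 2017
d1[] <- lapply(d1, function(x) as.Date(paste(x, 2017), "%b. %d %Y"))

# Attach the new columns and check their class
pollingdata <- cbind(pollingdata, d1)
col_classes <- sapply(pollingdata[c("Start_Date", "End_Date")], class)
print(col_classes)

# With Date columns, date arithmetic works directly (length of each poll in days)
poll_length <- pollingdata$End_Date - pollingdata$Start_Date
print(poll_length)
```

Once the columns are Dates, subtraction, sorting and comparisons behave as expected, which is the practical benefit the question is after.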

This works great, let me dig into it. The columns aren't in Date format, but I can probably get there – Canovice

@Canvice A Date needs year information, which does not appear in your dataset. If you are free to paste in a year, it will convert to 'Date' (shown in the update) – akrun

1

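You can use the function str_split_fixed from the stringr library to split the field and then process the pieces. Load the stringr library and proceed as follows: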

library(stringr)
dat <- data.frame(date=c("Nov. 1-7", "Nov. 1-7", "Oct. 24-Nov. 6", "Oct. 4-Nov. 6", 
       "Oct. 30-Nov. 6", "Oct. 25-31", "Oct. 7-27", "Oct. 21-Nov. 3", 
       "Oct. 20-24", "Jul. 19", "Oct. 29-Nov. 4", "Oct. 28-Nov. 3", 
       "Oct. 27-Nov. 2", "Oct. 20-28", "Sep. 30-Oct. 20", "Oct. 15-19", 
       "Oct. 26-Nov. 1", "Oct. 25-31", "Oct. 24-30", "Oct. 18-26", 
       "Oct. 10-14", "Oct. 4-9", "Sep. 23-Oct. 6", "Sep. 16-29", "Sep. 2-22", 
       "Oct. 21-Nov. 2", "Oct. 17-25", "Sep. 30-Oct. 13", "Sep. 27-Oct. 3", 
       "Sep. 21-26", "Sep. 14-20", "Aug. 26-Sep. 15", "Sep. 7-13", 
       "Aug. 19-Sep. 8", "Aug. 31-Sep. 6", "Aug. 12-Sep. 1", "Aug. 9-Sep. 1", 
       "Aug. 24-30", "Aug. 5-25", "Aug. 17-23", "Jul. 29-Aug. 18", 
       "Aug. 10-16", "Jan. 12")) 

Processing the data:

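# Split on dash or whitespace into at most four pieces:
# start month, start day, end month (or end day), end day
dt <- data.frame(str_split_fixed(dat$date, "[-]|\\s", 4))
names(dt) <- c("stdt1", "stdt2", "endt1", "endt2")
# Remove the dot (.) after each month abbreviation
dt1 <- data.frame(sapply(dt, function(x) gsub("[.]", "", x)))
dt1$stdt <- as.Date(paste0(dt1$stdt2, dt1$stdt1, "2016"), format = "%d%b%Y")
# If endt2 is empty, endt1 holds the end day and the month is the start month
dt1$endt <- ifelse(dt1$endt2 == "", paste0(dt1$endt1, dt1$stdt1, "2016"),
       paste0(dt1$endt2, dt1$endt1, "2016"))

# A 7-character string such as "Jul2016" has no day (single-date rows), so reuse the start day
dt1$endt <- as.Date(ifelse(nchar(dt1$endt) == 7, paste0(dt1$stdt2, dt1$endt), dt1$endt), "%d%b%Y")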

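Assumptions:

1) No year is provided, so I have assumed the year 2016.

2) In rows 10 and 43 the end date has no "day" information, so I have assumed it is the same day as the start date.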

Output:

> dt1
   stdt1 stdt2 endt1 endt2       stdt       endt
1    Nov     1     7       2016-11-01 2016-11-07
2    Nov     1     7       2016-11-01 2016-11-07
3    Oct    24   Nov     6 2016-10-24 2016-11-06
4    Oct     4   Nov     6 2016-10-04 2016-11-06
5    Oct    30   Nov     6 2016-10-30 2016-11-06
6    Oct    25    31       2016-10-25 2016-10-31
7    Oct     7    27       2016-10-07 2016-10-27
8    Oct    21   Nov     3 2016-10-21 2016-11-03
9    Oct    20    24       2016-10-20 2016-10-24
10   Jul    19             2016-07-19 2016-07-19
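
The two assumptions above (an assumed year, and single dates used as both start and end) could also be folded into one small base-R helper. The following is a minimal sketch and not part of either answer: the function name `parse_ranges` and its `year` argument are hypothetical, and it only handles the three shapes seen in the question ("Mon. D", "Mon. D-D", "Mon. D-Mon. D"). It looks months up in the built-in `month.abb` constant, so it does not depend on the locale the way `%b` does.

```r
# Hypothetical helper: turn strings like "Nov. 1-7", "Oct. 24-Nov. 6" or
# "Jul. 19" into start/end Date columns for an assumed year.
parse_ranges <- function(x, year) {
  parts <- strsplit(x, "-", fixed = TRUE)
  start <- vapply(parts, `[`, character(1), 1)
  # Assumption: a single date is used as both start and end
  end <- vapply(parts, function(p) if (length(p) > 1) p[2] else p[1],
                character(1))

  # When the end piece is only a day number, borrow the month from the start
  day_only <- grepl("^[0-9]+$", end)
  end[day_only] <- paste(sub(" .*$", "", start[day_only]), end[day_only])

  to_date <- function(s) {
    mon <- match(sub("\\..*$", "", s), month.abb)  # "Nov" -> 11
    day <- as.integer(sub("^.* ", "", s))          # "Nov. 7" -> 7
    # Unparseable entries become NA instead of raising an error
    as.Date(sprintf("%d-%02d-%02d", as.integer(year), mon, day),
            format = "%Y-%m-%d")
  }

  data.frame(Start_Date = to_date(start), End_Date = to_date(end))
}

res <- parse_ranges(c("Nov. 1-7", "Oct. 24-Nov. 6", "Jul. 19"), year = 2016)
print(res)
```

Because the year is a single argument, ranges that cross from December into January would need extra handling; none occur in the data shown in the question.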