2016-12-25 73 views
4

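I need to split a long string. The split points have nothing in common except that each one is a date and time, so I need to split the string wherever a specific pattern occurs, namely dd/mm/yy, hh:mm. I know about the strsplit function and the related string-manipulation functions, but they don't seem to help. A sample of the data is below. How can I split a string based on the general format of the split unit?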

25/06/15, 21:37 - kjadshjabsdjab 
25/06/15, 21:39 - bsadhi2342/342jbjsd 
25/06/15, 21:40 -hkgsad/213/1sadjaa 
25/06/15, 21:41 - hsdjhakhjbk12/21s/sda:sdfjbj 
25/06/15, 21:42 - jkadbsh2:/\sdsadjv 
25/06/15, 21:42 - 

Answers

3

We can split using regex lookarounds. This lookbehind splits immediately after each hh:mm time:

strsplit(str1, "(?<=[0-9]{2}:[0-9]{2})", perl = TRUE) 

If the pattern also needs to include the date:

strsplit(str1, "(?<=[0-9]{2}/[0-9]{2}/[0-9]{2}, [0-9]{2}:[0-9]{2})", perl = TRUE) 
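To make the behaviour of the two lookbehind patterns concrete, here is a minimal sketch on a shortened, made-up input. The value of `str1` is an assumption, since the answer does not show it. The lookbehind matches a zero-width position right after a timestamp, so no characters are consumed by the split.

```r
# Hypothetical input: two log entries run together in one string.
str1 <- "25/06/15, 21:37 - foo25/06/15, 21:39 - bar"

# Split at the position right after every hh:mm.
by_time <- strsplit(str1, "(?<=[0-9]{2}:[0-9]{2})", perl = TRUE)[[1]]

# Split only after a full "dd/mm/yy, hh:mm" timestamp.
by_stamp <- strsplit(
  str1,
  "(?<=[0-9]{2}/[0-9]{2}/[0-9]{2}, [0-9]{2}:[0-9]{2})",
  perl = TRUE
)[[1]]

print(by_time)
print(by_stamp)
```

On this input both calls should give the same three pieces: "25/06/15, 21:37", " - foo25/06/15, 21:39" and " - bar". Each piece ends with a timestamp, so a message is separated from its own timestamp and attached to the next one. The stricter pattern is the safer choice when a message body might itself contain something that looks like hh:mm.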

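If we don't want the date and time in the result at all, split on the whole timestamp and drop the empty strings: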

setdiff(strsplit(str1, "[0-9]{2}/[0-9]{2}/[0-9]{2}, [0-9]{2}:[0-9]{2}\\s*-\\s*")[[1]], "") 
#[1] "kjadshjabsdjab"    "bsadhi2342/342jbjsd" 
#[3] "hkgsad/213/1sadjaa"   "hsdjhakhjbk12/21s/sda:sdfjbj" 
#[5] "jkadbsh2:/\\sdsadjv" 
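If each message should stay paired with its timestamp, one possible variation is to extract the timestamps separately and line them up with the split pieces. This is a sketch under assumptions: the input `str1` and the names `stamp`, `ts` and `msg` are made up for illustration. It drops the leading empty piece by position rather than with setdiff, because setdiff also removes duplicated messages.

```r
# Hypothetical input; the last entry has an empty message.
str1 <- "25/06/15, 21:37 - foo25/06/15, 21:39 - bar25/06/15, 21:42 - "

stamp <- "[0-9]{2}/[0-9]{2}/[0-9]{2}, [0-9]{2}:[0-9]{2}"

# All timestamps, in order of appearance.
ts <- regmatches(str1, gregexpr(stamp, str1))[[1]]

# Text between timestamps. The first piece is the empty string that
# precedes the first timestamp, so it is dropped by position.
msg <- strsplit(str1, paste0(stamp, "\\s*-\\s*"))[[1]][-1]

# strsplit omits a trailing empty piece, so pad with NA to match ts.
length(msg) <- length(ts)

entries <- data.frame(ts = ts, msg = msg, stringsAsFactors = FALSE)
print(entries)
```

Padding at the end is enough here because strsplit only discards an empty piece at the very end of the input; an empty message in the middle is kept as "".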
1

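You can split at " - " and then drop the last 15 characters of each piece, which is the length of a "dd/mm/yy, hh:mm" timestamp. sapply can be used to apply substr to each item in the list returned by strsplit. Note that the input below ends in " -" with no trailing space, so the final piece is not split there and keeps a stray "25" after trimming: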

> ss = "25/06/15, 21:37 - kjadshjabsdjab25/06/15, 21:39 - bsadhi2342/342jbjsd25/06/15, 21:40 - hkgsad/213/1sadjaa25/06/15, 21:41 - hsdjhakhjbk12/21s/sda:sdfjbj25/06/15, 21:42 - jkadbsh2:sdsadjv25/06/15, 21:42 -" 
> 
> sapply(strsplit(ss, " - "), function(x) substr(x, 1, nchar(x)-15)) 
    [,1]       
[1,] ""        
[2,] "kjadshjabsdjab"    
[3,] "bsadhi2342/342jbjsd"   
[4,] "hkgsad/213/1sadjaa"   
[5,] "hsdjhakhjbk12/21s/sda:sdfjbj" 
[6,] "jkadbsh2:sdsadjv25"   
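A possible way to avoid the fixed 15-character count is to remove a trailing timestamp by pattern instead. The following is only a sketch on a shortened, made-up input; the names `pieces` and `cleaned` are illustrative and not part of the answer.

```r
# Hypothetical input ending in " -" with no trailing space.
ss <- "25/06/15, 21:37 - foo25/06/15, 21:39 - bar25/06/15, 21:42 -"

# Split on the literal separator.
pieces <- strsplit(ss, " - ", fixed = TRUE)[[1]]

# Strip a timestamp at the end of each piece, together with an
# optional dangling " -", instead of cutting a fixed number of characters.
cleaned <- sub("[0-9]{2}/[0-9]{2}/[0-9]{2}, [0-9]{2}:[0-9]{2}( -)?$", "", pieces)

# Drop pieces that were nothing but a timestamp.
cleaned <- cleaned[nzchar(cleaned)]
print(cleaned)
```

This still assumes that no message contains " - " itself; if one does, that message is split in two.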
2

You can adjust the regex, or strip the leading "-" afterwards (for example with mutate and sub), if it is not wanted:

library(stringi) 
library(purrr) 

lines <- readLines(textConnection('25/06/15, 21:37 - kjadshjabsdjab\n25/06/15, 21:39 - bsadhi2342/342jbjsd\n25/06/15, 21:40 -hkgsad/213/1sadjaa\n25/06/15, 21:41 - hsdjhakhjbk12/21s/sda:sdfjbj\n25/06/15, 21:42 - jkadbsh2:/\\sdsadjv\n25/06/15, 21:42 -')) 

stri_match_all_regex(lines, "([[:digit:]]{2}/[[:digit:]]{2}/[[:digit:]]{2}, [[:digit:]]{2}:[[:digit:]]{2})(.*)") %>% 
    map_df(~setNames(as.list(.[,2:3]), c("ts", "string"))) 
## # A tibble: 6 × 2 
##    ts       string 
##    <chr>       <chr> 
## 1 25/06/15, 21:37    - kjadshjabsdjab 
## 2 25/06/15, 21:39   - bsadhi2342/342jbjsd 
## 3 25/06/15, 21:40    -hkgsad/213/1sadjaa 
## 4 25/06/15, 21:41 - hsdjhakhjbk12/21s/sda:sdfjbj 
## 5 25/06/15, 21:42   - jkadbsh2:/\\sdsadjv 
## 6 25/06/15, 21:42        -
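Stripping the leading "-" could look roughly like the sketch below. It uses base R only (regexec, regmatches and sub) rather than stringi and purrr, so it is a substitute technique and not the answer's own code; the three input lines are taken from the sample data and the names `pattern`, `m` and `parsed` are illustrative.

```r
# A few lines from the sample data, one entry per element.
lines <- c(
  "25/06/15, 21:37 - kjadshjabsdjab",
  "25/06/15, 21:40 -hkgsad/213/1sadjaa",
  "25/06/15, 21:42 -"
)

# Group 1 captures the timestamp, group 2 captures the rest of the line.
pattern <- "^([0-9]{2}/[0-9]{2}/[0-9]{2}, [0-9]{2}:[0-9]{2})(.*)$"
m <- regmatches(lines, regexec(pattern, lines))

parsed <- data.frame(
  ts = vapply(m, `[`, character(1), 2),
  # Remove the separator dash and any whitespace around it.
  string = sub("^\\s*-\\s*", "", vapply(m, `[`, character(1), 3)),
  stringsAsFactors = FALSE
)
print(parsed)
```

Every line is assumed to start with a timestamp; a line that does not match the pattern would make the vapply calls fail, so real input may need a check first.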