2015-08-27 44 views
9

Split a string keeping the separators

Is there a simple way to split a string while keeping the separators? Instead of this:

let texte = "Ten. Million. Questions. Let's celebrate all we've done together."; 
let v: Vec<&str> = texte.split(|c: char| !(c.is_alphanumeric() || c == '\'')).filter(|s| !s.is_empty()).collect(); 

which yields:

["Ten", "Million", "Questions", "Let's", "celebrate", "all", "we've", "done", "together"]

I would like something that gives me:

["Ten", ".", " ", "Million", ".", " ", "Questions", ".", " ", "Let's", " ", "celebrate", " ", "all", " ", "we've", " ", "done", " ", "together", "."]

I am trying code like this (it assumes the string begins with a letter and ends with a non-letter):

let texte = "Ten. Million. Questions. Let's celebrate all we've done together. "; 
let v1: Vec<&str> = texte.split(|c: char| !(c.is_alphanumeric() || c == '\'')).filter(|s| !s.is_empty()).collect(); 
let v2: Vec<&str> = texte.split(|c: char| c.is_alphanumeric() || c == '\'').filter(|s| !s.is_empty()).collect(); 
let mut w: Vec<&str> = Vec::new(); 

let mut j = 0; 
for i in v2 { 
    w.push(v1[j]); 
    w.push(i); 
    j = j+1; 
} 

That gives me nearly the result I wrote above, which is good enough:

["Ten", ". ", "Million", ". ", "Questions", ". ", "Let's", " ", "celebrate", " ", "all", " ", "we've", " ", "done", " ", "together", "."] 

But is there a better way to code this? I tried to enumerate v2, but that didn't work, and using j in the for loop looks crude.

+1

Regular expressions are what you need. – Onilol

Answers

3

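I couldn't find anything in the standard library, so I wrote my own.

This version uses the unstable pattern API because it is more flexible, but the link above has a fallback for stable Rust that I hard-coded for my specific use case.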

#![feature(pattern)] 

use std::str::pattern::{Pattern,Searcher}; 

#[derive(Copy,Clone,Debug,PartialEq)] 
pub enum SplitType<'a> { 
    Match(&'a str), 
    Delimiter(&'a str), 
} 

pub struct SplitKeepingDelimiter<'p, P> 
    where P: Pattern<'p> 
{ 
    searcher: P::Searcher, 
    start: usize, 
    saved: Option<usize>, 
} 

impl<'p, P> Iterator for SplitKeepingDelimiter<'p, P> 
    where P: Pattern<'p>, 
{ 
    type Item = SplitType<'p>; 

    fn next(&mut self) -> Option<SplitType<'p>> {
        // The whole haystack has been consumed.
        if self.start == self.searcher.haystack().len() {
            return None;
        }

        // The previous call returned the text before a delimiter;
        // now emit that delimiter itself.
        if let Some(end_of_match) = self.saved.take() {
            let s = &self.searcher.haystack()[self.start..end_of_match];
            self.start = end_of_match;
            return Some(SplitType::Delimiter(s));
        }

        match self.searcher.next_match() {
            Some((start, end)) => {
                if self.start == start {
                    // The delimiter begins right at the current position.
                    let s = &self.searcher.haystack()[start..end];
                    self.start = end;
                    Some(SplitType::Delimiter(s))
                } else {
                    // Emit the text before the delimiter and remember
                    // where the delimiter ends for the next call.
                    let s = &self.searcher.haystack()[self.start..start];
                    self.start = start;
                    self.saved = Some(end);
                    Some(SplitType::Match(s))
                }
            }
            None => {
                // No more delimiters: emit the remaining tail.
                let s = &self.searcher.haystack()[self.start..];
                self.start = self.searcher.haystack().len();
                Some(SplitType::Match(s))
            }
        }
    }
} 

pub trait SplitKeepingDelimiterExt: ::std::ops::Index<::std::ops::RangeFull, Output = str> {
    fn split_keeping_delimiter<P>(&self, pattern: P) -> SplitKeepingDelimiter<P>
    where
        P: for<'a> Pattern<'a>,
    {
        SplitKeepingDelimiter {
            searcher: pattern.into_searcher(&self[..]),
            start: 0,
            saved: None,
        }
    }
}

impl SplitKeepingDelimiterExt for str {} 

#[cfg(test)]
mod test {
    use super::SplitKeepingDelimiterExt;

    #[test]
    fn split_with_delimiter() {
        use super::SplitType::*;
        let delims = &[',', ';'][..];
        let items: Vec<_> = "alpha,beta;gamma".split_keeping_delimiter(delims).collect();
        assert_eq!(&items, &[Match("alpha"), Delimiter(","), Match("beta"), Delimiter(";"), Match("gamma")]);
    }

    #[test]
    fn split_with_delimiter_allows_consecutive_delimiters() {
        use super::SplitType::*;
        let delims = &[',', ';'][..];
        let items: Vec<_> = ",;".split_keeping_delimiter(delims).collect();
        assert_eq!(&items, &[Delimiter(","), Delimiter(";")]);
    }
}

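You'll notice that I needed to track whether each piece was one of the delimiters or not, but it should be easy to adapt if you don't need that.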
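If the Match/Delimiter distinction is not needed, one possible adaptation is simply to strip the tag from each item. The following is a minimal sketch rather than part of the answer's code: it redeclares the SplitType enum so that it compiles on its own on stable Rust, and the as_str method and untag helper are hypothetical names introduced here for illustration.

```rust
#[derive(Copy, Clone, Debug, PartialEq)]
pub enum SplitType<'a> {
    Match(&'a str),
    Delimiter(&'a str),
}

impl<'a> SplitType<'a> {
    /// Returns the underlying slice, ignoring whether it was a delimiter.
    /// (Hypothetical helper, not part of the original answer.)
    pub fn as_str(self) -> &'a str {
        match self {
            SplitType::Match(s) | SplitType::Delimiter(s) => s,
        }
    }
}

/// Drops the tags from a sequence of pieces. (Hypothetical helper.)
pub fn untag<'a>(items: &[SplitType<'a>]) -> Vec<&'a str> {
    items.iter().map(|item| item.as_str()).collect()
}

fn main() {
    let items = [
        SplitType::Match("alpha"),
        SplitType::Delimiter(","),
        SplitType::Match("beta"),
    ];
    assert_eq!(untag(&items), ["alpha", ",", "beta"]);
}
```

With the iterator from the answer, the same effect could presumably be had by chaining .map(|item| item.as_str()) before collect().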

+0

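Wow, I need to learn more about Rust to understand this code. However, I think that after splitting the string twice, you get the words and then the opposite pattern. What do you think of my new code? – Keho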

+1

This will be simpler once [`str::match_indices`](http://doc.rust-lang.org/nightly/std/primitive.str.html#method.match_indices) is stabilized. – bluss

3

Use str::match_indices:

let text = "Ten. Million. Questions. Let's celebrate all we've done together."; 

let mut result = Vec::new(); 
let mut last = 0; 
for (index, matched) in text.match_indices(|c: char| !(c.is_alphanumeric() || c == '\'')) {
    if last != index {
        result.push(&text[last..index]);
    }
    result.push(matched);
    last = index + matched.len();
}
if last < text.len() {
    result.push(&text[last..]);
}

println!("{:?}", result); 

This prints:

["Ten", ".", " ", "Million", ".", " ", "Questions", ".", " ", "Let\'s", " ", "celebrate", " ", "all", " ", "we\'ve", " ", "done", " ", "together", "."]
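The same loop can be wrapped in a reusable function on stable Rust. This is a minimal sketch under stated assumptions: the function name split_keeping_delimiters and its is_delimiter parameter are introduced here for illustration and are not a standard library API.

```rust
/// Splits `text` at every character for which `is_delimiter` returns true,
/// returning the pieces between delimiters and the delimiters themselves,
/// in order. (Hypothetical helper, not a standard library function.)
fn split_keeping_delimiters<F>(text: &str, is_delimiter: F) -> Vec<&str>
where
    F: FnMut(char) -> bool,
{
    let mut result = Vec::new();
    let mut last = 0; // byte offset just past the previous delimiter
    for (index, matched) in text.match_indices(is_delimiter) {
        // Push the non-empty text between the previous delimiter and this one.
        if last != index {
            result.push(&text[last..index]);
        }
        result.push(matched);
        last = index + matched.len();
    }
    // Push whatever follows the final delimiter, if anything.
    if last < text.len() {
        result.push(&text[last..]);
    }
    result
}

fn main() {
    let is_separator = |c: char| !(c.is_alphanumeric() || c == '\'');
    let pieces = split_keeping_delimiters("Ten. Million.", is_separator);
    assert_eq!(pieces, ["Ten", ".", " ", "Million", "."]);
}
```

Taking the predicate as a closure keeps the function usable with any single-character rule; each delimiter character becomes its own element, as in the loop above.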
