2011-01-24 105 views
10

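Regex to split a string while preserving quoted sections

I need to split a string like the one below, using spaces as the delimiter. However, any spaces inside quotes should be preserved. The first line below is the input; the four lines after it are the desired result.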

research library "not available" author:"Bernard Shaw" 

research 
library 
"not available" 
author:"Bernard Shaw" 

I am trying to do this in C#, and I have this regex from another post on SO: @"(?<="")|\w[\w\s]*(?="")|\w+|""[\w\s]*""" which splits the string into

research 
library 
"not available" 
author 
"Bernard Shaw" 

Unfortunately, this does not meet my exact requirement.

I am looking for any regex that would do the trick.

Any help is appreciated.

Answers

25

As long as there can't be escaped quotes inside the quoted strings, the following should work:

splitArray = Regex.Split(subjectString, "(?<=^[^\"]*(?:\"[^\"]*\"[^\"]*)*) (?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"); 

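This regex splits on space characters only if they are both preceded and followed by an even number of quotes.

The regex without all those escaped quotes, explained: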
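The even-number-of-quotes rule can also be illustrated without a regex. The following is a minimal sketch, not the answer's method: it walks the string once and toggles a flag at every quote, so a space is a split point only when the flag is off (an even number of quotes seen so far). The class and method names (`QuoteParitySplitter`, `Split`) are hypothetical. It assumes balanced, unescaped quotes; for unbalanced input it behaves differently from the regex, which would not split at all.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuoteParitySplitter
{
    // Hypothetical helper: splits on spaces that lie outside double quotes.
    public static List<string> Split(string input)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        bool insideQuotes = false; // true after an odd number of quotes

        foreach (char c in input)
        {
            if (c == '"')
            {
                insideQuotes = !insideQuotes; // each quote flips the parity
                current.Append(c);            // quotes stay in the token
            }
            else if (c == ' ' && !insideQuotes)
            {
                tokens.Add(current.ToString()); // space outside quotes ends a token
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        tokens.Add(current.ToString()); // flush the final token
        return tokens;
    }

    public static void Main()
    {
        string input = "research library \"not available\" author:\"Bernard Shaw\"";
        foreach (string token in Split(input))
        {
            Console.WriteLine(token);
        }
    }
}
```

For the sample input from the question, this prints the four desired tokens, one per line.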

(?<=  # Assert that it's possible to match this before the current position (positive lookbehind): 
^  # The start of the string 
[^"]* # Any number of non-quote characters 
(?:  # Match the following group... 
    "[^"]* # a quote, followed by any number of non-quote characters 
    "[^"]* # the same 
)*  # ...zero or more times (so 0, 2, 4, ... quotes will match) 
)   # End of lookbehind assertion. 
[ ]  # Match a space 
(?=  # Assert that it's possible to match this after the current position (positive lookahead): 
(?:  # Match the following group... 
    [^"]*" # see above 
    [^"]*" # see above 
)*  # ...zero or more times. 
[^"]* # Match any number of non-quote characters 
$  # Match the end of the string 
)   # End of lookahead assertion 
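In .NET, a commented layout like the one above can be passed to the regex engine directly with `RegexOptions.IgnorePatternWhitespace`, which ignores unescaped whitespace in the pattern and treats `#` as the start of a comment. The sketch below assumes the sample input from the question; the class name `VerboseSplitDemo` is hypothetical, and the comments are shortened. Inside a C# verbatim string, `""` stands for a single quote character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerboseSplitDemo
{
    // Same pattern as the annotated version, written as a verbatim string.
    private const string Pattern = @"
        (?<=                          # lookbehind: even number of quotes before
            ^ [^""]*
            (?: ""[^""]* ""[^""]* )*
        )
        [ ]                           # the space to split on
        (?=                           # lookahead: even number of quotes after
            (?: [^""]*"" [^""]*"" )*
            [^""]* $
        )";

    public static string[] Split(string input)
    {
        // IgnorePatternWhitespace enables the free-spacing layout and # comments.
        return Regex.Split(input, Pattern, RegexOptions.IgnorePatternWhitespace);
    }

    public static void Main()
    {
        string input = "research library \"not available\" author:\"Bernard Shaw\"";
        foreach (string part in Split(input))
        {
            Console.WriteLine(part);
        }
    }
}
```

The space itself is written as `[ ]` because whitespace inside a character class stays literal even in this mode, whereas a bare space in the pattern would be ignored.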
+0

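How do I split it on periods, question marks, exclamation marks, etc. instead of spaces? I'm trying to read out each sentence one by one, except for content inside quotes. For example: Walked. **Turned back.** But why? **And said: "Hello world, damn string splitting things!" without shame.** – ErTR 2016-01-26 00:25:21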

+1

@ErtürkÖztürk: That deserves its own StackOverflow question - it is too big to answer in a comment. – 2016-01-26 07:12:10

3

Here you go:

C#:

Regex.Matches(subject, @"([^\s]*""[^""]+""[^\s]*)|\w+") 

Regex:

([^\s]*\"[^\"]+\"[^\s]*)|\w+ 
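Unlike the splitting approach, this pattern matches the tokens themselves. A minimal sketch of how the matches might be collected into an array follows; the class and method names (`MatchTokensDemo`, `Tokens`) are hypothetical, and the input is the sample from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchTokensDemo
{
    // Hypothetical helper: returns every token the pattern matches, in order.
    public static string[] Tokens(string subject)
    {
        return Regex.Matches(subject, @"([^\s]*""[^""]+""[^\s]*)|\w+")
                    .Cast<Match>()          // MatchCollection is non-generic on older frameworks
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "research library \"not available\" author:\"Bernard Shaw\"";
        foreach (string token in Tokens(input))
        {
            Console.WriteLine(token);
        }
    }
}
```

Two caveats follow from the pattern as written: unquoted tokens are matched with `\w+`, so punctuation outside quotes is skipped rather than kept, and an empty pair of quotes is not matched because `[^"]+` requires at least one character between them.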