Python: split on spaces, except between certain characters

I am parsing a file that contains lines such as this:

 
type("book") title("golden apples") pages(10-35 70 200-234) comments("good read")

I want to split each such line into its separate fields.

In my example there are four fields: type, title, pages, and comments.

The expected result after splitting is

 
['type("book")', 'title("golden apples")', 'pages(10-35 70 200-234)', 'comments("good read")']

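Obviously a plain string split will not work, because it splits at every space. I want to split on spaces but leave anything between parentheses or quotes intact.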
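To make the problem concrete, here is a small sketch of what the plain split does to the sample line. The variable names are only illustrative.

```python
line = 'type("book") title("golden apples") pages(10-35 70 200-234) comments("good read")'

# str.split() with no arguments breaks on every run of whitespace,
# including the spaces inside the quotes and inside pages(...).
pieces = line.split()
print(pieces)
print(len(pieces))  # more than the four fields that are wanted
```

The title, pages, and comments fields each come back in several fragments, which is why a smarter separator is needed.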

How can I split it?

Source

2012-03-10 MxyL

This regular expression should work for you: \s+(?=[^()]*(?:\(|$))

result = re.split(r"\s+(?=[^()]*(?:\(|$))", subject)
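For anyone who wants to try the call above end to end, here is a minimal self-contained sketch. It assumes the sample line from the question as `subject`, and it writes the same pattern with `re.VERBOSE` so the pieces can be commented inline. The name `FIELD_SEP` is made up for illustration.

```python
import re

# Same pattern as the one-liner, spread out with re.VERBOSE.
FIELD_SEP = re.compile(r"""
    \s+            # one or more whitespace characters
    (?=            # lookahead: what follows must be ...
        [^()]*     #   any run of characters that are not parentheses
        (?:\(|$)   #   then an opening parenthesis or the end of the string
    )
""", re.VERBOSE)

subject = 'type("book") title("golden apples") pages(10-35 70 200-234) comments("good read")'
result = FIELD_SEP.split(subject)
print(result)
```

A space inside `pages(...)` or inside a quoted value is followed by a closing parenthesis before any opening one, so the lookahead fails there and no split happens. Note that this relies on the field values themselves not containing parentheses.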

Explanation

r""" 
\s    # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) 
    +    # Between one and unlimited times, as many times as possible, giving back as needed (greedy) 
(?=   # Assert that the regex below can be matched, starting at this position (positive lookahead) 
    [^()]   # Match a single character NOT present in the list “()” 
     *    # Between zero and unlimited times, as many times as possible, giving back as needed (greedy) 
    (?:    # Match the regular expression below 
        # Match either the regular expression below (attempting the next alternative only if this one fails) 
     \(   # Match the character “(” literally 
     |    # Or match regular expression number 2 below (the entire group fails if this one fails to match) 
     $    # Assert position at the end of a line (at the end of the string or before a line break character) 
    ) 
) 
"""

Source

2012-03-10 07:43:25

Nice, although it seems to add some extra parentheses to the returned list (I don't know where they come from). I'm using py3. – MxyL 2012-03-10 07:48:20

Try this: 're.split(r"\s+(?=[^()]*(?:\(|$))", subject)' – San4ez 2012-03-10 07:50:06

@Keikoku Fixed it; that was because of the capturing group. – 2012-03-10 07:51:13
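The earlier version of the pattern is not shown in this thread, so the following is only a guess at what it looked like: the same regex with a capturing group `(...)` where the fixed version has `(?:...)`. `re.split` inserts the text of every capturing group into the result list, which would explain the stray parentheses.

```python
import re

subject = 'type("book") title("golden apples") pages(10-35 70 200-234) comments("good read")'

# Hypothetical earlier pattern: capturing group inside the lookahead.
capturing = r"\s+(?=[^()]*(\(|$))"
# Fixed pattern from the answer: non-capturing group.
non_capturing = r"\s+(?=[^()]*(?:\(|$))"

with_groups = re.split(capturing, subject)
without_groups = re.split(non_capturing, subject)

print(with_groups)     # fields interleaved with the captured "(" text
print(without_groups)  # just the fields
```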

I would try using a positive lookbehind assertion.

r'(?<=\))\s+'

Example:

>>> import re 
>>> result = re.split(r'(?<=\))\s+', 'type("book") title("golden apples") pages(10-35 70 200-234) comments("good read")') 
>>> result 
['type("book")', 'title("golden apples")', 'pages(10-35 70 200-234)', 'comments("good read")']

Source

2012-03-10 07:51:04 dave

If the input text contains no parentheses, for example 'test test test', this will not work. – 2012-03-10 07:58:53

The question already defines the format; 'test test test' is not a possible input. – dave 2012-03-10 14:33:10
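For reference, a quick sketch of the difference the two comments above are discussing, using the lookbehind pattern from this answer and the lookahead pattern from the other one. The variable names are only illustrative.

```python
import re

lookbehind = r'(?<=\))\s+'              # split only after a closing parenthesis
lookahead = r'\s+(?=[^()]*(?:\(|$))'    # split unless still inside parentheses

plain = 'test test test'

after_paren = re.split(lookbehind, plain)
outside_parens = re.split(lookahead, plain)

print(after_paren)     # ['test test test']
print(outside_parens)  # ['test', 'test', 'test']
```

Whether that matters depends on the input: for lines in the format given in the question, both patterns return the same four fields.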

Split on ") " and add the ) back to every element except the last.
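A minimal sketch of that idea without regular expressions, assuming the sample line from the question. The variable names are made up for illustration.

```python
line = 'type("book") title("golden apples") pages(10-35 70 200-234) comments("good read")'

# Splitting on ') ' removes the closing parenthesis from every field
# except the last one, which has no trailing space after it.
parts = line.split(') ')

# Put the ')' back on all but the last element.
fields = [part + ')' for part in parts[:-1]] + [parts[-1]]
print(fields)
```

This assumes that ") " never appears inside a field value, for example inside a quoted comment. If it can, this approach will split in the wrong place.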

Source

2012-03-10 07:53:54
