2017-08-01

I have the following string that I want to split in Python:

(some text) or ((other text) and (some more text)) and (still more text)

I'd like a Python regex that splits it into:

['(some text)', '((other text) and (some more text))', '(still more text)'] 

I've tried this, but it doesn't work:

import re
haystack = "(some text) or ((other text) and (some more text)) and (still more text)"
re.split('(or|and)(?![^(]*.\))', haystack) # no worky

Any help is appreciated.

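Regular expressions don't handle arbitrarily nested content well. Your real input may have more levels of nested parentheses than the example you've shown us; in that case a parser will get you further than a regex. –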
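A minimal sketch of the non-regex route, assuming the parentheses are balanced and never appear inside quoted strings. It is not a full parser: it just tracks the nesting depth while scanning and cuts out each group whose depth returns to zero. The helper name `top_level_groups` is hypothetical, not part of any library.

```python
def top_level_groups(text):
    """Return every outermost (...) group in text, in order of appearance."""
    groups = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == '(':
            if depth == 0:
                start = i  # opening of a new top-level group
            depth += 1
        elif ch == ')':
            if depth == 0:
                raise ValueError(f"unbalanced ')' at index {i}")
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])  # the group just closed
    if depth != 0:
        raise ValueError("unbalanced '(' in input")
    return groups


s = "(some text) or ((other text) and (some more text)) and (still more text)"
print(top_level_groups(s))
# ['(some text)', '((other text) and (some more text))', '(still more text)']
```

Because it only counts depth, it handles any nesting level and returns the original text of each group unchanged.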

This may help: https://stackoverflow.com/questions/26633452/how-to-split-by-commas-that-are-not-within-parentheses –
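The linked question splits on commas that are outside parentheses. A rough sketch of the same idea applied here, splitting on the words `or`/`and` only at nesting depth zero, which is what the `re.split` attempt in the question was aiming for. It assumes balanced parentheses and no parentheses inside quoted strings; `split_top_level` and `TOKEN` are hypothetical names.

```python
import re

# Tokens of interest: a parenthesis, or the whole word "or"/"and"
TOKEN = re.compile(r'\(|\)|\b(?:or|and)\b')


def split_top_level(text):
    """Split text on 'or'/'and' only where they sit outside all parentheses."""
    parts = []
    depth = 0
    last = 0
    for m in TOKEN.finditer(text):
        tok = m.group()
        if tok == '(':
            depth += 1
        elif tok == ')':
            depth -= 1
        elif depth == 0:
            # A top-level connective: close off the piece before it
            parts.append(text[last:m.start()].strip())
            last = m.end()
    parts.append(text[last:].strip())
    return parts


s = "(some text) or ((other text) and (some more text)) and (still more text)"
print(split_top_level(s))
```

Connectives inside a group (such as the inner `and`) are skipped because the depth is non-zero when they are seen.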

This may also be useful: https://stackoverflow.com/questions/4284991/parsing-nested-parentheses-in-python-grab-content-by-level – perigon

Answers

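I would use re.findall instead of re.split. Note that this only works for a fixed nesting depth: the pattern hard-codes two levels of parentheses inside the outermost pair.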

>>> import re 
>>> s = '(some text) or ((other text) and (some more text)) and (still more text)' 
>>> re.findall(r'\((?:\((?:\([^()]*\)|[^()]*)*\)|[^()])*\)', s) 
['(some text)', '((other text) and (some more text))', '(still more text)'] 
>>> 
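Since the pattern above hard-codes its nesting depth, one way to get a chosen depth is to build the pattern programmatically. This is a sketch, not the answerer's method; `nested_parens_pattern` is a hypothetical helper. Each level allows either one non-parenthesis character or one complete group of the level below. The two alternatives begin with different kinds of characters, so the engine never has two ways to match the same text.

```python
import re


def nested_parens_pattern(depth):
    """Build a regex matching a (...) group nested at most `depth` levels deep."""
    pattern = r'\([^()]*\)'  # depth 1: no inner parentheses
    for _ in range(depth - 1):
        # Each extra level: a non-paren character, or one group of the previous level
        pattern = r'\((?:' + pattern + r'|[^()])*\)'
    return pattern


s = '(some text) or ((other text) and (some more text)) and (still more text)'
print(re.findall(nested_parens_pattern(2), s))
# ['(some text)', '((other text) and (some more text))', '(still more text)']

# Groups deeper than `depth` are only matched partially
print(re.findall(nested_parens_pattern(2), '((((a)))) or (b)'))
# ['((a))', '(b)']
```

The depth still has to be picked in advance, which is the limitation the comments on the question point out.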

Yes, I've added a note. –

I tried to simplify my string, and it backfired. Your solution doesn't work on my real string: (substringof('needle',name)) or ((role eq 'needle') and (substringof('needle',email))) or (job eq 'needle') or (office eq 'needle') –

@user1571934 Please provide the exact string. –

You can try this: re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)
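For reference, this call is the example from the `re.split` documentation; the `flags` keyword makes the character class case-insensitive. Applied to the question's string, a plain keyword split (the pattern below is an illustrative assumption) cuts through the nested group, because `re.split` does not track parentheses.

```python
import re

# Split on runs of the letters a-f; IGNORECASE makes 'B' count as well
print(re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE))
# ['0', '3', '9']

# A case-insensitive split on the connectives ignores nesting entirely
s = '(some text) or ((other text) and (some more text)) and (still more text)'
print(re.split(r'\s+(?:or|and)\s+', s, flags=re.IGNORECASE))
# ['(some text)', '((other text)', '(some more text))', '(still more text)']
```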

This solution works for arbitrarily nested parentheses, which a single regex cannot handle (s is the original string):

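from pyparsing import nestedExpr

s = "(some text) or ((other text) and (some more text)) and (still more text)"

def lst_to_parens(elt):
    # Turn a nested list from nestedExpr back into a parenthesized string
    if isinstance(elt, list):
        return '(' + ' '.join(lst_to_parens(e) for e in elt) + ')'
    else:
        return elt

# Wrap s in an extra pair of parentheses so the whole expression parses as one group
split = nestedExpr('(', ')').parseString('(' + s + ')').asList()
# Keep the parenthesized groups; bare connectives like 'or'/'and' are plain strings
split_lists = [elt for elt in split[0] if isinstance(elt, list)]
print([lst_to_parens(elt) for elt in split_lists])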

Output:

['(some text)', '((other text) and (some more text))', '(still more text)'] 

For the OP's real test case:

s = "(substringof('needle',name)) or ((role eq 'needle') and (substringof('needle',email))) or (job eq 'needle') or (office eq 'needle')" 

Output:

["(substringof ('needle' ,name))", "((role eq 'needle') and (substringof ('needle' ,email)))", "(job eq 'needle')", "(office eq 'needle')"] 

You can also try this:

import re 
s = '(some text) or ((other text) and (some more text)) and (still more text)' 
find_string = re.findall(r'[(]{2}[a-z\s()]*[)]{2}|[(][a-z\s]*[)]', s) 
print(find_string) 

Output:

['(some text)', '((other text) and (some more text))', '(still more text)'] 

Edit:

find_string = re.findall(r'[(\s]{2}[a-z\s()]*[)\s]{2}|[(][a-z\s]*[)]', s) 

This is not the right way to match parentheses. What if there is text between the two opening parentheses? –

@AvinashRaj, could you give me a sample string? Thanks. –

Check your regex against this string: '(some text) or ((other text) and (some more text)) and (more text)' –
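To make the objection about text between opening parentheses concrete, here is a quick check of the first `re.findall` pattern from the answer above on a hypothetical input (`s2` is made up for illustration, not taken from the thread).

```python
import re

# The first pattern from the answer above
pattern = r'[(]{2}[a-z\s()]*[)]{2}|[(][a-z\s]*[)]'

# Hypothetical input: "x " sits between the two opening parentheses of the nested group
s2 = '(a) or (x (b) and (c))'
print(re.findall(pattern, s2))
# ['(a)', '(b)', '(c)']
```

The nested group is broken into its inner pieces instead of being returned whole, because the first alternative only fires on two adjacent opening parentheses.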