2013-03-07 54 views
1

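Python re.compile: unbalanced parenthesis error

I want to compile a regular expression that collects a sequence of hashtags (r'#\w+') from a tweet. I'd like to write two regexes that do this, one anchored at the start of the tweet and one at the end. I'm using Python 2.7.2, and my code looks like this: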

HASHTAG_SEQ_REGEX_PATTERN   = r""" 
(          #Outermost grouping to match overall regex 
#\w+         #The hashtag matching. It's a valid combination of \w+ 
([:\s,]*#\w+)*       #This is an optional (0 or more) sequence of hashtags separated by [\s,:]* 
)          #Closing parenthesis of outermost grouping to match overall regex 
""" 

LEFT_HASHTAG_REGEX_SEQ  = re.compile('^' + HASHTAG_SEQ_REGEX_PATTERN , re.VERBOSE | re.IGNORECASE) 

When the line that compiles the regex runs, I get the following error:

sre_constants.error: unbalanced parenthesis 

I don't understand why I'm getting this, since I can't see any unbalanced parentheses in my regex pattern.

Answers

5

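With re.VERBOSE, everything on this line after the first # is treated as a comment, including the closing parenthesis: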

        v----comment starts here 
([:\s,]*#\w+)* ... 
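To make the cause concrete, here is a minimal sketch that reproduces the failure. The helper `compiles` and the names `BROKEN_PATTERN` and `EFFECTIVE` are made up for illustration; `EFFECTIVE` is my reading of what remains once the verbose-mode comments and whitespace are stripped. The exact error message differs between Python versions, so the sketch only checks that `re.error` is raised.

```python
import re

# The pattern from the question: every unescaped '#' outside a character
# class starts a comment when re.VERBOSE is set.
BROKEN_PATTERN = r"""
(                  # outermost group
#\w+               # meant to match a hashtag, but the whole line is a comment
([:\s,]*#\w+)*     # everything from the first '#' on is dropped, including ')*'
)                  # closes only one of the two open groups
"""

def compiles(pattern, flags=0):
    """Return True if the pattern compiles, False if re.error is raised."""
    try:
        re.compile(pattern, flags)
        return True
    except re.error:
        return False

# Roughly what the regex engine is left with after stripping comments
# and whitespace: two opening parentheses, one closing.
EFFECTIVE = r"(([:\s,]*)"

print(compiles(BROKEN_PATTERN, re.VERBOSE))
print(compiles(EFFECTIVE))
```

Both calls report a failed compile, which is why the error mentions parentheses even though the source text looks balanced.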

Escape it:

([:\s,]*\#\w+)* 

The same goes for this line, although it isn't the one causing the unbalanced parenthesis :)

v----escape me 
#\w+         #The hashtag matching ... 

 

HASHTAG_SEQ_REGEX_PATTERN   = r""" 
(    # Outermost grouping to match overall regex 
\#\w+    # The hashtag matching. It's a valid combination of \w+ 
([:\s,]*\#\w+)* # This is an optional (0 or more) sequence of hashtags separated by [\s,:]* 
)     # Closing parenthesis of outermost grouping to match overall regex 
""" 
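As a quick check, the escaped pattern can be compiled and tried on a sample string. This is only a sketch: the tweet text is invented, and `RIGHT_HASHTAG_REGEX_SEQ` is a hypothetical name for the end-anchored variant the question mentions but does not show.

```python
import re

HASHTAG_SEQ_REGEX_PATTERN = r"""
(                   # outermost group: the whole run of hashtags
\#\w+               # first hashtag ('#' escaped so it is not a comment)
([:\s,]*\#\w+)*     # zero or more further hashtags separated by [:\s,]*
)
"""

FLAGS = re.VERBOSE | re.IGNORECASE

# Anchored at the start of the tweet, as in the question.
LEFT_HASHTAG_REGEX_SEQ = re.compile('^' + HASHTAG_SEQ_REGEX_PATTERN, FLAGS)
# Hypothetical counterpart anchored at the end of the tweet.
RIGHT_HASHTAG_REGEX_SEQ = re.compile(HASHTAG_SEQ_REGEX_PATTERN + '$', FLAGS)

# Made-up sample input.
tweet = "#python #regex: compiling verbose patterns #re, #tips"

left = LEFT_HASHTAG_REGEX_SEQ.search(tweet)
right = RIGHT_HASHTAG_REGEX_SEQ.search(tweet)

print(left.group(1))   # leading hashtag run
print(right.group(1))  # trailing hashtag run
```

For this sample the leading run is `#python #regex` and the trailing run is `#re, #tips`.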

How could I be so very, very stupid! Thanks Pavel, thanks Explosion, for your answers. – VaidAbhishek 2013-03-07 22:20:57

3

You have some unescaped hashes in there that you mean to use literally, but VERBOSE is tripping you up. Escape them:

\#\w+ 
([:\s,]*\#\w+)* #reported issue caused by this hash 

Alternatively, use [#] to put a literal # into the regex wherever it is not meant to start a comment:

HASHTAG_SEQ_REGEX_PATTERN   = r""" 
(     #Outermost grouping to match overall regex 
[#]\w+    #The hashtag matching. It's a valid combination of \w+ 
([:\s,]*[#]\w+)*  #This is an optional (0 or more) sequence of hashtags separated by [\s,:]* 
)     #Closing parenthesis of outermost grouping to match overall regex 
""" 

I find this more readable.
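A small sketch comparing the two spellings, on the assumption that a `#` inside a character class is taken literally in verbose mode (which is what the `re` documentation describes). The variable names and the sample string are invented for the example.

```python
import re

# Backslash-escaped hashes.
escaped = re.compile(r"""
\#\w+ ([:\s,]*\#\w+)*     # hashtag run, '#' escaped
""", re.VERBOSE)

# Hashes wrapped in a character class.
bracketed = re.compile(r"""
[#]\w+ ([:\s,]*[#]\w+)*   # hashtag run, '#' inside [...]
""", re.VERBOSE)

# Made-up sample input.
sample = "#a, #b:#c trailing text"

print(escaped.match(sample).group(0))
print(bracketed.match(sample).group(0))
```

Both patterns match the same leading run, `#a, #b:#c`, so choosing between `\#` and `[#]` is a matter of readability.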

2

You won't have this problem if you write the pattern as follows:

HASHTAG_SEQ_REGEX_PATTERN = (
r'(' #Outermost grouping to match overall regex 
r'#\w+'  #The hashtag matching. It's a valid combination of \w+ 
r'([:\s,]*#\w+)*' #This is an optional (0 or more) sequence of hashtags separated by [\s,:]* 
r')' #Closing parenthesis of outermost grouping to match overall regex 
) 

Personally, I never use re.VERBOSE; I can never remember the rules about whitespace and the rest.
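For completeness, a sketch of the concatenated form in use. Here the comments are ordinary Python comments, so the pattern is compiled without re.VERBOSE and `#` needs no escaping. The name `LEFT` and the sample string are made up; the raw-string prefixes are my addition to avoid invalid-escape warnings on newer Python versions.

```python
import re

# Adjacent string literals are joined by Python at compile time.
HASHTAG_SEQ_REGEX_PATTERN = (
    r'('                # outermost group
    r'#\w+'             # first hashtag; '#' is literal because VERBOSE is off
    r'([:\s,]*#\w+)*'   # zero or more further hashtags
    r')'                # close outermost group
)

# The pieces collapse into one compact pattern string.
print(HASHTAG_SEQ_REGEX_PATTERN)

# No re.VERBOSE here: whitespace and '#' in the pattern would be significant.
LEFT = re.compile('^' + HASHTAG_SEQ_REGEX_PATTERN, re.IGNORECASE)

# Made-up sample input.
m = LEFT.match("#one,#two and more")
print(m.group(1))
```

The trade-off is that every fragment needs its own quotes, but none of the verbose-mode rules apply.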