2012-11-06

I want to match a regular expression against the lines of a text file in Python:

In [44]: with open(path) as f: 
    ....:  for line in f: 
    ....:   matched = re.search('^PARTITION BY HASH',line) 
    ....:   if matched is not None: 
    ....:    print matched.group() 
    ....: 

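The file contains lines like PARTITION BY HASH(SOME_THING), which should match. In between there are other lines such as SUBPARTITION BY HASH(SOME_THING), which should not match.

After the match, I want to delete that line. But printing matched.group() fails. Why?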


Why use re here? Use if "PARTITION BY HASH" in line: or if line.startswith("PARTITION BY HASH"):


I have updated my question to explain why I need to use a regular expression.

Answers


Something like this:

In [29]: strs1="PARTITION BY HASH(SOME_THING)" 

In [30]: strs2="SUBPARTITION BY HASH(SOME_THING)" 

In [31]: bool(re.match(r"^PARTITION BY HASH",strs1)) 
Out[31]: True 

In [32]: bool(re.match(r"^PARTITION BY HASH",strs2)) 
Out[32]: False 

"But printing matched.group fails"

It does not fail. It does exactly what it is supposed to do: it returns the match. For instance:

>>> import re 
>>> line = "PARTITION BY HASH(something)" 
>>> re.search('^PARTITION BY HASH', line).group() 
'PARTITION BY HASH' 
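
One thing worth keeping in mind with the snippet above: re.search returns None when nothing matches, so calling .group() directly on the result only works for lines that do match. A minimal sketch of a guarded lookup follows; the helper name first_match is made up for illustration and is not part of the re module.

```python
import re

def first_match(line, pattern='^PARTITION BY HASH'):
    """Return the matched text, or None when the pattern does not match.

    Hypothetical helper for illustration only.
    """
    matched = re.search(pattern, line)
    # re.search returns None on failure, so guard before calling .group()
    return matched.group() if matched is not None else None

hit = first_match('PARTITION BY HASH(something)')
miss = first_match('SUBPARTITION BY HASH(something)')
```

Here hit holds the matched text and miss is None, because the ^ anchor stops the pattern from matching in the middle of SUBPARTITION.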

If you want to print the lines that start with 'PARTITION BY HASH', building on what Ashwini Chaudhary suggested, do this:

with open(path) as f:
    for line in f:
        if line.startswith('PARTITION BY HASH'):
            print line,

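Note the trailing comma, which prevents print from appending an extra newline character (each line read from the file already ends with one).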
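
Since the question ultimately wants to delete the matching lines rather than print them, the same startswith test can be turned into a filter. This is only a sketch under assumptions: the helper name drop_partition_lines and the sample lines are invented for illustration, and it works on any iterable of lines (a list here, but an open file object would do as well).

```python
def drop_partition_lines(lines, prefix='PARTITION BY HASH'):
    """Return the lines that do NOT start with prefix.

    Hypothetical helper for illustration only.
    """
    return [line for line in lines if not line.startswith(prefix)]

# Made-up sample data standing in for the contents of the file
sample = [
    'CREATE TABLE t (id INT)\n',
    'PARTITION BY HASH(SOME_THING)\n',
    'SUBPARTITION BY HASH(SOME_THING)\n',
]
kept = drop_partition_lines(sample)
```

The SUBPARTITION line survives because startswith only looks at the beginning of the line. Writing kept back to a file would complete the deletion.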

If you insist on using the re package:

import re

with open(path) as f:
    for line in f:
        if re.match('PARTITION BY HASH', line):
            print line,

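Note that re.match is used here without the start-of-string anchor ^, because re.match only ever matches at the beginning of the string (see http://docs.python.org/2/library/re.html#search-vs-match for more information).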
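
To make the difference concrete, here is a small comparison of re.match, re.search, and re.search with an explicit ^ on the two kinds of lines from the question. The variable names are just for illustration.

```python
import re

PATTERN = 'PARTITION BY HASH'
lines = [
    'PARTITION BY HASH(SOME_THING)',
    'SUBPARTITION BY HASH(SOME_THING)',
]

# re.match only tries the pattern at the start of each string
match_hits = [bool(re.match(PATTERN, s)) for s in lines]

# re.search scans the whole string, so it also finds the pattern inside SUBPARTITION
search_hits = [bool(re.search(PATTERN, s)) for s in lines]

# adding ^ makes re.search behave like re.match for these single-line strings
anchored_hits = [bool(re.search('^' + PATTERN, s)) for s in lines]
```

Only the unanchored re.search reports a hit for the SUBPARTITION line, which is why either re.match or a leading ^ is needed here.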