Python regex: finding the matching strings

I want to use a regular expression in Python to find matching substrings in a string. The string looks like this:

band 1 # energy -53.15719532 # occ. 2.00000000 

ion  s  p  d tot 
    1 0.000 0.995 0.000 0.995 
    2 0.000 0.000 0.000 0.000 
tot 0.000 0.996 0.000 0.996 

band 2 # energy -53.15719532 # occ. 2.00000000 

ion  s  p  d tot 
    1 0.000 0.995 0.000 0.995 
    2 0.000 0.000 0.000 0.000 
tot 0.000 0.996 0.000 0.996 

band 3 # energy -53.15719532 # occ. 2.00000000

My goal is to find the string that follows tot. So the matched strings would be:

['0.000 0.996 0.000 0.996', 
'0.000 0.996 0.000 0.996']

Here is my current code:

pattern = re.compile(r'tot\s+(.*?)\n', re.DOTALL) 
pattern.findall(string)

However, the output gives me:

['1 0.000 0.995 0.000 0.995', 
'0.000 0.996 0.000 0.996', 
'1 0.000 0.995 0.000 0.995', 
'0.000 0.996 0.000 0.996']

Any ideas what I am doing wrong?

Source

2016-09-04 Jianli Cheng

You don't want the DOTALL flag. Remove it and use MULTILINE instead.

pattern = re.compile(r'^\s*tot(.*)', re.MULTILINE)

This matches all lines that start with tot (optionally preceded by whitespace). The rest of the line will be in group 1.
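As a minimal sketch of how this pattern behaves, here it is applied to a shortened copy of the data from the question. The names `sample` and `totals` are made up for illustration, and the captured group is stripped because it still contains the space that follows tot.

```python
import re

# Shortened copy of the data from the question (illustrative only).
sample = """band 1 # energy -53.15719532 # occ. 2.00000000

ion  s  p  d tot
    1 0.000 0.995 0.000 0.995
    2 0.000 0.000 0.000 0.000
tot 0.000 0.996 0.000 0.996

band 2 # energy -53.15719532 # occ. 2.00000000
"""

pattern = re.compile(r'^\s*tot(.*)', re.MULTILINE)

# Group 1 holds the rest of the line, including the space after "tot",
# so strip the surrounding whitespace from each captured value.
totals = [rest.strip() for rest in pattern.findall(sample)]
print(totals)  # ['0.000 0.996 0.000 0.996']
```

The header line "ion  s  p  d tot" is skipped because, with MULTILINE, ^ anchors the match to the start of a line and that line does not begin with tot.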

Quoting the documentation, emphasis mine:

re.DOTALL

Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
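A small sketch of the difference the quoted flag makes; the `text` value is a made-up two-line string, not the data from the question.

```python
import re

# Hypothetical two-line input used only to show the flag's effect.
text = "tot\n1 0.995"

# Default: '.' stops at the newline, so the match cannot cross lines.
default_match = re.findall(r'tot.*', text)

# With DOTALL: '.' also matches '\n', so the match runs to the end of the text.
dotall_match = re.findall(r'tot.*', text, re.DOTALL)

print(default_match)  # ['tot']
print(dotall_match)   # ['tot\n1 0.995']
```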

Note that you can easily do this without regular expressions.

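with open("input.txt", "r") as data_file:
    for line in data_file:
        # split() without arguments splits on any whitespace and drops empty
        # strings, so it also works on Python 3, where filter() returns an iterator
        items = line.split()
        # guard against blank lines, which produce an empty list
        if items and items[0] == "tot":
            values = items[1:]  # etc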
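For readers who want to try the non-regex approach without a file, here is a minimal sketch that runs on an in-memory string instead. The `sample` and `totals` names are assumptions for illustration, and joining the fields back with single spaces is just one way to reproduce the output requested in the question.

```python
# Shortened copy of the data from the question (illustrative only).
sample = """ion  s  p  d tot
    1 0.000 0.995 0.000 0.995
    2 0.000 0.000 0.000 0.000
tot 0.000 0.996 0.000 0.996

band 2 # energy -53.15719532 # occ. 2.00000000
"""

totals = []
for line in sample.splitlines():
    items = line.split()  # splits on any whitespace, drops empty strings
    # Blank lines give an empty list, so check before indexing.
    if items and items[0] == "tot":
        totals.append(" ".join(items[1:]))

print(totals)  # ['0.000 0.996 0.000 0.996']
```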

Source

2016-09-04 18:02:41 Tomalak

This solved my problem. I think I was confused about `DOTALL` and `MULTILINE`. I need to read more about them. –

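You are using re.DOTALL, which means that the dot "." matches anything, even newlines. In essence, the pattern finds both occurrences of "tot" and everything that follows up to the next newline: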

      tot 
    1 0.000 0.995 0.000 0.995

and

tot 0.000 0.996 0.000 0.996

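Removing re.DOTALL should fix your problem.

Edit: Actually, the DOTALL flag is not really the problem (although it is unnecessary). The problem with the pattern is that \s+ also matches newlines. Replacing it with a single space solves the issue: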

pattern = re.compile(r'tot (.*?)\n')
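The effect can be reproduced with a short sketch. The `sample` string below is a shortened version of the question's data and assumes that no line ends with trailing spaces; the variable names are made up for illustration.

```python
import re

# Shortened sample; assumes no trailing spaces at the end of each line.
sample = (
    "ion  s  p  d tot\n"
    "    1 0.000 0.995 0.000 0.995\n"
    "tot 0.000 0.996 0.000 0.996\n"
)

# \s+ also matches '\n', so the "tot" that ends the header line
# swallows the newline and the indentation of the next line.
with_whitespace_class = re.findall(r'tot\s+(.*?)\n', sample)

# A literal space cannot cross the line break.
with_single_space = re.findall(r'tot (.*?)\n', sample)

print(with_whitespace_class)  # ['1 0.000 0.995 0.000 0.995', '0.000 0.996 0.000 0.996']
print(with_single_space)      # ['0.000 0.996 0.000 0.996']
```

Note that the unwanted match appears even without DOTALL. If the header line did end with a space after tot, the single-space pattern would return an empty string for it, so anchoring the pattern to the start of the line may be the safer choice.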

Source

2016-09-04 18:06:42 mpurg

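I think I should change 'DOTALL' to 'MULTILINE', as @Tomalak suggested. –

MULTILINE is not needed here unless you want to use ^ and $ to match the start and end of a line, respectively. I must point out that @Tomalak's solution is cleaner. – mpurg

You are right. '\s+' is actually the problem here. I thought it only meant more than one space. Thanks for letting me know. –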

Another solution, using the re.findall function with a specific regex pattern:

import re

# text is your initial string (renamed from str to avoid shadowing the built-in)
result = re.findall(r'tot [0-9 .]+(?=\n|$)', text)
print(result)

Output:

['tot 0.000 0.996 0.000 0.996', 'tot 0.000 0.996 0.000 0.996']
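The output above keeps the tot prefix, whereas the question asked for the numbers only. One possible variation, offered as a sketch rather than as part of the original answer, wraps the character class in a capture group so that re.findall returns just that group. The `sample` string assumes no trailing spaces at the end of each line.

```python
import re

# Shortened sample; assumes no trailing spaces at the end of each line.
sample = (
    "ion  s  p  d tot\n"
    "    1 0.000 0.995 0.000 0.995\n"
    "tot 0.000 0.996 0.000 0.996\n"
)

# With exactly one capture group, findall returns the group, not the whole match.
numbers_only = re.findall(r'tot ([0-9 .]+)(?=\n|$)', sample)
print(numbers_only)  # ['0.000 0.996 0.000 0.996']
```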

Source

2016-09-04 18:09:04 RomanPerekhrest
