Python: find the string between 2 tags

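I am trying to read the content between 2 tags stored in a file. The content may span multiple lines. Each tag can appear 0 or 1 times in the file.

For example, the file could contain: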

title:Corruption Today: Corruption today in 
content:Corruption Today: 
Corruption today in 
score:0.91750675

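So, when reading "content:", my query should yield "Corruption Today: Corruption today in". After some googling I was able to write the code below. I am not sure how efficient it is, because it iterates over filecontent twice to retrieve the content.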

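import re

# Read the whole file into memory so the content can span several lines.
with open(files, 'r') as myfile:
    filecontent = myfile.read()

# Offset just past 'content:' (8 characters) for every occurrence; take the first.
startPtrs = [m.start() + 8 for m in re.finditer('content:', filecontent)]
startPtr = startPtrs[0]
# Offset one character before 'score:' (dropping the preceding newline); take the first.
endPtrs = [m.start() - 1 for m in re.finditer('score:', filecontent)]
endPtr = endPtrs[0]

content = filecontent[startPtr:endPtr]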

Can this be done more efficiently?

Source

2013-10-20 Nitish Varshney

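Is this the whole file? Or can "content", for example, appear several times?

What is a 'tag'? Does every line that contains a colon ':' have a tag? – 2013-10-20 13:17:43

@KobiK: As stated above, a tag can appear 0 or 1 times. So "content:" is either present or not.

If you want to find a string between 2 substrings, you can use the re module: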

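import re

# Read the whole file so the match can span multiple lines.
with open(files, 'r') as myfile:
    filecontent = myfile.read()

# '.*?' is non-greedy, so the match stops at the first 'score:' after 'content:'.
results = re.compile('content:(.*?)score:', re.DOTALL | re.IGNORECASE).findall(filecontent)
print(results)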

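Some explanation:

IGNORECASE, from the docs:

Perform case-insensitive matching; expressions like [A-Z] will match lowercase letters, too. This is not affected by the current locale.

DOTALL, from the docs: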

(Dot.) In the default mode, this matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline.
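
To illustrate what DOTALL changes, here is a small sketch that runs the same pattern with and without the flag. It assumes the sample from the question is inlined as a string; the names `sample`, `without_dotall` and `with_dotall` are only for illustration.

```python
import re

# Sample text from the question, inlined for illustration.
sample = (
    "title:Corruption Today: Corruption today in \n"
    "content:Corruption Today: \n"
    "Corruption today in \n"
    "score:0.91750675\n"
)

# Without DOTALL, '.' stops at a newline, so a span that crosses lines is not matched.
without_dotall = re.findall(r"content:(.*?)score:", sample)

# With DOTALL, '.' also matches newlines, so the text between the tags is captured.
with_dotall = re.findall(r"content:(.*?)score:", sample, re.DOTALL)

print(without_dotall)
print(with_dotall)
```

Only the second call finds the content here, because the text between the two tags contains newlines.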

You can read about compile here

There are also some other solutions, which you can see here

Source
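
Since each tag appears at most once, a single re.search may be enough, and it makes the "tag is missing" case easy to handle. The following is a minimal sketch under that assumption; the helper name `between_tags` and its default arguments are hypothetical, not part of the original code.

```python
import re


def between_tags(text, start_tag="content:", end_tag="score:"):
    """Return the stripped text between the first start_tag and the
    next end_tag, or None if the pair is not found."""
    # re.escape guards against tags that contain regex metacharacters.
    pattern = re.escape(start_tag) + r"(.*?)" + re.escape(end_tag)
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        return None
    return match.group(1).strip()


# Inline sample for illustration; in practice the text would come from the file.
sample = "title:x\ncontent:Corruption Today: \nCorruption today in \nscore:0.91750675\n"
print(between_tags(sample))
print(between_tags("title:only a title\n"))
```

This scans the text once and stops at the first match, whereas collecting every match with finditer and indexing `[0]` raises an IndexError when a tag is absent.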

2013-10-20 13:24:54
