Python - reading a file whose entries span one or more lines
I'm very new to Python. I've found answers to most of my questions here, but this one has me stumped.
I'm processing log files with Python. Each line generally starts with a date/timestamp, like:
[1/4/13 18:37:37:848 PST]
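99% of the time I can read the file line by line, look for the items I'm interested in and process them accordingly. Occasionally, though, an entry in the log file contains a message with carriage returns/line feeds, so the entry spans multiple lines.
Is there an easy way to read the file from timestamp to timestamp, so that when this happens the multiple lines are merged into a single read? For example: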
[1/4/13 18:37:37:848 PST] A log entry
[1/4/13 18:37:37:848 PST] Another log entry
[1/4/13 18:37:37:848 PST] A log entry that somehow
got some new line
characters mixed in
[1/4/13 18:37:37:848 PST] The last log entry
would be read as four entries instead of the six lines it is read as now.
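One minimal way to do this, sketched under the assumption that every new entry starts with "[" (the timestamp) and that continuation lines never do, is a generator that buffers lines until the next timestamp line appears. The name group_entries is hypothetical; a file object returned by open() can be passed in directly, since iterating it yields lines just like the io.StringIO used here for the sample.

```python
import io


def group_entries(lines):
    """Yield one string per log entry, merging continuation lines.

    Assumption: a new entry starts with "[" (the timestamp) and
    continuation lines never start with "[".
    """
    current = []
    for line in lines:
        if line.startswith("[") and current:
            # A new timestamp begins: emit the entry collected so far.
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        # Emit the final entry once the input is exhausted.
        yield "".join(current)


sample = io.StringIO(
    "[1/4/13 18:37:37:848 PST] A log entry\n"
    "[1/4/13 18:37:37:848 PST] Another log entry\n"
    "[1/4/13 18:37:37:848 PST] A log entry that somehow\n"
    "got some new line\n"
    "characters mixed in\n"
    "[1/4/13 18:37:37:848 PST] The last log entry\n"
)
entries = list(group_entries(sample))
print(len(entries))  # 4
```

With a real file this becomes `with open(logFileName) as f:` followed by `for entry in group_entries(f):`.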
Thanks in advance for your help.
Chris
Update...
myTestFile.log contains exactly the text above, and here is my script:
import sys, getopt, os, re

sourceFolder = 'C:/MaxLogs'
logFileName = sourceFolder + "/myTestFile.log"
lines = []

def timestamp_split(file):
    pattern = re.compile(r"\[(0?[1-9]|[12][0-9]|3[01])(\/)(0?[1-9]|[12][0-9]|3[01])(\/)([0-9]{2})(\s)")
    current = []
    for line in file:
        if not re.match(pattern, line):
            if current:
                yield "".join(current)
            current == [line]
        else:
            current.append(line)
    yield "".join(current)

print("--- START ----")
with open(logFileName) as file:
    for entry in timestamp_split(file):
        print(entry)
        print("- Record Separator -")
print("--- DONE ----")
When I run it, I get this:
--- START ----
[1/4/13 18:37:37:848 PST] A log entry
[1/4/13 18:37:37:848 PST] Another log entry
[1/4/13 18:37:37:848 PST] A log entry that somehow
- Record Separator -
[1/4/13 18:37:37:848 PST] A log entry
[1/4/13 18:37:37:848 PST] Another log entry
[1/4/13 18:37:37:848 PST] A log entry that somehow
- Record Separator -
[1/4/13 18:37:37:848 PST] A log entry
[1/4/13 18:37:37:848 PST] Another log entry
[1/4/13 18:37:37:848 PST] A log entry that somehow
[1/4/13 18:37:37:848 PST] The last log entry
- Record Separator -
--- DONE ----
I seem to be iterating too many times. I was expecting (hoping) for something like this:
--- START ----
[1/4/13 18:37:37:848 PST] A log entry
- Record Separator -
[1/4/13 18:37:37:848 PST] Another log entry
- Record Separator -
[1/4/13 18:37:37:848 PST] A log entry that somehow got some new line characters mixed in
- Record Separator -
[1/4/13 18:37:37:848 PST] The last log entry
- Record Separator -
--- DONE ----
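The hoped-for output prints the multi-line entry on a single line. Grouping alone keeps the embedded newlines, so this takes one extra step: strip each physical line and join the pieces with a space. A small sketch, using a hypothetical helper named flatten_entry (note that strip() also drops any leading indentation on continuation lines):

```python
def flatten_entry(entry):
    """Collapse a multi-line log entry into a single line.

    Each physical line is stripped of surrounding whitespace (including
    the trailing newline) and the non-empty pieces are joined with spaces.
    """
    return " ".join(part.strip() for part in entry.splitlines() if part.strip())


entry = (
    "[1/4/13 18:37:37:848 PST] A log entry that somehow\n"
    "got some new line\n"
    "characters mixed in\n"
)
print(flatten_entry(entry))
# [1/4/13 18:37:37:848 PST] A log entry that somehow got some new line characters mixed in
```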
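As discussed in the comments, I had accidentally left the "not" in the comparison from when I was testing the regex pattern. If I remove it, I get all the partial lines instead, which confuses me even more!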
--- START ----
got some new line
characters mixed in
- Record Separator -
got some new line
characters mixed in
- Record Separator -
--- DONE ----
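Looks good, thanks. I have a regex that matches the timestamp, so I think I can modify the code above to do a match instead of relying on "["; that should be robust enough, since I don't expect anything resembling a timestamp to appear in my data. – Chris
Feels close... but I still seem to get odd results. Is there something I'm missing? – Chris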
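Matching the full timestamp instead of a bare "[" could look like the sketch below. The pattern is written only against the sample format [1/4/13 18:37:37:848 PST] (month/day/two-digit year, H:MM:SS:mmm, a three- or four-letter zone); it is an assumption that every entry follows that layout. Pattern.match only matches at the beginning of the string, which is what is wanted for a start-of-entry check. The names TIMESTAMP and is_timestamp_line are hypothetical.

```python
import re

# Assumed layout, taken from the sample lines: [M/D/YY H:MM:SS:mmm ZZZ]
TIMESTAMP = re.compile(
    r"\[\d{1,2}/\d{1,2}/\d{2} "    # date, e.g. 1/4/13
    r"\d{1,2}:\d{2}:\d{2}:\d{3} "  # time with milliseconds, e.g. 18:37:37:848
    r"[A-Z]{3,4}\]"                # time zone abbreviation, e.g. PST
)


def is_timestamp_line(line):
    """Return True if the line begins with a timestamp like the samples."""
    # Pattern.match is anchored at the start of the string.
    return TIMESTAMP.match(line) is not None


print(is_timestamp_line("[1/4/13 18:37:37:848 PST] A log entry"))  # True
print(is_timestamp_line("got some new line"))                      # False
```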
@Chris You want to check whether the pattern *does* match the start of the line. The "if" block should run when it **is** a timestamp line. –
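Applying that comment to the script in the question, plus one more fix, gives something like the sketch below. Two things change: the "if" now runs when the line *is* a timestamp line, and `current == [line]` (a comparison whose result is thrown away, so `current` is never reset) becomes the assignment `current = [line]`. The final yield is also guarded so an empty file produces no entries. The regex here is a simplified start-of-entry check assumed from the samples, not Chris's original pattern, and io.StringIO stands in for the real log file.

```python
import io
import re

# Simplified start-of-entry check, e.g. "[1/4/13 " (assumed from the samples).
pattern = re.compile(r"\[\d{1,2}/\d{1,2}/\d{2}\s")


def timestamp_split(lines):
    current = []
    for line in lines:
        if pattern.match(line):
            # A timestamp line starts a new entry: emit the previous one first.
            if current:
                yield "".join(current)
            current = [line]  # assignment, not the comparison "=="
        else:
            # Continuation line: append it to the entry being built.
            current.append(line)
    if current:
        yield "".join(current)


sample = io.StringIO(
    "[1/4/13 18:37:37:848 PST] A log entry\n"
    "[1/4/13 18:37:37:848 PST] Another log entry\n"
    "[1/4/13 18:37:37:848 PST] A log entry that somehow\n"
    "got some new line\n"
    "characters mixed in\n"
    "[1/4/13 18:37:37:848 PST] The last log entry\n"
)
print("--- START ----")
for entry in timestamp_split(sample):
    print(entry.rstrip("\n"))
    print("- Record Separator -")
print("--- DONE ----")
```

This yields four records; the third still contains its embedded newlines, so printing it on one line needs a separate join step such as stripping each line and joining with spaces.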