2015-09-13 65 views

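XML log file regex

A legacy system that I cannot change spits out 5 GB a day of mostly useless XML logs and blows through my ingestion license. Two classes of verbose error occur 1000+ times a minute, but every few minutes there is a genuinely interesting entry. I want to use sed to shorten the repetitive entries drastically and keep the interesting ones unchanged.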

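So what I need is:
1. A regex that matches each of the 2 classes of annoying log entry (e.g. ...'decimal'... and ...'DBNull'...) but not the occasional interesting ones. One regex per annoying error class is fine; I can do 2 sed passes.
2. A capture group containing the timestamp, so I can replace the long XML entry with a concise version that keeps the correct timestamp and loses no fidelity.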

I have got as far as this, which matches and captures the created date:

(?:<Log).*?(createdDate="\d{2}\/\d{2}\/\d{4}.\d{2}:\d{2}:\d{2}").*?(?:decimal).*?(<\/Log>) 

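This is close, but through a kind of reverse greediness the match runs from 'decimal' all the way back to an opening Log declaration several entries earlier. Playing around with negative lookbehind has only given me a serious headache.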
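The over-match described above can be reproduced on a two-entry sample. This is only a sketch, assuming GNU grep built with PCRE support (`-P`); the sample file and the variable names are made up for illustration. The second pattern, a tempered dot that refuses to cross `</Log>`, is one possible pure-regex workaround and is not taken from this thread.

```shell
#!/bin/sh
# Hypothetical sample: an interesting entry followed by a noisy 'decimal' entry.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<Log type="ERROR" createdDate="11/09/2015 08:13:10" >
<![CDATA[ [231] An actual interesting log entry ]]></Log>

<Log type="ERROR" createdDate="11/09/2015 08:13:09" >
<![CDATA[ [108] The value '' cannot be parsed as the type 'decimal'.
    ]]></Log>
EOF

# The pattern from the question, with (?s) added so that . also matches newlines.
# grep -z reads the whole file as a single record; -o prints only the match.
lazy='(?s)(?:<Log).*?(createdDate="\d{2}\/\d{2}\/\d{4}.\d{2}:\d{2}:\d{2}").*?(?:decimal).*?(<\/Log>)'

# Assumption, not from the thread: each "any character" step first checks
# that it is not standing on </Log>, so a match cannot leave its own entry.
fenced='(?s)<Log(?:(?!<\/Log>).)*?decimal(?:(?!<\/Log>).)*?<\/Log>'

# Count how many opening <Log tags end up inside the matched text.
spanned=$(grep -Pzo "$lazy" "$sample" | tr '\0' '\n' | grep -c '^<Log ')
tempered=$(grep -Pzo "$fenced" "$sample" | tr '\0' '\n' | grep -c '^<Log ')

echo "lazy pattern, opening tags in match:     $spanned"
echo "tempered pattern, opening tags in match: $tempered"
rm -f "$sample"
```

Lazy quantifiers only control where a match ends. The engine still starts at the first `<Log` it sees and stretches forward until it finds 'decimal', which is why the interesting entry gets swallowed.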

Sample data

<Log type="ERROR" createdDate="11/09/2015 08:13:14" > 
<![CDATA[ [108] -- much cruft removed-- SerializationException: There was an error deserializing the object of type Common.DataCtract.QResult. The value '' cannot be parsed as the type 'decimal'. ---> System.Xml.XmlException: The value '' cannot be parsed as the type 'decimal'. ---> System.FormatException: Input string was not in a correct format. 
    ]]></Log> 

<Log type="ERROR" createdDate="11/09/2015 08:13:13" > 
<![CDATA[ [108] -- much cruft removed-- SerializationException: There was an error deserializing the object of type Common.DataCtract.QResult. The value '' cannot be parsed as the type 'decimal'. ---> System.Xml.XmlException: The value '' cannot be parsed as the type 'decimal'. ---> System.FormatException: Input string was not in a correct format. 
    ]]></Log> 

<Log type="ERROR" createdDate="11/09/2015 08:13:12" > 
<![CDATA[ [129] Services.DService.D.FailedToAddRQ(Exceptionex, RQEntityrQ, RHeaderEntityrHeader, StringPRef,): FailedToAddRQ()...with parameters [pRef:=123,0,1], [rQ.AffinityCode:=],[Q.thing=thing][rQ.AffinityRQDT:=123],[rHeader.RHeaderIDPK:=123],[rQ.UWriteIDFK:=] 
    Data.DataAccessLayerException: Conversion from type 'DBNull' to type 'Long' is not valid. 
Parameters: 
[RETURN_VALUE][ReturnValue] Value: [0] 
---> System.InvalidCastException: Conversion from type 'DBNull' to type 'Long' is not valid. 
]]></Log> 

<Log type="ERROR" createdDate="11/09/2015 08:13:11" > 
<![CDATA[ [129] Services.DService.D.FailedToAddRQ(Exceptionex, RQEntityrQ, RHeaderEntityrHeader, StringPRef,): FailedToAddRQ()...with parameters [pRef:=123,0,1], [rQ.AffinityCode:=],[Q.thing=thing][rQ.AffinityRQDT:=123],[rHeader.RHeaderIDPK:=123],[rQ.UWriteIDFK:=] 
    Data.DataAccessLayerException: Conversion from type 'DBNull' to type 'Long' is not valid. 
    ]]></Log> 

<Log type="ERROR" createdDate="11/09/2015 08:13:10" > 
<![CDATA[ [231] An actual interesting log entry with a real error message ]]></Log> 

<Log type="ERROR" createdDate="11/09/2015 08:13:09" > 
<![CDATA[ [108] -- much cruft removed-- SerializationException: There was an error deserializing the object of type Common.DataCtract.QResult. The value '' cannot be parsed as the type 'decimal'. ---> System.Xml.XmlException: The value '' cannot be parsed as the type 'decimal'. ---> System.FormatException: Input string was not in a correct format. 
    ]]></Log> 

Answer


I am not sure exactly what you are looking for, but here is an example of how to isolate each <Log...</Log> block and then perform the replacement:

/^<Log /{ # condition: a line that starts with "<Log " 
    :a; # define the label "a" 
    /<\/Log>/! { # condition: if the line doesn't contain "</Log>" 
     N;  # append the next line to the pattern space 
     ba;  # go to the label "a" 
    }; 
    s/>.*\(decimal\|DBNull\).*</>\1</ # replace the block 
} 

As a one-liner:

sed '/^<Log /{:a;/<\/Log>/!{N;ba;};s/>.*\(decimal\|DBNull\).*</>\1</}' file.log

(I assume that <Log is always at the start of a line, unlike the records at 10 and 11, which are probably typos.)
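To see what the one-liner does in practice, here is a small sketch that runs it on a trimmed-down sample modelled on the question's data. It assumes GNU sed (for `\|` and for `;` after a label); the temporary file and the variable names are hypothetical.

```shell
#!/bin/sh
# Hypothetical three-entry sample: DBNull noise, an interesting entry, decimal noise.
log=$(mktemp)
cat > "$log" <<'EOF'
<Log type="ERROR" createdDate="11/09/2015 08:13:12" >
<![CDATA[ [129] Data.DataAccessLayerException: Conversion from type 'DBNull' to type 'Long' is not valid.
]]></Log>

<Log type="ERROR" createdDate="11/09/2015 08:13:10" >
<![CDATA[ [231] An actual interesting log entry with a real error message ]]></Log>

<Log type="ERROR" createdDate="11/09/2015 08:13:09" >
<![CDATA[ [108] The value '' cannot be parsed as the type 'decimal'.
    ]]></Log>
EOF

# The one-liner from the answer: gather each block into the pattern space,
# then collapse everything between the opening tag's '>' and the last '<'.
short=$(sed '/^<Log /{:a;/<\/Log>/!{N;ba;};s/>.*\(decimal\|DBNull\).*</>\1</}' "$log")
printf '%s\n' "$short"
rm -f "$log"
```

On this sample the two noisy entries should each shrink to a single line such as `<Log type="ERROR" createdDate="11/09/2015 08:13:12" >DBNull</Log>`, while the interesting entry passes through untouched. Because the substitution starts at the `>` that closes the opening tag, the `createdDate` attribute survives without needing a capture group. Note that an interesting entry whose text happens to contain 'decimal' or 'DBNull' would be collapsed as well.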


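Perfect, thanks Casimir - you are right that the log entries start at the beginning of a line. A sed-based solution rather than a pure regex was not quite what I was expecting, but it is very insightful and definitely the way to go.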