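A legacy system that I can't change pumps out 5 GB a day of mostly junk XML logs, which blows through my ingestion license. Two classes of verbose errors occur more than 1,000 times per minute, but a genuinely interesting entry shows up only once every few minutes. I want to use sed with a regex to drastically shorten the repeated entries while leaving the interesting entries in the XML log file unchanged.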
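So what I need is:
1. A regex that matches each of the 2 classes of annoying log entries (e.g. ...'decimal'... and ...'DBNull'...) but not the occasional interesting ones.
One regex per annoying error class is fine; I can do 2 sed passes.
2. A capture group for the timestamp, so I can replace each long XML entry with a concise version that still carries the correct timestamp and no fidelity is lost.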
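For requirement 2, the replacement only has to re-emit the captured group. Below is a minimal Python sketch applied to a single entry that is already known to be annoying; the helper name `shorten` and the stub layout with its `summary` attribute are hypothetical, and the CDATA text is abbreviated from the sample data. In a sed `s` command the same role is played by `\1` in the replacement.

```python
import re

# One complete <Log> entry; group 1 captures the createdDate attribute verbatim.
ENTRY = re.compile(r'<Log type="ERROR" (createdDate="[^"]*") >.*?</Log>', re.DOTALL)


def shorten(entry: str, label: str) -> str:
    """Replace a full entry with a one-line stub that keeps its timestamp.

    The stub layout (self-closing <Log> with a 'summary' attribute) is hypothetical.
    """
    return ENTRY.sub(lambda m: f'<Log type="ERROR" {m.group(1)} summary="{label}"/>', entry)


# Abbreviated from the DBNull entry in the sample data.
entry = (
    '<Log type="ERROR" createdDate="11/09/2015 08:13:11" >\n'
    "<![CDATA[ [129] ... Data.DataAccessLayerException: "
    "Conversion from type 'DBNull' to type 'Long' is not valid.\n"
    "]]></Log>"
)
print(shorten(entry, "DBNull"))
# <Log type="ERROR" createdDate="11/09/2015 08:13:11" summary="DBNull"/>
```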
This is as far as I've got at matching and capturing createdDate:
(?:<Log).*?(createdDate="\d{2}\/\d{2}\/\d{4}.\d{2}:\d{2}:\d{2}").*?(?:decimal).*?(<\/Log>)
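This is close, but because of a kind of reverse greediness, a match that ends at a 'decimal' entry starts at the opening <Log tag of an entry several entries earlier. Playing around with negative lookbehinds just gave me a serious headache.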
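A minimal Python sketch of why this happens and one common fix, assuming `.` is allowed to cross newlines (Python's `re.DOTALL`), which the behavior described above implies. The regex engine returns the leftmost match, so it starts at the first `<Log` and the lazy `.*?` simply keeps extending past `</Log>` until it reaches a 'decimal'. Replacing each "anything" run with a tempered token, `(?:(?!</Log>).)*?`, stops a match from stepping over a closing tag. The helper `annoying` is a hypothetical name. Note that lazy quantifiers, `(?:...)` and lookarounds are not available in sed's POSIX BRE/ERE syntax, which is why this is shown in Python rather than as a sed command.

```python
import re

# Two consecutive entries from the sample data: an interesting one, then a 'decimal' one.
SAMPLE = (
    '<Log type="ERROR" createdDate="11/09/2015 08:13:10" >\n'
    "<![CDATA[ [231] An actual interesting log entry with a real error message ]]></Log>\n"
    '<Log type="ERROR" createdDate="11/09/2015 08:13:09" >\n'
    "<![CDATA[ [108] -- much cruft removed-- SerializationException: There was an error "
    "deserializing the object of type Common.DataCtract.QResult. The value '' cannot be "
    "parsed as the type 'decimal'. ---> System.Xml.XmlException: The value '' cannot be "
    "parsed as the type 'decimal'. ---> System.FormatException: Input string was not in "
    "a correct format.\n"
    "]]></Log>\n"
)

# The attempt from the question; DOTALL lets '.' cross the newlines inside an entry.
ORIGINAL = re.compile(
    r'(?:<Log).*?(createdDate="\d{2}\/\d{2}\/\d{4}.\d{2}:\d{2}:\d{2}").*?(?:decimal).*?(<\/Log>)',
    re.DOTALL,
)


def annoying(keyword: str) -> re.Pattern:
    """Match one whole <Log> entry containing `keyword`, capturing createdDate.

    Each "anything" run is the tempered token (?:(?!</Log>).)*?, which refuses
    to step over a closing </Log>, so a match cannot span two entries.
    """
    return re.compile(
        r'<Log[^>]*?(createdDate="\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}")'
        r"(?:(?!</Log>).)*?" + re.escape(keyword) + r"(?:(?!</Log>).)*?</Log>",
        re.DOTALL,
    )


print(ORIGINAL.search(SAMPLE).group(1))
# createdDate="11/09/2015 08:13:10"   (the interesting entry's timestamp)
print(annoying("decimal").findall(SAMPLE))
# ['createdDate="11/09/2015 08:13:09"']
```

The second pass for the other class would be `annoying("DBNull")`; the tempered token costs a lookahead per character, which is usually acceptable but worth measuring at 5 GB a day.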
Sample data:
<Log type="ERROR" createdDate="11/09/2015 08:13:14" >
<![CDATA[ [108] -- much cruft removed-- SerializationException: There was an error deserializing the object of type Common.DataCtract.QResult. The value '' cannot be parsed as the type 'decimal'. ---> System.Xml.XmlException: The value '' cannot be parsed as the type 'decimal'. ---> System.FormatException: Input string was not in a correct format.
]]></Log>
<Log type="ERROR" createdDate="11/09/2015 08:13:13" >
<![CDATA[ [108] -- much cruft removed-- SerializationException: There was an error deserializing the object of type Common.DataCtract.QResult. The value '' cannot be parsed as the type 'decimal'. ---> System.Xml.XmlException: The value '' cannot be parsed as the type 'decimal'. ---> System.FormatException: Input string was not in a correct format.
]]></Log>
<Log type="ERROR" createdDate="11/09/2015 08:13:12" >
<![CDATA[ [129] Services.DService.D.FailedToAddRQ(Exceptionex, RQEntityrQ, RHeaderEntityrHeader, StringPRef,): FailedToAddRQ()...with parameters [pRef:=123,0,1], [rQ.AffinityCode:=],[Q.thing=thing][rQ.AffinityRQDT:=123],[rHeader.RHeaderIDPK:=123],[rQ.UWriteIDFK:=]
Data.DataAccessLayerException: Conversion from type 'DBNull' to type 'Long' is not valid.
Parameters:
[RETURN_VALUE][ReturnValue] Value: [0]
---> System.InvalidCastException: Conversion from type 'DBNull' to type 'Long' is not valid.
]]></Log>
<Log type="ERROR" createdDate="11/09/2015 08:13:11" >
<![CDATA[ [129] Services.DService.D.FailedToAddRQ(Exceptionex, RQEntityrQ, RHeaderEntityrHeader, StringPRef,): FailedToAddRQ()...with parameters [pRef:=123,0,1], [rQ.AffinityCode:=],[Q.thing=thing][rQ.AffinityRQDT:=123],[rHeader.RHeaderIDPK:=123],[rQ.UWriteIDFK:=]
Data.DataAccessLayerException: Conversion from type 'DBNull' to type 'Long' is not valid.
]]></Log>
<Log type="ERROR" createdDate="11/09/2015 08:13:10" >
<![CDATA[ [231] An actual interesting log entry with a real error message ]]></Log>
<Log type="ERROR" createdDate="11/09/2015 08:13:09" >
<![CDATA[ [108] -- much cruft removed-- SerializationException: There was an error deserializing the object of type Common.DataCtract.QResult. The value '' cannot be parsed as the type 'decimal'. ---> System.Xml.XmlException: The value '' cannot be parsed as the type 'decimal'. ---> System.FormatException: Input string was not in a correct format.
]]></Log>
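Perfect, thanks Casimir. You were right that each <Log entry starts at the beginning of a line. A sed-based solution rather than a pure regex wasn't quite what I expected, but it is very insightful and definitely the way to go.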
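Building on the observation in the comment that every entry starts with `<Log` at the beginning of a line, an entry can be collected line by line and judged as a whole, so no pattern ever has to avoid spanning entries. This is a hypothetical Python sketch of that idea, not Casimir's actual sed answer (which is not reproduced here); `compact`, `ANNOYING` and the stub format are made-up names, and keying on the quoted type names is an assumption. It streams, holding at most one entry in memory.

```python
import re
from typing import Iterable, Iterator, List

# Hypothetical table: substring that marks an annoying class -> label for the stub.
ANNOYING = {"'decimal'": "decimal", "'DBNull'": "DBNull"}
DATE = re.compile(r'createdDate="[^"]*"')


def compact(lines: Iterable[str]) -> Iterator[str]:
    """Stream log lines, collapsing annoying entries and passing the rest through.

    Assumes, as in the sample, that each entry starts with '<Log' at the
    beginning of a line and ends on a line containing '</Log>'.
    """
    entry: List[str] = []
    for line in lines:
        if not entry and not line.startswith("<Log"):
            yield line  # text outside any entry is left alone
            continue
        entry.append(line)
        if "</Log>" not in line:
            continue  # entry not finished yet
        text = "".join(entry)
        entry = []
        label = next((lbl for key, lbl in ANNOYING.items() if key in text), None)
        date = DATE.search(text)
        if label is None or date is None:
            yield text  # interesting entry: emitted unchanged
        else:
            yield f'<Log type="ERROR" {date.group(0)} summary="{label}"/>\n'
    if entry:
        yield "".join(entry)  # unterminated last entry is kept as-is
```

With `import sys`, `sys.stdout.writelines(compact(sys.stdin))` turns it into a stdin-to-stdout filter that can sit in front of the ingestion step.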