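I have a log file made up of bunches of lines, where each bunch is separated from the next by a blank line. From each bunch I want to pick out specific lines (those containing a common pattern). Each bunch is about one mail. A sample of the log file follows: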
#START#
03:48:19:798: : <23/08/2012 03:48:19:019>
03:48:19:798: : <---23/08/2012 03:48 --->
03:48:19:799: : MAIL FROM IP=1.2.3.4
03:48:19:799: : START CHECKING OF IPLIMIT
03:48:19:799: : STOP CHECKING OF IPLIMIT
03:48:20:848:In : MAIL FROM: <[email protected]>
03:48:20:848: : [A:A:A]
03:48:20:849: : max attach size-->5242880
03:48:20:856: : User Is Authenticated with "[email protected] and domain abc.com"
03:48:20:856: : Passed
03:48:20:987:In : RCPT TO: <[email protected]>
03:48:20:987: : email [email protected]
03:48:20:992: : [A:A:A]
03:48:20:999: : passed
03:48:20:999:Inside the Store Mails
03:48:20:999: : BCC feature is not applicable [email protected]
03:48:21:000: : BCC feature is not applicable from [email protected]
03:48:21:000:Inside the Store
03:48:21:132:In : RCPT TO: <[email protected]>
03:48:21:132: : email [email protected]
03:48:21:133: : [A:A:A]
03:48:21:140: : passed
03:48:21:140:Inside the Store Mails
03:48:21:140: : BCC feature is not applicable [email protected]
03:48:21:140: : not authenticated
03:48:21:140:Inside the Store
03:48:21:271: : Data Received
03:50:32:049: : 552 Size Limit Exceeded(5242880)
03:50:32:049: : File Moved in LargeSize Folder....
03:50:32:049: : File Moved in LargeSize Folder....
03:50:32:049: : Connection closed
03:50:32:049: : File Deleted /home/Mail//mailbox/LargeSize/[email protected]:24085.444724474357(1345673901000)
03:50:32:051: : File Deleted /home/Mail//mailbox/LargeSize/[email protected]:39872.512978520455(1345673901140)
MAIL DATA : : 6815779 Bytes
Total: Conn : 16713 Quit By Host : 5565 Stored : 11134 Loop:0
#END#
W A R N I N G ---------------W A R N I N G
...Waiting for activity on port Total Thread Started & 16732 Stoped 16730
#START#
03:56:20:790: : <23/08/2012 03:56:20:020>
03:56:20:790: : <---23/08/2012 03:56 --->
03:56:20:791: : MAIL FROM IP=2.3.4.5
03:56:20:792: : IP IS FRIEND IN WHITELIST
03:56:20:834:In : MAIL FROM:<[email protected]>
03:56:20:834: : [A:A:A]
03:56:20:834: : null
03:56:20:834: : Passed
03:56:20:834:In : RCPT TO: <[email protected]>
03:56:20:834: : email [email protected]
03:56:20:835: : Mailing List
03:56:20:835: : [A:A:A]
03:56:20:836: : passed
03:56:20:836: : Proceesing maillist
03:56:20:839: : Data Received
03:56:20:865: : /home/Mail//mailbox/MailingList/[email protected]:79602.39544573233(1345674380836) Msg Queued For Delivery
03:56:20:865: : Msg forward successfully
03:56:20:865: : /home/Mail//mailbox/MailingList/M14310.39892966699(1345674380837) Msg Queued For Delivery
MAIL DATA : : 27985 Bytes
Total: Conn : 16732 Quit By Host : 5582 Stored : 11135 Loop:0
#END#
...Waiting for activity on port Total Thread Started & 16735 Stoped 16731
#START#
03:56:23:957: : <23/08/2012 03:56:23:023>
03:56:23:957: : <---23/08/2012 03:56 --->
03:56:23:958: : MAIL FROM IP=2.3.4.5
03:56:23:959: : IP IS FRIEND IN WHITELIST
03:56:23:999:In : MAIL FROM: <[email protected]>
03:56:23:999: : [A:A:A]
03:56:23:999: : null
03:56:23:999: : Passed
03:56:23:999:In : RCPT TO: <[email protected]>
03:56:23:999: : email [email protected]
03:56:24:000: : [A:A:A]
03:56:24:007: : passed
03:56:24:008:Inside the Store Mails
03:56:24:009: : BCC feature is not applicable [email protected]
03:56:24:009: : not authenticated
03:56:24:009:Inside the Store
03:56:24:009: : Data Received
03:56:24:053: : /home/Mail//mailbox/External/[email protected]:50098.70335800691(1345674384009) Msg Queued For Delivery
03:56:24:054: : Msg forward successfully
MAIL DATA : : 28276 Bytes
Total: Conn : 16735 Quit By Host : 5582 Stored : 11136 Loop:0
#END#
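Here, [email protected] is an external mail ID, and [email protected] and [email protected] are internal mail IDs. For each mail, a bunch of lines is generated, running from #START# to #END#.
I want to run some pattern matching on each bunch. I only want the bunch in which the mail goes from an internal email ID to an external email ID (the 2nd bunch).
I do not want the bunches in which the mail goes from an external email ID to an internal email ID (the 1st bunch), or from an internal email ID to an internal email ID (the 3rd bunch).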
Once I have the bunches for internal-to-external mail, I want to extract from them the lines containing the words `FROM` and `TO`.
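I tried using awk's `RS`, `ORS`, `FS` and `OFS` variables to turn each bunch of lines, from #START# to #END#, into a single-line record, but could not. I was unable to replace the newlines with a delimiter such as `|` or `~`. I also do not know how to run several pattern matches against each resulting record.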
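One way to get single-line records without touching `RS` at all is to let awk track whether it is inside a bunch and glue the lines together itself. The following is only a minimal sketch: the function name `join_bunches` is made up for illustration, and it assumes that `|` never occurs inside a log line and that every bunch is closed by a line starting with #END#.

```shell
# join_bunches [FILE...]: print each #START#..#END# bunch as ONE line,
# with the original lines joined by "|". Reads stdin when no file is given.
# Lines outside a bunch (such as "...Waiting for activity") are skipped.
join_bunches() {
  awk '
    /^#START#/ { inside = 1; rec = ""; next }               # open a new bunch
    /^#END#/   { if (inside) print rec; inside = 0; next }  # emit and close it
    inside     { rec = (rec == "" ? $0 : rec "|" $0) }      # append this line
  ' "$@"
}

# Example on a tiny made-up log (not the real file):
printf '%s\n' '#START#' 'a: MAIL FROM: <x>' 'b: RCPT TO: <y>' '#END#' 'noise' \
              '#START#' 'c: Data Received' '#END#' | join_bunches
```

On the real file this would be called as `join_bunches log_file`. Swapping `"|"` for `"~"` in the awk program changes the delimiter.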
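I tried the `/PATTERN/` form, but then I could not run a grep command through the `system()` function to get the lines needed to check the domain name. It gave me the error `sh: 1: not found`, and I could not get past it. The code I used: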
if ($0 ~ /FROM/) { print $0 | system("egrep -i 'FROM|TO'") }
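The error most likely comes from how awk reads that statement. `system()` runs the command and returns its exit status, and `print ... | expr` pipes the output into the command named by the string value of `expr`. So awk runs egrep first (on awk's own standard input, not on the printed line), gets back a status such as 1, and then asks the shell to run a command called `1`. awk can do this matching itself, so no `system()` call is needed. Below is a rough sketch under several assumptions: the function name is invented, the internal domain is passed in as a hypothetical argument (abc.com is only taken from the "domain abc.com" line in the sample), only the `MAIL FROM:` and `RCPT TO:` command lines are kept (a bare FROM|TO match would also hit lines such as "MAIL FROM IP=" or "STOP CHECKING"), and a bunch counts as internal-to-external if the sender is internal and at least one recipient is not.

```shell
# from_to_internal_to_external DOMAIN [FILE...]
# DOMAIN is a hypothetical internal domain (for example abc.com).
# Prints the MAIL FROM: / RCPT TO: lines of every bunch whose sender is
# internal and which has at least one external recipient.
from_to_internal_to_external() {
  domain=$1; shift
  awk -v dom="$domain" '
    # Return the text between < and > on a line, lower-cased.
    function addr(line) {
      sub(/^[^<]*</, "", line); sub(/>.*$/, "", line)
      return tolower(line)
    }
    # True when the address ends in "@" followed by the internal domain.
    function internal(a,   suffix) {
      suffix = "@" tolower(dom)
      return length(a) >= length(suffix) &&
             substr(a, length(a) - length(suffix) + 1) == suffix
    }
    /^#START#/ { inside = 1; keep = ""; from_int = 0; to_ext = 0; next }
    /^#END#/   { if (inside && from_int && to_ext) printf "%s", keep
                 inside = 0; next }
    inside && /MAIL FROM: *</ { keep = keep $0 "\n"
                                if (internal(addr($0))) from_int = 1 }
    inside && /RCPT TO: *</   { keep = keep $0 "\n"
                                if (!internal(addr($0))) to_ext = 1 }
  ' "$@"
}

# Usage (log_file is the name used in the question):
#   from_to_internal_to_external abc.com log_file
```

Deciding at #END# rather than line by line is what lets the test apply to the whole bunch: the lines are buffered in `keep` and printed only if the bunch qualifies.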
Also, if I try to hand over each record using code of the following kind, it does not work:
for i in $(cat log_file | awk_file_givin_1_resource_record_at_a_time) ; do pattern_matching_commands ; done
This did not work because the pattern matching ran on one line at a time, whereas I wanted it to work on a whole bunch.
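Part of the trouble with that loop is that `for i in $(...)` splits the command output on any whitespace, so `i` receives one word at a time and never a whole record. If each bunch has already been turned into a single line, a `while read` loop hands over exactly one record per iteration. The sketch below assumes such one-line records with `|` as the joining character; `process_records` and the `====` separator are made-up names for illustration.

```shell
# process_records: read one "|"-joined record per line from stdin and run
# ordinary line-based tools on each record separately.
process_records() {
  while IFS= read -r record; do
    # Split the record back into its original lines, then filter them.
    printf '%s\n' "$record" | tr '|' '\n' |
      grep -E 'MAIL FROM:|RCPT TO:' || true   # grep exits 1 when nothing matches
    printf '%s\n' '===='                      # marks the end of one record
  done
}

# Example with one made-up record:
printf '%s\n' 'a: MAIL FROM: <x>|b: passed|c: RCPT TO: <y>' | process_records
```

`IFS=` together with `-r` keeps leading whitespace and backslashes in the record intact, which matters for log lines like these.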
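This feels a bit too broad; there are a lot of individual problems here. I would try to break the problem down into steps and solve each step separately. If you get stuck on a particular step, you can then ask a better, more focused question here. – chepner
@chepner: I think that if I manage to get each bunch of lines into a variable and use that variable with bash commands, I could then run normal bash operations on it (albeit many of them) to extract the information I want. –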