2013-09-26

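Quickly examining text in AutoIt?

I'm trying to use AutoIt to examine a text file and output selected lines as CSV. The problem I keep running into is that it takes forever. The current approach examines one line at a time. It can burn through 5-10 lines per second, but I'm looking for something faster within the AutoIt framework.

Code: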

#include <File.au3> 
$xnConfirm = False 
$xnConfirmMsg = 0 
while $xnConfirm = False 

     $xnFile = FileOpenDialog("File to Examine...","%userprofile%","All (*.*)") ;InputBox("File???", "Which file do you want to review?","C:\") 
    If FileExists($xnFile) = True Then 
      $xnConfirm = True 
     Else 
       $xnConfirmMsg = msgbox(1,"File Not Found...",$xnFile & " does not exist." & @crlf & "Please select another file.") 
     EndIf 
WEnd 

$xnConfirm = False 
$xnConfirmMsg = 0 
while $xnConfirm = False 
    $xnTargetFile = FileOpenDialog("Location to Save to...",$xnFile & " - output.csv","All (*.*)");"%userprofile%\Documents\output.csv" 
        ;FileSaveDialog("Location to Save to...","%userprofile%","All (*.*)",16,"output - " & $xnFile & " - output.csv") ; 
     Consolewrite("Outputting to " & $xnTargetFile & @crlf) 

     if fileexists($xnTargetFile) then 
      $xnConfirmMsg = msgbox(4,"Overwrite?","Are you sure you want to overwrite " & @crlf & $xnTargetFile) 

       if $xnConfirmMsg = 6 Then 
        $xnConfirm = True 
        filedelete($xnTargetFile)    
       EndIf 
      Else 

       $xnConfirm = True 

     EndIf  
WEnd 

progresson("Line count","Verifying the number of lines in " & $xnFile) 
$xnFileLine = _FileCountLines($xnFile) ;InputBox("Number of lines","How many lines are in this document?",10000) 
consolewrite("Loading "& $xnFile & " with " & $xnFileLine & " total lines." & @crlf) 
progressoff() 

local $hfl = FileOpen($xnFile,0) 
FileWrite($xnTargetFile,"") 
FileOpen($xnTargetFile, 1) 

$i = 1 

ProgressOn("Creating CSV","Extracting matching data.","",0,0,16) 
$xnTargetLine = 1 

FileWriteLine($xnTargetFile,"Timestamp,Message,Category,Priority,EventId,Severity,Title,Machine,App Domain,ProcessID,Process Name,Thread Name,Win32 ThreadId") 

While $i < $xnFileLine 

        ;$xnCurrentLine = FileReadLine($xnFile,$i) ;Old Settings 
      $xnCurrentLine = FileReadLine($hfl,$i) 
      ;MsgBox(1,"",$xnCurrentLine) 

     Select 
     Case stringinstr($xnCurrentLine,"Timestamp:") 
      $xnTargetLine = stringmid($xnCurrentLine,12,stringlen($xnCurrentLine) - 12 + 1) & "," 
     Case stringinstr($xnCurrentLine,"Message:") 
      $xnTargetLine = $xnTargetLine & stringmid($xnCurrentLine,10,stringlen($xnCurrentLine) - 10 + 1) & "," 
     Case stringinstr($xnCurrentLine,"Category:") 
      $xnTargetLine = $xnTargetLine & stringmid($xnCurrentLine,11,stringlen($xnCurrentLine) - 11 + 1) & "," 
     Case stringinstr($xnCurrentLine,"Win32 ThreadId:") 
      $xnTargetLine = $xnTargetLine & stringmid($xnCurrentLine,16,stringlen($xnCurrentLine) - 16 + 1) & @crlf 
       FileWriteLine($xnTargetFile,$xnTargetLine) 
     case Else 
       consolewrite("Nothing on line " & $i & @crlf) 
     EndSelect 
     $i = $i + 1 
        ProgressSet(round($i/$xnFileLine * 100,1),$i & " of " & $xnFileLine & " lines examined." & @cr & "Thank you for your patience.") 
    WEnd 
ProgressOff() 

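To clarify what this does: I'm reading a log file similar to a trace log. I want the events output as CSV so that I can examine trends. The format in the log file looks like this: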

Timestamp: 9/26/2013 3:33:23 AM 

Message: Log Event Received 

Category: Transaction 

Win32 ThreadId:2872 

I know this is code formatting, but I wanted it to be easier to read.


Added a snippet showing how the log file is read and formatted. – Xenoranger


And what should the output look like? (This is my first time editing a post; is this how it works?) – Teifun2

Answers

2

I'm not sure whether it will be faster, but you could use regular expressions. If you can tell me a bit more about the rules at work here:

  Case stringinstr($xnCurrentLine,"Timestamp:") 
     $xnTargetLine = stringmid($xnCurrentLine,12,stringlen($xnCurrentLine) - 12 + 1) & "," 
    Case stringinstr($xnCurrentLine,"Message:") 
     $xnTargetLine = $xnTargetLine & stringmid($xnCurrentLine,10,stringlen($xnCurrentLine) - 10 + 1) & "," 
    Case stringinstr($xnCurrentLine,"Category:") 
     $xnTargetLine = $xnTargetLine & stringmid($xnCurrentLine,11,stringlen($xnCurrentLine) - 11 + 1) & "," 
    Case stringinstr($xnCurrentLine,"Win32 ThreadId:") 
     $xnTargetLine = $xnTargetLine & stringmid($xnCurrentLine,16,stringlen($xnCurrentLine) - 16 + 1) & @crlf 
      FileWriteLine($xnTargetFile,$xnTargetLine) 
    case Else 
      consolewrite("Nothing on line " & $i & @crlf) 

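and give me 2 or 3 example lines, I can try to write you a regular expression function, which I think would be much faster.

Edit:

I made an example script. If the input file looks like this: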

Timestamp: 9/26/2013 3:33:23 AM 
Message: Log Event Received 
Category: Transaction 
Win32 ThreadId:2872 

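then this script works fine:

#include <Array.au3> 
Local $file = FileOpen("InputFile.txt", 0) 
$sText = FileRead($file) 
FileClose($file) 
; (?: )? skips the optional space after the label; flag 3 returns every match in one array 
$aSnippets = StringRegExp($sText,"(?:Timestamp:|Message:|Category:|Win32 ThreadId:)(?: )?(.+)",3) 
_ArrayDisplay($aSnippets) 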

The result is an array containing the following:

[0] = 9/26/2013 3:33:23 AM 
[1] = Log Event Received 
[2] = Transaction 
[3] = 2872 
etc. 

If you want to join these four lines into one, try using a For loop (I can write one for you if you like).
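One way such a For loop might look is sketched below. It assumes every event always has exactly these four fields, in this order, and that no value contains a comma; the inline sample text stands in for the contents of the real log file. The pattern uses [^\r\n]+ rather than .+ so the capture cannot run past the end of a line, and StringStripWS trims any stray trailing whitespace.

```autoit
; Sample input standing in for the log file contents (an assumption for this sketch)
Local $sText = "Timestamp: 9/26/2013 3:33:23 AM" & @CRLF & _
        "Message: Log Event Received" & @CRLF & _
        "Category: Transaction" & @CRLF & _
        "Win32 ThreadId:2872" & @CRLF

; Flag 3 returns every captured value in one flat, 0-based array
Local $aSnippets = StringRegExp($sText, "(?:Timestamp:|Message:|Category:|Win32 ThreadId:) ?([^\r\n]+)", 3)
If @error Then
    ConsoleWrite("No matching lines found." & @CRLF)
    Exit
EndIf

; Walk the array four values at a time and build one CSV row per event
Local $sCsv = ""
For $i = 0 To UBound($aSnippets) - 4 Step 4
    $sCsv &= StringStripWS($aSnippets[$i], 3) & "," & _
            StringStripWS($aSnippets[$i + 1], 3) & "," & _
            StringStripWS($aSnippets[$i + 2], 3) & "," & _
            StringStripWS($aSnippets[$i + 3], 3) & @CRLF
Next

ConsoleWrite($sCsv)
```

Building the whole CSV in a string and writing it once with FileWrite at the end avoids reopening the output file for every row.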

For 100 lines it takes 0.490570878768441 milliseconds to store every value in an array.
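A timing like this can be reproduced with TimerInit and TimerDiff. The sketch below builds a hypothetical 100-line sample in memory (25 events of four lines each) instead of reading a real log, so the figure it prints will vary by machine and input.

```autoit
; Build 100 sample lines in memory (hypothetical data, not a real log)
Local $sText = ""
For $i = 1 To 25
    $sText &= "Timestamp: 9/26/2013 3:33:23 AM" & @CRLF & _
            "Message: Log Event Received" & @CRLF & _
            "Category: Transaction" & @CRLF & _
            "Win32 ThreadId:2872" & @CRLF
Next

; Time only the regular expression extraction
Local $hTimer = TimerInit()
Local $aSnippets = StringRegExp($sText, "(?:Timestamp:|Message:|Category:|Win32 ThreadId:) ?([^\r\n]+)", 3)
Local $fElapsed = TimerDiff($hTimer) ; elapsed time in milliseconds

ConsoleWrite(UBound($aSnippets) & " values extracted in " & $fElapsed & " ms" & @CRLF)
```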

2

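(I wanted to add a comment asking for a sample of the data being read, but I don't have enough reputation yet...)

Depending on the size of the input file, I'd recommend reading the whole file into an array in one go with _FileReadToArray(), then iterating over the in-memory array (rather than keeping the file open throughout). Also, I wouldn't write to the output file on every pass: I'd append to a string and save that string once at the end.

Something like: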

#include <File.au3> 

Local $outputFileData = "" 
Local $inputFileData 
; _FileReadToArray fills the array passed as its second argument; element [0] holds the line count 
_FileReadToArray($xnFile, $inputFileData) 

For $Counter = 1 To $inputFileData[0] 

     $tmpLine = $inputFileData[$Counter] 

     Select 

     Case StringInStr($tmpLine,"Timestamp:") 
      ; append (&=) so that rows built earlier are not overwritten 
      $outputFileData &= StringMid($tmpLine,12,StringLen($tmpLine) - 12 + 1) & "," 

     Case StringInStr($tmpLine,"Message:") 
      $outputFileData &= StringMid($tmpLine,10,StringLen($tmpLine) - 10 + 1) & "," 

     Case StringInStr($tmpLine,"Category:") 
      $outputFileData &= StringMid($tmpLine,11,StringLen($tmpLine) - 11 + 1) & "," 

     Case StringInStr($tmpLine,"Win32 ThreadId:") 
      ; last field of the event, so end the CSV row here 
      $outputFileData &= StringMid($tmpLine,16,StringLen($tmpLine) - 16 + 1) & @CRLF 

     Case Else 
       ConsoleWrite("Nothing on line " & $Counter & @CRLF) 

     EndSelect 

Next 

; write everything in a single call once the loop is done 
FileWrite($xnTargetFile, $outputFileData) 

(Note that I haven't included any error checking, nor have I checked this for errors :)

0

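Here is one other possible idea.

You could copy the input file, rename it, and then delete all the useless data from the copy. With regular expressions this could be quite easy, and possibly even faster.

If you show me an example of the input file and what the output file should look like, I can give it a try :)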