2016-01-11

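Extracting numbers from log lines

I'm trying to use a regular expression in PowerShell to extract numbers from a live log file. My regex is supposed to return only the number to the left of the letter A, but for some reason it returns the whole line instead of the isolated number.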

I want to convert the log file from:

1/11/2016 3:26:12 PM 1/11/2016 3:27:00 PM 86.4 A 
1/11/2016 3:26:12 PM 1/11/2016 3:28:00 PM 86.3 A 
1/11/2016 3:26:12 PM 1/11/2016 3:29:00 PM 86.8 A 
1/11/2016 3:26:12 PM 1/11/2016 3:29:16 PM 86.7 A

to:

86.4 
86.3 
86.8 
86.7

Here is my code so far:

$DATAPath = "C:\Code\DATA.txt" 
$regex = '.*\d\s+A' 

Get-Content -Path $DATAPath -Tail 1 -Wait | 
    Select-String -Pattern $regex -AllMatches 

Use [`\d+\.\d+\s*A`](https://regex101.com/r/nV6tQ6/1) – Tushar

`\d+(?:\.\d+)?(?=\s+A)`
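Here is a sketch of the lookahead pattern from the comment above, run against two lines from the question. `(?=\s+A)` checks that whitespace and `A` follow but does not make them part of the match. That means each match's `Value` is already the bare number, and no capture group is needed.

```powershell
# Lookahead pattern from the comment: digits, an optional decimal part,
# then a zero-width check that whitespace and "A" come next.
$pattern = '\d+(?:\.\d+)?(?=\s+A)'

$lines = @(
    '1/11/2016 3:26:12 PM 1/11/2016 3:27:00 PM 86.4 A',
    '1/11/2016 3:26:12 PM 1/11/2016 3:28:00 PM 86.3 A'
)

# Select-String wraps each hit in a MatchInfo object; .Matches holds the
# regex Match objects, and their .Value is only the matched text.
$values = $lines | Select-String -Pattern $pattern |
    ForEach-Object { $_.Matches[0].Value }

$values
```

One caveat: the decimal part is optional, so a morning timestamp such as `3:26:12 AM` would also satisfy `(?=\s+A)`. On such a line the first match would be `12`. Making the decimal part mandatory (`\d+\.\d+(?=\s+A)`) avoids that for this log format, because the dates and times contain no dots.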

Answer

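The regex itself is a bit odd. `.*\d\s+A` means "anything at all, then a digit, then at least one whitespace character, and finally the letter A". That covers far more cases than the ones you're interested in, for example a line that contains nothing but "94.9 A".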
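This small sketch, using one line from the question, shows why the question's pipeline prints whole lines. `Select-String` emits `MatchInfo` objects, and PowerShell displays them by their `Line` property, which is the entire input line. The regex match is stored separately in `.Matches`. With the leading `.*`, though, that match starts at the first character of the line anyway.

```powershell
# The question's pattern and one sample line from the log.
$regex = '.*\d\s+A'
$line  = '1/11/2016 3:26:12 PM 1/11/2016 3:27:00 PM 86.4 A'

$info = $line | Select-String -Pattern $regex

# The MatchInfo object's Line property: the entire input line.
# This is what gets printed when the object reaches the console.
$info.Line

# The actual regex match. Because of the leading .*, it runs from the
# first character up to the final "A", i.e. the whole sample line.
$info.Matches[0].Value

# The pattern also accepts a line that holds nothing but a reading.
'94.9 A' | Select-String -Pattern $regex
```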

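Depending on the log file's structure and the false positives you need to avoid, a stricter pattern and/or grouping helps. Something like this: `(?:PM\s+)(\d+\.\d+)(?:\s+A)`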

(?:PM\s+) := match the letters PM followed by at least one whitespace character
(\d+\.\d+) := match at least one digit, followed by a dot and at least one digit (this is capture group 1)
(?:\s+A) := match at least one whitespace character followed by the letter A

As an example:

[regex]$regex = '(?:PM\s+)(\d+\.\d+)(?:\s+A)' 

$s = @("1/11/2016 3:26:12 PM 1/11/2016 3:27:00 PM 86.4 A", 
"1/11/2016 3:26:12 PM 1/11/2016 3:28:00 PM 86.3 A", 
"1/11/2016 3:26:12 PM 1/11/2016 3:29:00 PM 86.8 A", 
"1/11/2016 3:26:12 PM 1/11/2016 3:29:16 PM 86.7 A", 
"foobarline shouldn't match", 
"94.9 A", 
"PM 84.8 A") 

# Note that the two invalid rows are skipped
$s | ForEach-Object { $regex.Matches($_) | ForEach-Object { $_.Groups[1].Value } }
# Output:
# 86.4
# 86.3
# 86.8
# 86.7
# 84.8
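Here is a minimal sketch of plugging the answer's pattern into the live-log pipeline from the question. The function name `Get-ReadingValue` is hypothetical, not an existing cmdlet. It reads lines from the pipeline and emits only capture group 1. It assumes, as in the sample, that every reading follows a `PM` timestamp; lines logged with `AM` times would be skipped.

```powershell
# Hypothetical helper (not a built-in cmdlet): emits the number that sits
# between a "PM" timestamp and the trailing "A" of each log line.
function Get-ReadingValue {
    param(
        [Parameter(ValueFromPipeline = $true)]
        [string]$Line
    )
    process {
        # Group 1 captures the reading; the (?:...) groups only anchor it.
        # Lines that don't match produce no output at all.
        $Line | Select-String -Pattern '(?:PM\s+)(\d+\.\d+)(?:\s+A)' |
            ForEach-Object { $_.Matches[0].Groups[1].Value }
    }
}

@(
    '1/11/2016 3:26:12 PM 1/11/2016 3:27:00 PM 86.4 A',
    'foobarline shouldn''t match'
) | Get-ReadingValue
```

With the question's file, the function can go at the end of the tailing pipeline: `Get-Content -Path $DATAPath -Tail 1 -Wait | Get-ReadingValue`.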
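"A bit odd" is an understatement! Thanks for breaking down my regex and adding some extra examples. I'm completely new to programming in .NET languages, but this helped a lot! – KILLADELPHIA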