2012-05-28

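Regular expression for matching server logs

I need to identify server events from a log file, and I am using pattern matching for that. My regular expression does not work. Please check whether the regex itself is wrong or whether the problem lies elsewhere.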

Sample input:

2009/12/14 11:49:20.55     00 STARTUP Distributed Access Infrastructure V1.1.0 
2009/12/14 11:49:20.55     01 STARTUP Tools Access Server initialization started 
2009/12/14 11:49:20.55 TAS#####EC05003E 00 STARTUP Environment:  
2009/12/14 11:49:20.55 TAS#####EC05003E 01 STARTUP Job.....DAITAS  System...EC05  ASID.....003E  
2009/12/14 11:49:20.55 TAS#####EC05003E 02 STARTUP User....USRT001 Group....SYS1  JobNum...STC00079 
2009/12/14 11:49:20.55 TAS#####EC05003E 03 STARTUP Local...GMT-08  GMT......2009/12/14 19:49 

My code is:

public void map(Object key, Text value, Context context) throws IOException , InterruptedException{ 

     String input=value.toString(); 
     String delimiter= "[\n]"; 
     String[] tokens=input.split(delimiter); 
     String sample = null; 

     Pattern pattern; 
     String regex= " \\s+\\d+\\s+[a-z,A-Z]+\\s "; 
     pattern=Pattern.compile(regex); 




     for(int i=0;i<tokens.length;i++){ 
      sample=tokens[i]; 
      System.out.println(sample.toString()); 
      System.out.println("enter here"); 

      Matcher match=pattern.matcher(sample); 
      boolean val = match.matches(); 

      System.out.println("the conditions" + val); 
      System.out.println("enter here 2"); 
      if(val){ 
       System.out.println("the regex is found" + val); 
       logEvent.set(sample); 
       System.out.println("the value of logEvent is "+ logEvent); 
      } 
      else{ 
       logInformation.set(sample); 
       System.out.println("the log informaTION" + logInformation); 
      } 
     context.write(logEvent, logInformation);  

The event I need to recognize here is STARTUP.

Thanks.


Which data do you have to match? – Cylian


In the sample log, the event is "STARTUP". Other events follow the same pattern. I need to match those events and set them as logEvent. –

Answer


Try this (C#):

try { 
    Regex regexObj = new Regex(@"(?im)\s+(?<event>\d+\s+[a-z]+)\s+(?<details>[^\r\n]+)$"); 
    Match matchResults = regexObj.Match(subjectString); 
    while (matchResults.Success) { 
     for (int i = 1; i < matchResults.Groups.Count; i++) { 
      Group groupObj = matchResults.Groups[i]; 
      if (groupObj.Success) { 
       // matched text: groupObj.Value 
       // match start: groupObj.Index 
       // match length: groupObj.Length 
      } 
     } 
     matchResults = matchResults.NextMatch(); 
    } 
} catch (ArgumentException ex) { 
    // Syntax error in the regular expression 
} 

Explanation of the regular expression:

@" 
(?im)   # Match the remainder of the regex with the options: case insensitive (i); ^ and $ match at line breaks (m) 
\s    # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) 
    +    # Between one and unlimited times, as many times as possible, giving back as needed (greedy) 
(?<event>  # Match the regular expression below and capture its match into backreference with name “event” 
    \d    # Match a single digit 0..9 
     +    # Between one and unlimited times, as many times as possible, giving back as needed (greedy) 
    \s    # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) 
     +    # Between one and unlimited times, as many times as possible, giving back as needed (greedy) 
    [a-z]   # Match a single character in the range between “a” and “z” 
     +    # Between one and unlimited times, as many times as possible, giving back as needed (greedy) 
) 
\s    # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) 
    +    # Between one and unlimited times, as many times as possible, giving back as needed (greedy) 
(?<details> # Match the regular expression below and capture its match into backreference with name “details” 
    [^\r\n]  # Match a single character NOT present in the list below 
        # A carriage return character 
        # A line feed character 
     +    # Between one and unlimited times, as many times as possible, giving back as needed (greedy) 
) 
$    # Assert position at the end of a line (at the end of the string or before a line break character) 
" 

Hope this helps.
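Since the question is written in Java, here is a rough port of the pattern above. Java 7 and later accept the same (?<name>...) named-group syntax, so the expression carries over almost unchanged. This is only a sketch: the class name AnswerRegexPort and the method parse are made up for illustration, and it assumes one log line is handled per call.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class AnswerRegexPort {
    // Same expression as the C# answer.
    // (?im): case-insensitive; ^ and $ also match at line breaks.
    static final Pattern LOG_LINE = Pattern.compile(
            "(?im)\\s+(?<event>\\d+\\s+[a-z]+)\\s+(?<details>[^\\r\\n]+)$");

    // Returns {event, details} for the first match in the line, or null if there is none.
    public static String[] parse(String line) {
        Matcher m = LOG_LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group("event"), m.group("details") };
    }

    public static void main(String[] args) {
        String line = "2009/12/14 11:49:20.55 TAS#####EC05003E 00 STARTUP Environment:";
        String[] parts = parse(line);
        System.out.println("event:   " + parts[0]);
        System.out.println("details: " + parts[1]);
    }
}
```

Note that the "event" group captures the sequence number together with the keyword (for example "00 STARTUP"), because the digits sit inside the group. The character class [a-z]+ combined with the (i) option accepts any run of letters, so the pattern is not tied to one keyword.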


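Thanks, but your code only recognizes the keyword "STARTUP". The thing is, I have 8 different events that follow a similar pattern. Another sample input looks like this: 2009/12/14 11:49:20.94 TAS#####EC05003E 00 ConfgMem TAS configuration member contents. Here I need to match "ConfgMem". –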


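Do you need to match the complete line that contains the STARTUP keyword, the content that precedes the keyword on the same line, or the content that follows the keyword on the same line? The code above only works for the last case. – Cylian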


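OK. The way I thought about it: first there is whitespace, so "\s+" covers that. After it comes a number, "\d+", and then whitespace again. Then comes the event, followed by whitespace, so "[a-z,A-Z]+\s". So the regex should be String regex = "\\s+\\d+\\s+[a-z,A-Z]+\\s";. I have to match the main keywords: STARTUP, PreLoad, ConfgEXE, etc. –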
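The reasoning about the pattern is close, but two details in the posted Java code probably explain the failure. Matcher.matches() succeeds only when the pattern covers the entire input, while each log line begins with a date, so it returns false; Matcher.find() searches for the pattern anywhere in the line. Also, the regex string in the question begins and ends with a literal space, and [a-z,A-Z] accepts a comma as well as letters. Below is a minimal sketch of one way to pull out the event name; the class name EventExtractor and the method extractEvent are invented for illustration, and it assumes event names consist of letters only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the question's mapper.
class EventExtractor {
    // Whitespace, a number, whitespace, the event name (captured), then whitespace.
    // [A-Za-z]+ replaces [a-z,A-Z]+, which would also accept a literal comma.
    static final Pattern EVENT = Pattern.compile("\\s+\\d+\\s+([A-Za-z]+)\\s");

    // Returns the event name, or null when the line has no event field.
    public static String extractEvent(String line) {
        Matcher m = EVENT.matcher(line);
        // find() searches anywhere in the line;
        // matches() would require the pattern to cover the whole line.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "2009/12/14 11:49:20.55 TAS#####EC05003E 00 STARTUP Environment:";
        System.out.println(EVENT.matcher(line).matches()); // false: the line starts with a date
        System.out.println(extractEvent(line));            // STARTUP
    }
}
```

In the mapper, the returned name could be passed to logEvent.set(...) whenever it is not null. Because the pattern still ends in \s, an event name at the very end of a line with nothing after it would not be found.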