2011-11-15 58 views
    public static String entryPattern = "^([\\d.]+) (\\S+) (.+?) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\d+) \"([^\"]+)\" \"([^\"]+)\"";

    public static void parseTwigLine(String line) {
        Pattern p = Pattern.compile(entryPattern);
        Matcher matcher = p.matcher(line);
        System.out.println(matcher.groupCount());
        // Reject the line unless the whole line matches and the pattern
        // has the expected number of capture groups.
        if (!matcher.matches() || NUM_FIELDS != matcher.groupCount()) {
            System.err.println("Bad log entry (or problem with RE?):");
            System.err.println(line);
            return;
        }

        timeStamp = matcher.group(4);
        ipAddress = matcher.group(1);
        if (!matcher.group(3).equals("-")) {
            userName = matcher.group(3);
        }
        request = matcher.group(5);
        response = matcher.group(6);
        bytesSent = matcher.group(7);
        browser = matcher.group(9);

        // Group 8 is the referrer; "-" means it is absent.
        if (!matcher.group(8).equals("-")) {
            url = matcher.group(8);
        }
        instanceName = url.split("/")[3];
        if (request.contains("?q")) {
            queryTerms = request.split("[?|&]")[1];
        } else if (url.contains("?q")) {
            queryTerms = url.split("[?|&]")[1].split("=")[1];
        }
        if (request.contains("&f")) {
            filters = request.split("&f=")[1];
        } else if (url.contains("&f")) {
            filters = url.split("&f=")[1];
        }
    }

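My regular expression does not match the line below. Any suggestions as to why? The code above always reports the error "Bad log entry (or problem with RE?)" for it. What is wrong with my regular expression? I am parsing log lines in Java using a regular expression.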

10.53.32.1 - - [14/Nov/2011:09:45:56 -0800] "GET /host-ui/themes/client/images/preview/left6_na.gif HTTP/1.1" 304 - "http://search.host.com/search-ui/?q=8960" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; MS-RTC LM 8; InfoPath.3; BOIE9;ENUS)" 

The following line, however, does get matched:

10.53.32.1 - - [14/Nov/2011:09:45:56 -0800] "GET /host-ui/themes/client/images/btn_close_include.png HTTP/1.1" 200 1023 "http://search.host.com/search-ui/?q=8960" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; MS-RTC LM 8; InfoPath.3; BOIE9;ENUS)" 
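For messages in a known format, is a single regular expression the best approach? Since the data comes in a very consistent pattern, it might be easier to break the line apart first and then, if needed, decompose the individual parts (such as form parameters) with simpler regexes, splits, and so on. –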

@Dave Newton, which approach is best: using a regular expression, or just splitting the string? – ferhan

Not sure; if speed isn't an issue, it probably doesn't matter. –
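As a rough illustration of the "simpler splits for the individual parts" idea from the comments, here is a minimal sketch that pulls query parameters out of a referrer URL with plain string operations. The class name `QueryParamSplitDemo`, the helper `queryParams`, and the `f=type` parameter in the sample URL are hypothetical and not from the question; the sketch also assumes the values need no URL decoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class QueryParamSplitDemo {
    // Hypothetical helper: splits everything after the first '?' into
    // key/value pairs. No URL decoding is performed (an assumption).
    static Map<String, String> queryParams(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        int q = url.indexOf('?');
        if (q < 0) {
            return params; // no query string at all
        }
        for (String pair : url.substring(q + 1).split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty segments such as a trailing '&'
            }
            int eq = pair.indexOf('=');
            if (eq < 0) {
                params.put(pair, ""); // key without a value
            } else {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        // Sample URL based on the referrer in the log lines; "&f=type" is made up.
        String referrer = "http://search.host.com/search-ui/?q=8960&f=type";
        System.out.println(queryParams(referrer)); // prints {q=8960, f=type}
    }
}
```

Compared with indexing into `split("[?|&]")[1]`, looking parameters up by name does not depend on the order in which they appear in the URL.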

Answers


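In the failing line the bytes-sent field (right after the 304 status) is "-", and \d+ does not match "-". Replace that group with something that does. For example: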

Original: "^([\\d.]+) (\\S+) (.+?) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\d+) \"([^\"]+)\" \"([^\"]+)\"" 
Fixed: "^([\\d.]+) (\\S+) (.+?) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\S+) \"([^\"]+)\" \"([^\"]+)\"" 
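To check the difference, here is a minimal, self-contained sketch that runs both patterns against the two log lines from the question. The class name `AccessLogRegexDemo` and the helper `bytesSent` are hypothetical, and the user-agent string is shortened to "Mozilla/4.0" for readability, which does not affect the result.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AccessLogRegexDemo {
    // Original pattern: group 7 is (\d+), so it accepts digits only.
    static final Pattern ORIGINAL = Pattern.compile(
        "^([\\d.]+) (\\S+) (.+?) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\d+) \"([^\"]+)\" \"([^\"]+)\"");

    // Fixed pattern: group 7 is (\S+), so "-" is accepted as well.
    static final Pattern FIXED = Pattern.compile(
        "^([\\d.]+) (\\S+) (.+?) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\S+) \"([^\"]+)\" \"([^\"]+)\"");

    // Hypothetical helper: returns the bytes-sent field (group 7),
    // or null when the whole line does not match the pattern.
    static String bytesSent(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.matches() ? m.group(7) : null;
    }

    public static void main(String[] args) {
        String prefix = "10.53.32.1 - - [14/Nov/2011:09:45:56 -0800] ";
        String suffix = " \"http://search.host.com/search-ui/?q=8960\" \"Mozilla/4.0\"";
        String notModified = prefix
            + "\"GET /host-ui/themes/client/images/preview/left6_na.gif HTTP/1.1\" 304 -"
            + suffix;
        String ok = prefix
            + "\"GET /host-ui/themes/client/images/btn_close_include.png HTTP/1.1\" 200 1023"
            + suffix;

        System.out.println(bytesSent(ORIGINAL, notModified)); // prints null
        System.out.println(bytesSent(FIXED, notModified));    // prints -
        System.out.println(bytesSent(ORIGINAL, ok));          // prints 1023
        System.out.println(bytesSent(FIXED, ok));             // prints 1023
    }
}
```

With the fixed pattern, `bytesSent` can come back as "-", so any code that later converts it to a number should check for that value first.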