2014-12-05 70 views
0

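I am reading a log file in Java. For each line of the log file, I check whether the line contains an IP address. If it does, I want to add 1 to the count of how many times that IP address appears in the log file. How can I do this in Java? (Title: Count the unique occurrences of a string in a document)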

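The code below successfully extracts the IP address from every line that contains one, but the part that counts how often each IP address occurs does not work.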

void read(String fileName) throws IOException { 
    BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(fileName))); 
    int counter = 0; 
    ArrayList<IPHolder> ips = new ArrayList<IPHolder>(); 
    try { 
     String line; 
     while ((line = br.readLine()) != null) { 
      if(!getIP(line).equals("0.0.0.0")){ 
       if(ips.size()==0){ 
        IPHolder newIP = new IPHolder(); 
        newIP.setIp(getIP(line)); 
        newIP.setCount(0); 
        ips.add(newIP); 
       } 
       for(int j=0;j<ips.size();j++){ 
        if(ips.get(j).getIp().equals(getIP(line))){ 
         ips.get(j).setCount(ips.get(j).getCount()+1); 
        }else{ 
         IPHolder newIP = new IPHolder(); 
         newIP.setIp(getIP(line)); 
         newIP.setCount(0); 
         ips.add(newIP); 
        } 
       } 
       if(counter % 1000 == 0){System.out.println(counter+", "+ips.size());} 
       counter+=1; 
      } 
     } 
    } finally {br.close();} 
    for(int k=0;k<ips.size();k++){ 
     System.out.println("ip, count: "+ips.get(k).getIp()+" , "+ips.get(k).getCount()); 
    } 
} 

public String getIP(String ipString){//extracts an ip from a string if the string contains an ip 
    String IPADDRESS_PATTERN = 
    "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"; 

    Pattern pattern = Pattern.compile(IPADDRESS_PATTERN); 
    Matcher matcher = pattern.matcher(ipString); 
    if (matcher.find()) { 
     return matcher.group(); 
    } 
    else{ 
     return "0.0.0.0"; 
    } 
} 

The holder class is:

public class IPHolder { 

    private String ip; 
    private int count; 

    public String getIp(){return ip;} 
    public void setIp(String i){ip=i;} 

    public int getCount(){return count;} 
    public void setCount(int ct){count=ct;} 
} 

A [`Map`](https://docs.oracle.com/javase/7/docs/api/java/util/Map.html) is probably what you need (key = ip, value = count). Guava's [`Multiset`](https://code.google.com/p/guava-libraries/wiki/NewCollectionTypesExplained#Multiset) is a fancier option – 2014-12-05 22:04:11

@RC. How would the code change to use a Map instead? – CodeMed 2014-12-05 22:05:11

The `Multiset` link includes a Map example – 2014-12-05 22:06:06

Answers

1

The keyword to search for in this case is HashMap. A HashMap is a collection of key-value pairs (here, each IP paired with its count).

"192.168.1.12" - 12 
"192.168.1.13" - 17 
"192.168.1.14" - 9 

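And so on. This is much easier to use and access than iterating over a list of holder objects every time to find out whether a holder for that IP already exists.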

BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(/*Your file */))); 

HashMap<String, Integer> occurrences = new HashMap<String, Integer>(); 

String line = null; 

while((line = br.readLine()) != null) { 

    // Iterate over lines and search for ip address patterns 
    String[] addressesFoundInLine = ...; 


    for(String ip: addressesFoundInLine) { 

     // Did you already have that address in your file earlier? If yes, increase its counter by 1
     if(occurrences.containsKey(ip)) 
      occurrences.put(ip, occurrences.get(ip)+1); 

     // If not, create a new entry for this address 
     else 
      occurrences.put(ip, 1); 
    } 
} 


// TreeMaps are automatically ordered by key if the keys implement 'Comparable', which is the case for strings and integers
TreeMap<Integer, ArrayList<String>> turnedAround = new TreeMap<Integer, ArrayList<String>>(); 

Set<Entry<String, Integer>> es = occurrences.entrySet(); 

// Switch keys and values of HashMap and create a new TreeMap (in case there are two ips with the same count, add them to a list) 
for(Entry<String, Integer> en: es) { 

    if(turnedAround.containsKey(en.getValue()))   
     turnedAround.get(en.getValue()).add((String) en.getKey()); 
    else { 
     ArrayList<String> ips = new ArrayList<String>(); 
     ips.add(en.getKey()); 
     turnedAround.put(en.getValue(), ips); 
    } 

} 

// Print out the values (if two ips have the same count they are printed in no particular order; that would require another sorting step)
for(Entry<Integer, ArrayList<String>> entry: turnedAround.entrySet()) {   
    for(String s: entry.getValue()) 
     System.out.println(s + " - " + entry.getKey());   
} 

In my case, the output is the following:

192.168.1.19 - 4 
192.168.1.18 - 7 
192.168.1.27 - 19 
192.168.1.13 - 19 
192.168.1.12 - 28 

I answered this question about half an hour ago. I think it is what you are looking for, so take a look at it if you need some example code.
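
The snippet above leaves the step that fills `addressesFoundInLine` open. One possible way to do it, as a rough sketch only: collect every match of the IPv4 pattern from the question in a line. The names `IpExtractor` and `findAddresses` are hypothetical and not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IpExtractor {
    // Same IPv4 pattern as in the question, compiled once instead of once per line.
    private static final Pattern IPV4 = Pattern.compile(
        "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)");

    // Hypothetical helper: returns every match of the pattern in the line, in order of appearance.
    static String[] findAddresses(String line) {
        List<String> found = new ArrayList<>();
        Matcher matcher = IPV4.matcher(line);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Made-up sample line, only for illustration.
        for (String ip : findAddresses("GET /index.html from 192.168.1.12 via 10.0.0.1")) {
            System.out.println(ip);
        }
    }
}
```

With a helper like this, the elided line could read `String[] addressesFoundInLine = IpExtractor.findAddresses(line);`, and a line without any address simply yields an empty array.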

Thank you, and +1 for the help. How would the code change if I wanted to sort by count in ascending order? – CodeMed 2014-12-05 22:39:55

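Edited my code. I tested the new part and it works. I also edited the first part to fix two small problems (int > Integer as the HashMap type). Sorting the contents of a HashMap is not as easy as you might think, because a HashMap is built for very fast access and therefore skips any sorting. If you have any other questions, don't hesitate to ask. :) – Phiwa 2014-12-06 09:48:14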
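
For the ascending sort asked about in this thread, one alternative to inverting the map is to copy its entries into a list and sort that list by value. This is only a sketch, assuming Java 8 or later (`List.sort` and `Map.Entry.comparingByValue`); the names `SortByCount` and `sortedByCount` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SortByCount {
    // Hypothetical helper: copies the map's entries into a list and sorts them by count, ascending.
    // The map itself is left untouched; entries with equal counts keep the map's iteration order.
    static List<Map.Entry<String, Integer>> sortedByCount(Map<String, Integer> occurrences) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(occurrences.entrySet());
        entries.sort(Map.Entry.comparingByValue());
        return entries;
    }

    public static void main(String[] args) {
        // Made-up counts, only for illustration.
        Map<String, Integer> occurrences = new HashMap<>();
        occurrences.put("192.168.1.12", 28);
        occurrences.put("192.168.1.19", 4);
        occurrences.put("192.168.1.18", 7);
        for (Map.Entry<String, Integer> entry : sortedByCount(occurrences)) {
            System.out.println(entry.getKey() + " - " + entry.getValue());
        }
    }
}
```

The HashMap keeps its fast lookups for the counting phase, and the sorting cost is paid once at the end.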

0

Here is some code that uses a HashMap to store the IPs and a regular expression to match them in each line. It uses try-with-resources to close the file automatically.

Edit: I added code to sort in descending order, along the lines of what you asked for in the other answer.

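import java.io.BufferedReader; 
import java.io.FileInputStream; 
import java.io.IOException; 
import java.io.InputStreamReader; 
import java.util.ArrayList; 
import java.util.Collections; 
import java.util.HashMap; 
import java.util.HashSet; 
import java.util.Map; 
import java.util.regex.Matcher; 
import java.util.regex.Pattern; 

void read(String fileName) throws IOException { 
    //Step 1 find and register IPs and store their occurrence counts 
    HashMap<String, Integer> ipAddressCounts = new HashMap<>(); 
    try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(fileName)))) { 
     // The dots are escaped so that only a literal '.' separates the number groups 
     Pattern findIPAddrPattern = Pattern.compile("((\\d+\\.){3}\\d+)"); 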
     String line; 
     while ((line = br.readLine()) != null) { 
      Matcher matcher = findIPAddrPattern.matcher(line); 
      while (matcher.find()) { 
       String ipAddr = matcher.group(0); 
       if (ipAddressCounts.get(ipAddr) == null) { 
        ipAddressCounts.put(ipAddr, 1); 
       } 
       else { 
        ipAddressCounts.put(ipAddr, ipAddressCounts.get(ipAddr) + 1); 
       } 
      } 
     } 
    } 

    //Step 2 reverse the map to store IPs by their frequency 
    HashMap<Integer, HashSet<String>> countToAddrs = new HashMap<>(); 
    for (Map.Entry<String, Integer> entry : ipAddressCounts.entrySet()) { 
     Integer count = entry.getValue(); 
     if (countToAddrs.get(count) == null) 
      countToAddrs.put(count, new HashSet<String>()); 
     countToAddrs.get(count).add(entry.getKey()); 
    } 

    //Step 3 sort and print the ip addresses, most frequent first 
    ArrayList<Integer> allCounts = new ArrayList<>(countToAddrs.keySet()); 
    Collections.sort(allCounts, Collections.reverseOrder()); 
    for (Integer count : allCounts) { 
     for (String ip : countToAddrs.get(count)) { 
      System.out.println("ip, count: " + ip + " , " + count); 
     } 
    } 
} 
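
The get-then-put counting in Step 1 can also be written more compactly. The following is a minimal sketch, assuming Java 8 or later, that uses `Map.merge` to insert a count of 1 for a new key or add 1 to an existing one. The names `MergeCount` and `countOccurrences` are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class MergeCount {
    // Hypothetical helper: counts how often each string occurs in the input.
    static Map<String, Integer> countOccurrences(Iterable<String> ips) {
        Map<String, Integer> counts = new HashMap<>();
        for (String ip : ips) {
            // Stores 1 if the key is absent, otherwise stores (old count + 1).
            counts.merge(ip, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up addresses, only for illustration.
        Map<String, Integer> counts =
            countOccurrences(Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.1"));
        System.out.println(counts);
    }
}
```

Inside the `while (matcher.find())` loop, the whole if/else would then shrink to `ipAddressCounts.merge(ipAddr, 1, Integer::sum);`.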