Count the number of times a string occurs in a specific column

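I want to see how many times a string occurs in column 4. More specifically, how many times each port number occurs in some Netflow data. There are thousands of ports, so I am not looking for any specific recurrence. I have already parsed the column to get the number after the colon, and I want the code to check how many times that number occurs, so the final output should print each number along with the number of times it occurred.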

[OUTPUT]

Port: 80 found: 3 times. 
Port: 53 found: 2 times. 
Port: 21 found: 1 times.

[CODE]

import re 


frequency = {} 

file = open('/Users/rojeliomaestas/Desktop/nettest2.txt', 'r') 

with open('/Users/rojeliomaestas/Desktop/nettest2.txt', 'r') as infile:  
    next(infile) 
    for line in infile: 
     data = line.split()[4].split(":")[1] 
     text_string = file.read().lower() 
     match_pattern = re.findall(data, text_string) 


for word in match_pattern: 
    count = frequency.get(word,0) 
    frequency[word] = count + 1 

frequency_list = frequency.keys() 

for words in frequency_list: 
    print ("port:", words,"found:", frequency[words], "times.")

[FILE]

Date first seen   Duration Proto  Src IP Addr:Port   Dst IP Addr:Port Packets Bytes Flows 
2017-04-02 12:07:32.079  9.298 UDP   8.8.8.8:80 ->  205.166.231.250:8080  1  345  1 
2017-04-02 12:08:32.079  9.298 TCP   8.8.8.8:53 ->  205.166.231.250:80  1  75  1 
2017-04-02 12:08:32.079  9.298 TCP   8.8.8.8:80 ->  205.166.231.250:69  1  875  1 
2017-04-02 12:08:32.079  9.298 TCP   8.8.8.8:53 ->  205.166.231.250:443  1  275  1 
2017-04-02 12:08:32.079  9.298 UDP   8.8.8.8:80 ->  205.166.231.250:23  1  842  1 
2017-04-02 12:08:32.079  9.298 TCP   8.8.8.8:21 ->  205.166.231.250:25  1  146  1

Source

2017-04-12 k5man001

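OK. What is your question? –

By the way, why are you using 'file.read' *and* 'for line in infile'? That seems barking mad. –

Also, the final output loop should be: 'for port, count in d.items(): print("port:", port, "found:", count, "times.")' - use 'iteritems' if you are stuck on Python 2.7. –

You need something like: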

frequency = {}
with open('/Users/rojeliomaestas/Desktop/nettest2.txt', 'r') as infile:
    next(infile)  # skip the header row
    for line in infile:
        # field 4 is "src_ip:port"; keep the part after the colon
        port = line.split()[4].split(":")[1]
        frequency[port] = frequency.get(port, 0) + 1

for port, count in frequency.items():
    print("port:", port, "found:", count, "times.")

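The heart of this is that you keep a dictionary mapping ports to counts, and increment the count for each line. dict.get will return the current value or a default (in this case 0).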
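To try the counting logic without the file on disk, here is a minimal sketch that runs it over the sample rows from the question. The helper name count_source_ports and the use of io.StringIO as a stand-in for the real file are assumptions made for illustration only; they are not part of the original answer.

```python
import io

# Sample rows copied from the question; io.StringIO stands in for the real file.
SAMPLE = """\
Date first seen   Duration Proto  Src IP Addr:Port   Dst IP Addr:Port Packets Bytes Flows
2017-04-02 12:07:32.079  9.298 UDP   8.8.8.8:80 ->  205.166.231.250:8080  1  345  1
2017-04-02 12:08:32.079  9.298 TCP   8.8.8.8:53 ->  205.166.231.250:80  1  75  1
2017-04-02 12:08:32.079  9.298 TCP   8.8.8.8:80 ->  205.166.231.250:69  1  875  1
2017-04-02 12:08:32.079  9.298 TCP   8.8.8.8:53 ->  205.166.231.250:443  1  275  1
2017-04-02 12:08:32.079  9.298 UDP   8.8.8.8:80 ->  205.166.231.250:23  1  842  1
2017-04-02 12:08:32.079  9.298 TCP   8.8.8.8:21 ->  205.166.231.250:25  1  146  1
"""


def count_source_ports(lines):
    """Count how often each source port appears (hypothetical helper)."""
    frequency = {}
    rows = iter(lines)
    next(rows, None)  # skip the header row
    for line in rows:
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore blank or truncated rows
        # fields[4] looks like "8.8.8.8:80"; keep the part after the colon
        port = fields[4].split(":")[1]
        frequency[port] = frequency.get(port, 0) + 1
    return frequency


frequency = count_source_ports(io.StringIO(SAMPLE))
for port, count in frequency.items():
    print("Port:", port, "found:", count, "times.")
```

No port has to be named in advance, and no line is compared against any other line: each row is read once and its port's counter goes up by one.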

Source

2017-04-12 08:52:48

It works, thank you! – k5man001

How would I go about sorting it from most to least frequent? – k5man001

That is a separate question - and almost certainly a duplicate –
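For reference, one common way to order a dictionary of counts from most to least frequent is sorted() with a key on the count. This is only a sketch: the frequency values below are hypothetical and simply mirror the expected output in the question.

```python
# Hypothetical counts, mirroring the expected output in the question.
frequency = {"53": 2, "21": 1, "80": 3}

# Sort the (port, count) pairs by count; reverse=True puts the largest first.
by_count = sorted(frequency.items(), key=lambda item: item[1], reverse=True)

for port, count in by_count:
    print("Port:", port, "found:", count, "times.")
```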

Counter, from the Python standard library, will return a dictionary of exactly what you are looking for.

from collections import Counter

# column is assumed to be an iterable of the port strings parsed from the file
counts = Counter(column)
counts.most_common(n)  # returns the n most common values with their counts
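A slightly fuller sketch of the same idea, assuming column is built from the source "Addr:Port" fields of the sample file in the question (the fields list below is typed in by hand for illustration, not read from disk):

```python
from collections import Counter

# Source "Addr:Port" fields taken from the sample rows in the question.
fields = [
    "8.8.8.8:80", "8.8.8.8:53", "8.8.8.8:80",
    "8.8.8.8:53", "8.8.8.8:80", "8.8.8.8:21",
]

# Assumed shape of `column`: just the port strings after the colon.
column = [field.split(":")[1] for field in fields]

counts = Counter(column)

# most_common() with no argument returns every (value, count) pair,
# ordered from the highest count to the lowest.
for port, count in counts.most_common():
    print("Port:", port, "found:", count, "times.")

# Passing a number limits the result to that many entries.
top_two = counts.most_common(2)
```

Since most_common() already returns the pairs ordered by count, this also covers sorting from most to least frequent.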

Source

2017-04-12 07:02:54 BeigeBruceWayne

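Some more explanation here would be useful. At the moment I can't see how this answers the question (which isn't really a question either). – SiHa

Oh my gosh. My question is: how can the code count how many times a string occurs in column [4] without having to specify what the string is? It should just look for any recurring string and give a count. – k5man001

It should compare one line to all the other lines in the file, and keep doing so until every line has been compared and counted, if that makes sense? – k5man001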
