Count the frequency of items in a list

I want to count the number of accidents in each region for each year. How can I do this in Python?

FILE.CSV

Region,Year 
1,2003 
1,2003 
2,2008 
2,2007 
2,2007 
3,2004 
1,2004 
1,2004 
1,2004

I tried using Counter, but it only works on a single column. For example, region 1 had 2 accidents in 2003, so the result should be:

Region,Year,freq
1,2003,2
1,2003,2
2,2008,1
2,2007,2
2,2007,2
3,2004,1
1,2004,3
1,2004,3
1,2004,3

I tried it this way, but it doesn't seem right:

import pandas
from collections import Counter

data = pandas.read_csv("file.csv")
freq_year = Counter(data["Year"].values)  # counts by year only, ignoring region
data["freq"] = data["Year"].apply(lambda x: freq_year[x])

I was thinking of using groupby. Do you know how to do this?
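If one summary row per (Region, Year) pair is enough, rather than repeating the count on every original row, groupby can produce it directly with size(). A minimal sketch, assuming the column names from the sample above; the in-memory io.StringIO buffer is only a stand-in for file.csv:

```python
import io

import pandas as pd

# Sample data from the question; io.StringIO stands in for "file.csv"
csv_text = "Region,Year\n1,2003\n1,2003\n2,2008\n2,2007\n2,2007\n3,2004\n1,2004\n1,2004\n1,2004\n"
df = pd.read_csv(io.StringIO(csv_text))

# size() counts the rows in each (Region, Year) group;
# reset_index turns the grouped Series back into a flat table
counts = df.groupby(["Region", "Year"]).size().reset_index(name="freq")
print(counts)
```

The result has one row per combination, sorted by Region and then Year.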


2014-04-11 user3378649

There may be a better way, but I would first add a dummy column and then compute freq from it, like this:

import pandas as pd
df = pd.read_csv("file.csv")
df["freq"] = 1  # dummy column: one per row
df["freq"] = df.groupby(["Year", "Region"])["freq"].transform("sum")

This returns the following DataFrame:

   Region  Year  freq
0       1  2003     2
1       1  2003     2
2       2  2008     1
3       2  2007     2
4       2  2007     2
5       3  2004     1
6       1  2004     3
7       1  2004     3
8       1  2004     3
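The dummy column is not strictly necessary: transform("size") counts the rows of each (Region, Year) group and broadcasts that count back onto every row of the group. This is a swapped-in variant rather than the approach above; io.StringIO again stands in for the real file:

```python
import io

import pandas as pd

# Sample data from the question; io.StringIO stands in for "file.csv"
csv_text = "Region,Year\n1,2003\n1,2003\n2,2008\n2,2007\n2,2007\n3,2004\n1,2004\n1,2004\n1,2004\n"
df = pd.read_csv(io.StringIO(csv_text))

# transform keeps the original row order and length, so the group
# size can be assigned straight into a new column
df["freq"] = df.groupby(["Region", "Year"])["Year"].transform("size")
print(df)
```

Because transform returns a result aligned to the original index, the freq column lines up with the rows as in the table above.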


2014-04-11 23:33:47 Blaszard

Perfect! Thank you. – user3378649

I want to plot this dataset, but I've run into a problem. Could you please take a look at this question: http://stackoverflow.com/questions/23024439/how-to-customize-axes-in-3d-hist-python-matplotlib – user3378649

I don't know matplotlib well and have no experience with 3D plots. I hope you get help there... – Blaszard

Not a pandas solution, but it gets the job done:

import csv
from collections import Counter

inputs = []
with open('input.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        inputs.append(tuple(row))  # tuples are hashable, so Counter can count them

freqs = Counter(inputs[1:])  # skip the header row
print(freqs)
# Counter({('1', '2004'): 3, ('1', '2003'): 2, ('2', '2007'): 2, ('2', '2008'): 1, ('3', '2004'): 1})

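The key here is storing each row as a tuple of values: tuples are hashable and compare by value, so Counter treats identical rows as the same key. Lists are not hashable, so Counter cannot count them at all.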
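To get the per-row freq column the question asks for without pandas, the Counter can be looked up again for each row. A minimal sketch under the same assumptions as the answer (header row first, two columns); io.StringIO replaces input.csv so the example runs on its own, and the last part shows the error you get when counting lists instead of tuples:

```python
import csv
import io
from collections import Counter

# Sample data from the question; io.StringIO stands in for a real CSV file
csv_text = "Region,Year\n1,2003\n1,2003\n2,2008\n2,2007\n2,2007\n3,2004\n1,2004\n1,2004\n1,2004\n"

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)                  # ['Region', 'Year']
rows = [tuple(row) for row in reader]  # tuples are hashable, lists are not

freqs = Counter(rows)

# Append each row's (Region, Year) count to the row itself
result = [row + (freqs[row],) for row in rows]

print(",".join(header + ["freq"]))
for region, year, freq in result:
    print(f"{region},{year},{freq}")

# Counting lists directly fails because lists are unhashable
try:
    Counter([["1", "2003"], ["1", "2003"]])
except TypeError as exc:
    print("TypeError:", exc)
```

The printed lines match the expected result from the question, in the original row order.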


2014-04-11 23:24:06 Hamatti
