Most Pythonic way to calculate an average

I have data in a 3D dictionary, like this:

movieid, date,customer_id,views 
0, (2011,12,22), 0, 22 
0, (2011,12,22), 1, 2 
0, (2011,12,22), 2, 12 
..... 
0, (2011,12,22), 7, 2 
0, (2011,12,23), 0, 123

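So basically the data represents how many times a movie was watched each day, per customer (there are 8 customers).

Now I want to calculate the average number of times per day that each customer watched a movie.

So basically: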

movie_id,customer_id, avg_views 
    0, 0, 33.2 
    0, 1 , 22.3 

    and so on

What is the Pythonic way to solve this?

Thanks

Edit:

from collections import defaultdict
import datetime

data = defaultdict(lambda: defaultdict(dict))
date = datetime.datetime(2011, 1, 22)
data[0][date][0] = 22
print(data)
# Output (Python 2):
# defaultdict(<function <lambda> at 0x00000000022F7CF8>,
#     {0: defaultdict(<type 'dict'>,
#         {datetime.datetime(2011, 1, 22, 0, 0): {0: 22}})})

Suppose there are only 2 customers, 1 movie, and 2 days of data:

movie_id, date, customer_id,views 
0 , 2011,1,22,0,22 
0 , 2011,1,22,1,23 
0 , 2011,1,23,0,44

Note: customer 1 didn't watch movie ID 0 on January 23. So the answer is:

movie_id,customer_id,avg_views 
    0 , 0 , (22+44)/2 
    0, 1,  (23)/1
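
For illustration, the two-customer sample above could be loaded into the nested dictionary roughly as follows. This is only a minimal sketch: the `rows` list and the `build_data` helper are hypothetical names that do not appear in the original post.

```python
from collections import defaultdict
import datetime

# Sample rows from the question: (movie_id, date, customer_id, views)
rows = [
    (0, datetime.datetime(2011, 1, 22), 0, 22),
    (0, datetime.datetime(2011, 1, 22), 1, 23),
    (0, datetime.datetime(2011, 1, 23), 0, 44),
]


def build_data(rows):
    """Hypothetical helper: nest rows as data[movie_id][date][customer_id] = views."""
    data = defaultdict(lambda: defaultdict(dict))
    for movie_id, date, customer_id, views in rows:
        data[movie_id][date][customer_id] = views
    return data


data = build_data(rows)
```

Customer 1 simply has no key under January 23, which is why any averaging code has to cope with missing entries.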

Source

2012-11-26 Fraz

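Please post (at least one entry of) the 3D dictionary that holds this data. – inspectorG4dget

If you could tell us what you want the result to look like... –

Can you format your 'defaultdict' so that it is human-readable? Use 'pprint.pprint' if needed. – inspectorG4dget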

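sum makes this easy. In my original version I used dict.keys() a lot, but iterating over a dictionary gives you its keys by default.

This function computes one row of the result: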

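def average_daily_views(movie_id, customer_id, data):
    # Collect this customer's views per date; skip dates with no entry for them.
    daily_values = [views[customer_id] for views in data[movie_id].values()
                    if customer_id in views]
    # float() avoids integer division under Python 2.
    return sum(daily_values) / float(len(daily_values))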

Then you can loop over it to get the result in whatever form you want. Perhaps:

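def get_averages(data):
    # Customers are the innermost keys, so gather them across all dates per movie.
    return [(movie, customer, average_daily_views(movie, customer, data))
            for movie in data
            for customer in sorted(set(c for day in data[movie].values() for c in day))]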
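
As a quick check, here is a sketch of the same idea applied to the question's two-customer sample. It assumes a customer may be missing on some dates and skips those dates, so customer 1 is averaged over one day rather than two. The plain nested dict used for `data` is an assumption made for brevity; a `defaultdict` should behave the same way here.

```python
import datetime

day1 = datetime.datetime(2011, 1, 22)
day2 = datetime.datetime(2011, 1, 23)
# data[movie_id][date][customer_id] = views; customer 1 has no entry on day2
data = {0: {day1: {0: 22, 1: 23}, day2: {0: 44}}}


def average_daily_views(movie_id, customer_id, data):
    # Only count the dates on which this customer has an entry.
    daily_values = [views[customer_id] for views in data[movie_id].values()
                    if customer_id in views]
    return sum(daily_values) / len(daily_values)


print(average_daily_views(0, 0, data))  # (22 + 44) / 2 = 33.0
print(average_daily_views(0, 1, data))  # 23 / 1 = 23.0
```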

Source

2012-11-26 16:17:18 PeterBB

My take on it:

pool = [ 
    (0, (2011,12,22), 0, 22), 
    (0, (2011,12,22), 1, 2), 
    (0, (2011,12,22), 2, 12), 
    (0, (2011,12,22), 7, 2), 
    (0, (2011,12,23), 0, 123), 
] 


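from functools import reduce  # builtin in Python 2, must be imported in Python 3


def calc(memo, row):
    # memo maps customer_id -> (number of rows seen, total views)
    customer_id, views = row[2], row[3]
    num, value = memo.get(customer_id, (0, 0))
    memo[customer_id] = (num + 1, value + views)
    return memo

# dict with count and sum per customer
v = reduce(calc, pool, {})
# calculate the average; float() avoids integer division under Python 2
avg = dict(map(lambda x: (x[0], x[1][1] / float(x[1][0])), v.items()))

print(avg)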

Here avg is a dictionary whose keys are customer_id and whose values are the average views.
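
The result above is keyed by customer only. If a per-movie breakdown is needed, as in the question's desired output, one possible variation is to accumulate on the (movie_id, customer_id) pair instead. A minimal sketch, where `average_views` is a hypothetical helper name and the rows are the question's two-customer sample:

```python
from collections import defaultdict


def average_views(rows):
    """Hypothetical helper: average views per (movie_id, customer_id) pair."""
    totals = defaultdict(int)   # sum of views per pair
    counts = defaultdict(int)   # number of rows (days) seen per pair
    for movie_id, _date, customer_id, views in rows:
        totals[(movie_id, customer_id)] += views
        counts[(movie_id, customer_id)] += 1
    return {key: totals[key] / counts[key] for key in totals}


rows = [
    (0, (2011, 1, 22), 0, 22),
    (0, (2011, 1, 22), 1, 23),
    (0, (2011, 1, 23), 0, 44),
]
print(average_views(rows))  # {(0, 0): 33.0, (0, 1): 23.0}
```

Days on which a customer has no row do not count toward that customer's average, which matches the (23)/1 case in the question.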

Source

2012-11-26 16:24:05 Rustem

I think you should restructure your data a little so that it serves your purpose better:

import collections

# Re-key the data as customer -> movie -> date -> views.
restructured_data = collections.defaultdict(
    lambda: collections.defaultdict(lambda: collections.defaultdict(int)))
for movie in data:
    for date in data[movie]:
        for customer, count in data[movie][date].items():
            restructured_data[customer][movie][date] += count

# Average each customer's views of each movie over the days they watched it.
averages = collections.defaultdict(dict)
for customer in restructured_data:
    for movie in restructured_data[customer]:
        per_day = restructured_data[customer][movie]
        averages[movie][customer] = sum(per_day.values()) / float(len(per_day))

for movie in averages:
    for customer, avg in averages[movie].items():
        print("%d, %d, %f" % (movie, customer, avg))

Hope this helps.

Source

2012-11-26 16:40:27 inspectorG4dget
