Matrix factorization using a dictionary, and "nan" appearing in dictionary.values()

2014-03-19

I'm learning about recommender systems and trying to build a toy one with an LFM (latent factor model). I found some material on matrix factorization on this page: http://www.quuxlabs.com/blog/2010/09/matrix-factorization-a-simple-tutorial-and-implementation-in-python/

The code on that page runs perfectly. But in my application the matrix has to be sparse, because most of its entries stay empty after initialization. So I rewrote it using a dictionary, and everything fell apart.
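To illustrate what I mean by sparse here, a small sketch (my own example, not from the tutorial) of the dict-of-dicts form, which keeps only the observed ratings:

```python
# Dense form: unobserved ratings are stored as explicit zeros.
R_dense = [[5, 3, 0, 1],
           [4, 0, 0, 1]]

# Sparse form: {user: {item: rating}}, keeping only the observed entries.
R_sparse = {i: {j: r for j, r in enumerate(row) if r > 0}
            for i, row in enumerate(R_dense)}
print(R_sparse)  # {0: {0: 5, 1: 3, 3: 1}, 1: {0: 4, 3: 1}}
```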

Here is the code given on that page:

import numpy

def matrix_factorization(R, P, Q, K, steps=5000, alpha=0.0002, beta=0.02):
    Q = Q.T

    for step in xrange(steps):
        for i in xrange(len(R)):
            for j in xrange(len(R[i])):
                if R[i][j] > 0:
                    eij = R[i][j] - numpy.dot(P[i,:], Q[:,j])
                    for k in xrange(K):
                        P_temp = P[i][k]
                        Q_temp = Q[k][j]

                        P[i][k] = P_temp + alpha * (2 * eij * Q_temp - beta * P_temp)
                        Q[k][j] = Q_temp + alpha * (2 * eij * P_temp - beta * Q_temp)

        e = 0
        for i in xrange(len(R)):
            for j in xrange(len(R[i])):
                if R[i][j] > 0:
                    e = e + pow(R[i][j] - numpy.dot(P[i,:], Q[:,j]), 2)
                    for k in xrange(K):
                        e = e + (beta/2) * (pow(P[i][k], 2) + pow(Q[k][j], 2))
        if e < 0.001:
            break
    return P, Q.T

if __name__ == '__main__':
    R = [
        [5,3,0,1],
        [4,0,0,1],
        [1,1,0,5],
        [1,0,0,4],
        [0,1,5,4],
    ]

    R = numpy.array(R)

    N = len(R)
    M = len(R[0])
    K = 2

    P = numpy.random.rand(N,K)
    Q = numpy.random.rand(M,K)

    nP, nQ = matrix_factorization(R, P, Q, K)
    nR = numpy.dot(nP, nQ.T)

That code works fine. So I wrote the following:

import random

def matrix_factorization(R, P, Q, K, steps=5000, alpha=0.0002, beta=0.02):
    for step in xrange(steps):
        print 'step', step
        step += 1
        for i in R.keys():
            for j in R[i].keys():
                eij = R[i][j] - sum([x * y for x in P[i] for y in Q[j]])
                for k in xrange(K):
                    P_temp = P[i][k]
                    Q_temp = Q[j][k]

                    P[i][k] = P_temp + alpha * (2 * eij * Q_temp - beta * P_temp)
                    Q[k][j] = Q_temp + alpha * (2 * eij * P_temp - beta * Q_temp)

        e = 0
        for i in R.keys():
            for j in R[i].keys():
                e += pow(R[i][j] - sum([x * y for x in P[i] for y in Q[j]]), 2)
                for k in xrange(K):
                    e += (beta/2) * (pow(P[i][k], 2) + pow(Q[k][j], 2))

        if e < 0.001:
            break
    return P, Q


if __name__ == '__main__':
    R = {0: {0:5, 1:3, 3:1},
         1: {0:4, 3:1},
         2: {0:1, 1:1, 3:5},
         3: {0:1, 3:4},
         4: {1:1, 2:5, 3:4}
         }

    N = len(R.keys())
    M = 4
    K = 4

    P = dict()
    Q = dict()

    for i in xrange(N):
        P[i] = [random.random() for x in xrange(K)]

    for j in xrange(M):
        Q[j] = [random.random() for x in xrange(K)]

    P, Q = matrix_factorization(R, P, Q, K)
    Rij = dict()

These two versions should do the same thing, and their structure is the same. But my code raises:

OverflowError: (34, 'Result too large') 

or, after the computation finishes, P and Q come out as:

P 
Out[5]: 
{0: [nan, nan, nan, nan], 
1: [nan, nan, nan, nan], 
2: [nan, nan, nan, nan], 
3: [nan, nan, nan, nan], 
4: [nan, nan, nan, nan]} 

Q 
Out[6]: 
{0: [nan, nan, nan, nan], 
1: [nan, nan, nan, nan], 
2: [nan, nan, nan, nan], 
3: [nan, nan, nan, nan]} 

I really don't understand why, and the sad part is that I have already built my recommender system this way. Could you help me find out why this happens? Thanks a lot for your time!

Answer


In the function matrix_factorization, I changed the line Q[k][j] = Q_temp + alpha * (2 * eij * P_temp - beta * Q_temp) to Q[j][k] = Q_temp + alpha * (2 * eij * P_temp - beta * Q_temp), and after that the modified code seems to work well.

I modified matrix_factorization as follows, and the results now seem correct:

def matrix_factorization(R, P, Q, K, steps=5000, alpha=0.0002, beta=0.02):
    for step in xrange(steps):
        for i in R.keys():
            for j in R[i].keys():
                eij = R[i][j] - sum([P[i][k] * Q[j][k] for k in xrange(K)])
                for k in xrange(K):
                    P_temp = P[i][k]
                    Q_temp = Q[j][k]

                    P[i][k] = P_temp + alpha * (2 * eij * Q_temp - beta * P_temp)
                    Q[j][k] = Q_temp + alpha * (2 * eij * P_temp - beta * Q_temp)

        e = 0
        for i in R.keys():
            for j in R[i].keys():
                e += pow(R[i][j] - sum([P[i][k] * Q[j][k] for k in xrange(K)]), 2)
                for k in xrange(K):
                    e += (beta/2) * (pow(P[i][k], 2) + pow(Q[j][k], 2))
        if e < 0.001:
            break
    return P, Q
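As a quick sanity check, here is a sketch of the corrected dict-based version, ported to Python 3 syntax (range instead of xrange); it drives the squared error on the observed entries of the sample data down to a small value:

```python
import random

def matrix_factorization(R, P, Q, K, steps=5000, alpha=0.0002, beta=0.02):
    # Same SGD updates as the corrected answer: eij is a proper dot
    # product over the K factors, and Q is always indexed as Q[item][factor].
    for step in range(steps):
        for i in R:
            for j in R[i]:
                eij = R[i][j] - sum(P[i][k] * Q[j][k] for k in range(K))
                for k in range(K):
                    p, q = P[i][k], Q[j][k]
                    P[i][k] = p + alpha * (2 * eij * q - beta * p)
                    Q[j][k] = q + alpha * (2 * eij * p - beta * q)
        # Regularized squared error over the observed entries only.
        e = sum((R[i][j] - sum(P[i][k] * Q[j][k] for k in range(K))) ** 2
                + (beta / 2) * sum(P[i][k] ** 2 + Q[j][k] ** 2 for k in range(K))
                for i in R for j in R[i])
        if e < 0.001:
            break
    return P, Q

random.seed(0)
R = {0: {0: 5, 1: 3, 3: 1},
     1: {0: 4, 3: 1},
     2: {0: 1, 1: 1, 3: 5},
     3: {0: 1, 3: 4},
     4: {1: 1, 2: 5, 3: 4}}
K = 2
P = {i: [random.random() for _ in range(K)] for i in R}
Q = {j: [random.random() for _ in range(K)] for j in range(4)}
P, Q = matrix_factorization(R, P, Q, K)

sse = sum((R[i][j] - sum(P[i][k] * Q[j][k] for k in range(K))) ** 2
          for i in R for j in R[i])
print('squared error on observed entries: %.3f' % sse)
```

With the original mismatched index (Q[k][j] next to Q[j][k]) the updates write into the wrong factor vector, the error grows instead of shrinking, and the iterates diverge, which is exactly what the OverflowError and the nan values indicate.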

Thanks for your help, that was indeed a careless mistake. But after I fixed it, the results still seem incorrect. –


Oh, sorry! I've updated the answer ;-) – seikichi


That's very kind of you, everything works now –