
My code implements an active learning algorithm with L-BFGS optimization. I want to optimize four parameters (alpha, beta, W and gamma), but Python's scipy.optimize.fmin_l_bfgs_b raises an error.

However, when I run the code below, I get the following error:

optimLogitLBFGS = sp.optimize.fmin_l_bfgs_b(func, x0 = x0, args = (X,Y,Z), fprime = func_grad)
  File "C:\Python27\lib\site-packages\scipy\optimize\lbfgsb.py", line 188, in fmin_l_bfgs_b
    **opts)
  File "C:\Python27\lib\site-packages\scipy\optimize\lbfgsb.py", line 311, in _minimize_lbfgsb
    isave, dsave)
_lbfgsb.error: failed in converting 7th argument `g' of _lbfgsb.setulb to C/Fortran array
0-th dimension must be fixed to 22 but got 4

My code is:

# -*- coding: utf-8 -*- 
import numpy as np 
import scipy as sp 
import scipy.stats as sps 

num_labeler = 3 
num_instance = 5 

X = np.array([[1,1,1,1],[2,2,2,2],[3,3,3,3],[4,4,4,4],[5,5,5,5]]) 
Z = np.array([1,0,1,0,1]) 
Y = np.array([[1,0,1],[0,1,0],[0,0,0],[1,1,1],[1,0,0]]) 

W = np.array([[1,1,1,1],[2,2,2,2],[3,3,3,3]]) 
gamma = np.array([1,1,1,1,1]) 
alpha = np.array([1,1,1,1]) 
beta = 1 
para = np.array([1,1,1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,1,1,1,1,1]) 

def get_params(para): 
    # extract parameters from 1D parameter vector 
    assert len(para) == 22 
    alpha = para[0:4] 
    beta = para[4] 
    W = para[5:17].reshape(3, 4) 
    gamma = para[17:] 
    return alpha, beta, gamma, W 

def log_p_y_xz(yit,zi,sigmati): #log P(y_it|x_i,z_i) 
    return np.log(sps.norm(zi,sigmati).pdf(yit))#tested 

def log_p_z_x(alpha,beta,xi): #log P(z_i=1|x_i) 
    return -np.log(1+np.exp(-np.dot(alpha,xi)-beta))#tested 

def sigma_eta_ti(xi, w_t, gamma_t): # (1+exp(-w_t.x_i - gamma_t))^-1 
    return 1/(1+np.exp(-np.dot(xi,w_t)-gamma_t)) #tested 

def df_alpha(X,Y,Z,W,alpha,beta,gamma):#df/dalpha 
    return np.sum((2/(1+np.exp(-np.dot(alpha,X[i])-beta))-1)*np.exp(-np.dot(alpha,X[i])-beta)*X[i]/(1+np.exp(-np.dot(alpha,X[i])-beta))**2 for i in range (num_instance)) 
    #tested 
def df_beta(X,Y,Z,W,alpha,beta,gamma):#df/dbeta 
    return np.sum((2/(1+np.exp(-np.dot(alpha,X[i])-beta))-1)*np.exp(-np.dot(alpha,X[i])-beta)/(1+np.exp(-np.dot(alpha,X[i])-beta))**2 for i in range (num_instance)) 

def df_w(X,Y,Z,W,alpha,beta,gamma):#df/sigma * sigma/dw 
    return np.sum(np.sum((-3)*(Y[i][t]**2-(-np.log(1+np.exp(-np.dot(alpha,X[i])-beta)))*(2*Y[i][t]-1))*(1/(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t])))**4)*(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t])))*(1-(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t]))))*X[i]+(1/(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t])))**2)*(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t])))*(1-(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t]))))*X[i]for t in range(num_labeler)) for i in range (num_instance)) 

def df_gamma(X,Y,Z,W,alpha,beta,gamma):#df/sigma * sigma/dgamma 
    return np.sum(np.sum((-3)*(Y[i][t]**2-(-np.log(1+np.exp(-np.dot(alpha,X[i])-beta)))*(2*Y[i][t]-1))*(1/(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t])))**4)*(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t])))*(1-(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t]))))+(1/(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t])))**2)*(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t])))*(1-(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t]))))for t in range(num_labeler)) for i in range (num_instance)) 

def func(para, *args): 
    alpha, beta, gamma, W = get_params(para) 
    #args 
    X = args[0] 
    Y = args[1] 
    Z = args[2]   
    return np.sum(np.sum(log_p_y_xz(Y[i][t], Z[i], sigma_eta_ti(X[i],W[t],gamma[t]))+log_p_z_x(alpha, beta, X[i]) for t in range(num_labeler)) for i in range (num_instance)) 
    #tested 

def func_grad(para, *args): 
    alpha, beta, gamma, W = get_params(para) 
    #args 
    X = args[0] 
    Y = args[1] 
    Z = args[2] 
    # gradients 
    d_f_a = df_alpha(X,Y,Z,W,alpha,beta,gamma) 
    d_f_b = df_beta(X,Y,Z,W,alpha,beta,gamma) 
    d_f_w = df_w(X,Y,Z,W,alpha,beta,gamma) 
    d_f_g = df_gamma(X,Y,Z,W,alpha,beta,gamma) 
    return np.array([d_f_a, d_f_b,d_f_w,d_f_g]) 

x0 = np.concatenate([np.ravel(alpha), np.ravel(beta), np.ravel(W), np.ravel(gamma)]) 

optimLogitLBFGS = sp.optimize.fmin_l_bfgs_b(func, x0 = x0, args = (X,Y,Z), fprime = func_grad) 

I don't know what the problem is. Maybe func_grad is causing it? Could anyone take a look? Thanks.

[Related question](http://stackoverflow.com/questions/33383895/to-optimize-four-parameters-in-python-scipy-optimize-fmin-l-bfgs-b-with-an-erro/) – jakevdp

Answer

func_grad needs to return the derivative of func with respect to every element of the concatenated alpha, beta, W, gamma parameter array, so it should return a single 1D array of the same length as x0 (i.e. 22). Instead, it currently returns a jumble of two arrays and two scalar floats nested inside an object array:

In [1]: func_grad(x0, X, Y, Z) 
Out[1]: 
array([array([ 0.00681272, 0.00681272, 0.00681272, 0.00681272]), 
     0.006684719133999417, 
     array([-0.01351227, -0.01351227, -0.01351227, -0.01351227]), 
     -0.013639910534587798], dtype=object) 

Part of the problem is that np.array([d_f_a, d_f_b, d_f_w, d_f_g]) does not concatenate these objects into a single 1D array, because some of them are NumPy arrays and some are Python floats. That part is easy to fix: use np.hstack([d_f_a, d_f_b, d_f_w, d_f_g]) instead.
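To see the difference, here is a minimal sketch with stand-in gradient pieces of the same shapes func_grad currently produces (two (4,) arrays and two plain floats, not the real derivatives):

import numpy as np

d_f_a = np.zeros(4)   # stand-in for df_alpha, shape (4,)
d_f_b = 0.5           # stand-in for df_beta, a Python float
d_f_w = np.zeros(4)   # stand-in for the current df_w output, shape (4,)
d_f_g = -0.1          # stand-in for the current df_gamma output, a float

# The NumPy of 2015 silently builds a 4-element object array here;
# recent versions refuse the ragged input unless dtype=object is given.
ragged = np.array([d_f_a, d_f_b, d_f_w, d_f_g], dtype=object)
flat = np.hstack([d_f_a, d_f_b, d_f_w, d_f_g])

print(ragged.shape, ragged.dtype)  # (4,) object   -- what _lbfgsb.setulb rejects
print(flat.shape, flat.dtype)      # (10,) float64 -- a proper 1D gradient vector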

However, the combined size of those pieces is still only 10, whereas the output of func_grad must be a vector of length 22. You need to take another look at your df_* functions: in particular, W is a (3, 4) array but df_w returns only a (4,) vector, and gamma is a (5,) vector while df_gamma returns only a scalar.
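Structurally, a corrected func_grad would look something like the sketch below (a sketch only: the df_* bodies themselves still have to be rewritten so that df_w returns a (3, 4) array of per-element derivatives and df_gamma a (5,) array):

def func_grad(para, *args):
    alpha, beta, gamma, W = get_params(para)
    X, Y, Z = args
    d_f_a = df_alpha(X,Y,Z,W,alpha,beta,gamma)  # needs shape (4,)
    d_f_b = df_beta(X,Y,Z,W,alpha,beta,gamma)   # a scalar is fine: beta is one parameter
    d_f_w = df_w(X,Y,Z,W,alpha,beta,gamma)      # needs shape (3, 4)
    d_f_g = df_gamma(X,Y,Z,W,alpha,beta,gamma)  # needs shape (5,)
    grad = np.hstack([np.ravel(d_f_a), d_f_b, np.ravel(d_f_w), np.ravel(d_f_g)])
    assert grad.shape == (22,)  # must mirror the layout used in x0 and get_params
    return grad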

Thank you very much for your answer. I checked the formulas of the algorithm in the original paper and, for example, 'df_gamma' is a scalar there, so I don't understand how the authors of that paper managed to code the algorithm successfully. Another thing worth noting: when I change the last line of the code to 'optimBFGS = sp.optimize.minimize(func, x0=x0, args=(X,Y,Z))' (without func_grad), I do get a result. That is a bit strange. – flyingmouse

That is expected. If you don't pass a gradient function to 'minimize', it will try to approximate the gradient with first-order finite differences. That is generally less efficient and less numerically stable, but it can still get you an answer. –
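As a side note, scipy can compare an analytic gradient against that finite-difference approximation; a small sketch using the names from the question, only meaningful once func_grad returns a flat length-22 vector:

from scipy.optimize import approx_fprime, check_grad

fd_grad = approx_fprime(x0, func, 1e-8, X, Y, Z)  # the finite-difference gradient minimize falls back on
err = check_grad(func, func_grad, x0, X, Y, Z)    # root-sum-of-squares difference between the two gradients
print(fd_grad.shape, err)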

Either 'gamma' is a '(5,)' vector or 'df_gamma' is a scalar – the two together make no sense. –