2017-08-07 40 views
0

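GaussianNB fit() function expects fixed-length vectors

I want to implement Gaussian naive Bayes training like the code below. However, gnb.fit() raises an exception if the rows of X differ in size (that is, every list inside X has to be the same length). If my training samples are vectors of different lengths, what is the correct way to call fit()?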

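from sklearn.naive_bayes import GaussianNB

def train(X, Y):
    # Fit a Gaussian naive Bayes model on samples X with labels Y.
    gnb = GaussianNB()
    gnb.fit(X, Y)
    return gnb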

>>> X = [[1,2,3], [4,5,6,7], [8,9]] 
>>> Y = [1,1,1] 
>>> snb.train(X, Y) 

/Library/Python/2.7/site-packages/sklearn/utils/validation.py:395: 
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 
and will raise ValueError in 0.19. Reshape your data either using 
X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) 
if it contains a single sample. 
DeprecationWarning) 
Traceback (most recent call last): 
File "<stdin>", line 1, in <module> 
File "snb.py", line 113, in train 
gnb.fit(X, Y) 
File "/Library/Python/2.7/site-packages/sklearn/naive_bayes.py", line 
182, in fit 
X, y = check_X_y(X, y) 
File "/Library/Python/2.7/site-packages/sklearn/utils/validation.py", 
line 521, in check_X_y 
ensure_min_features, warn_on_dtype, estimator) 
File "/Library/Python/2.7/site-packages/sklearn/utils/validation.py", 
line 402, in check_array 
array = array.astype(np.float64) 
ValueError: setting an array element with a sequence. 

Answers

0

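This happens because the lists inside X are not all the same length. Each sublist of X acts as a row (one example), and each element of that sublist is a feature. For the model to run, every sublist must have the same length; otherwise it will not work. I changed that part, and the code works.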

from sklearn.naive_bayes import GaussianNB

def train(X, Y):
    gnb = GaussianNB()
    gnb.fit(X, Y)
    return gnb

# Every row now has exactly four features.
X = [[1,2,3,4], [4,5,6,7], [8,9,10,11]]
Y = [1,1,1]
train(X, Y)
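
One way to catch this before calling fit() is to compare the row lengths yourself. The sketch below is only an illustration: `check_row_lengths` is a hypothetical helper name, not part of scikit-learn, and it assumes X is a plain list of lists.

```python
def check_row_lengths(X):
    """Return the common row length of X, or raise ValueError if rows differ.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    lengths = sorted({len(row) for row in X})
    if len(lengths) != 1:
        raise ValueError("rows of X have different lengths: %s" % lengths)
    return lengths[0]


ragged = [[1, 2, 3], [4, 5, 6, 7], [8, 9]]            # X from the question
fixed = [[1, 2, 3, 4], [4, 5, 6, 7], [8, 9, 10, 11]]  # X from this answer

n_features = check_row_lengths(fixed)  # every row has the same length

try:
    check_row_lengths(ragged)
    ragged_ok = True
except ValueError:
    ragged_ok = False  # the ragged input is rejected with a clear message
```
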
2

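All of your X vectors MUST be the same length. The Gaussian naive Bayes estimator is designed to make predictions from a fixed set of features. If each X holds a variable number of elements, how would the classifier determine which element belongs to which feature?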
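
To make the point concrete: conceptually, a Gaussian naive Bayes model keeps one mean and one variance per feature (column) for each class. The following is a rough sketch of that idea for a single class, not scikit-learn's actual code, which differs in details; `per_feature_stats` is a hypothetical name used only here.

```python
from statistics import mean, pvariance


def per_feature_stats(rows):
    """Return (mean, variance) for each column of equal-length rows.

    Illustration only. Note that zip() silently truncates to the shortest
    row, so ragged input would mix up or drop features without any error.
    """
    columns = list(zip(*rows))  # column j collects element j of every row
    return [(mean(col), pvariance(col)) for col in columns]


# All three samples belong to the same class and have four features each.
class_1_rows = [[1, 2, 3, 4], [4, 5, 6, 7], [8, 9, 10, 11]]
stats = per_feature_stats(class_1_rows)  # one (mean, variance) pair per feature
```

Position j in every sample has to mean the same thing, because it always feeds the same pair of statistics.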

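One option is to pad the X vectors with 0s so that they all end up the same length. Otherwise, you will need to look at some other preprocessing step that turns variable-length input into fixed-length vectors.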
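
The zero-padding option could look roughly like this. It is a minimal sketch, assuming X is a list of lists and that padding on the right is acceptable; `pad_rows` is a hypothetical helper name, not a scikit-learn function.

```python
def pad_rows(X, fill=0):
    """Right-pad every row of X with `fill` so all rows match the longest row.

    Hypothetical helper for illustration; returns new lists and leaves X unchanged.
    """
    width = max(len(row) for row in X)
    return [list(row) + [fill] * (width - len(row)) for row in X]


X = [[1, 2, 3], [4, 5, 6, 7], [8, 9]]
X_padded = pad_rows(X)
# X_padded is [[1, 2, 3, 0], [4, 5, 6, 7], [8, 9, 0, 0]]
```

The padded list can then be passed to gnb.fit(X_padded, Y). Keep in mind that the classifier treats the padded zeros as real feature values, so whether this is sensible depends on what the elements of each vector represent.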