Is the nearest centroid classifier really inefficient?

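I am reading "Introduction to Machine Learning" by Ethem Alpaydin, came across the nearest centroid classifier, and tried to implement it. I think I have implemented the classifier correctly, but I only get 68% accuracy. So is the nearest centroid classifier itself inefficient, or is there a mistake in my implementation (shown below)?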

The dataset contains 1372 data points, each with 4 features, and there are 2 output classes. My MATLAB implementation:

DATA = load("-ascii", "data.txt"); 

#DATA is 1372x5 matrix with 762 data points of class 0 and 610 data points of class 1 
#there are 4 features of each data point 
X = DATA(:,1:4); #matrix to store all features 

X0 = DATA(1:762,1:4); #matrix to store the features of class 0 
X1 = DATA(763:1372,1:4); #matrix to store the features of class 1 
X0 = X0(1:610,:); #to make sure both datasets have same size for prior probability to be equal 
Y = DATA(:,5); # to store outputs 

mean0 = sum(X0)/610; #mean of features of class 0 
mean1 = sum(X1)/610; #mean of features of class 1 

count = 0; 
for i = 1:1372 
    pre = 0; 
    cost1 = X(i,:)*(mean0'); #calculates the dot product of dataset with mean of features of both classes 
    cost2 = X(i,:)*(mean1'); 

    if (cost1<cost2) 
    pre = 1; 
    end 
    if pre == Y(i) 
    count = count+1; #counts the number of correctly predicted values 
    end 

end 

disp("accuracy"); #calculates the accuracy 
disp((count/1372)*100);


2017-04-23 user7909152

There are at least a few things going on here:

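1. You are using the dot product to measure similarity in the input space, which is almost never valid. The only reason to use the dot product would be the assumption that all your data points have the same norm, or that the norm does not matter (which is nearly never true). Try Euclidean distance instead; even though it is very naive, it should be significantly better.
2. Is it an inefficient classifier? That depends on the definition of efficiency. It is an extremely simple and fast one, but in terms of predictive power it is very poor. In fact, it is worse than Naive Bayes, which is already considered a "toy model".
3. There is also something wrong with the code: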
```
X0 = DATA(1:762,1:4); #matrix to store the features of class 0 
X1 = DATA(763:1372,1:4); #matrix to store the features of class 1 
X0 = X0(1:610,:); #to make sure both datasets have same size for prior probability to be equal 
```
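Once you subsample X0 you have 1220 training samples, yet later, during "testing", you test on both the training samples and the "missing" elements of X0, which does not really make sense from a probabilistic perspective. First of all, you should never measure accuracy on the training set, as it overestimates the true accuracy. Second, subsampling your training data does not equalize the priors, not in a method like this one; you are simply degrading the quality of your centroid estimate and nothing else. Techniques of this kind (sub/oversampling) equalize priors for models that actually model priors. Your method does not (it is basically a generative model with an assumed prior of 1/2), so nothing good can come of it.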
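The points above can be put together in a minimal sketch, written to run in both Octave and MATLAB. It is an illustration under stated assumptions, not a fix verified on the asker's data: the small `DATA` matrix is a made-up toy stand-in for `data.txt`, the every-second-row split is just one simple way to hold out test rows, and the names `test_mask`, `Xtr`, `Xte`, `pred`, and `pred_dot` are introduced here. It estimates the centroids from all training rows, classifies held-out rows by squared Euclidean distance, and evaluates the dot-product rule on the same rows for comparison. Since ||x - m||^2 = ||x||^2 - 2 x·m + ||m||^2, comparing dot products only agrees with comparing distances when both centroids have the same norm.

```octave
% Toy stand-in for DATA (columns 1-4: features, column 5: label 0 or 1).
% These values are invented for illustration; for the real dataset use
% the load call from the question instead.
DATA = [1.0 1.2 0.9 1.1 0;
        0.8 1.0 1.1 0.9 0;
        1.2 0.9 1.0 1.0 0;
        0.9 1.1 1.2 0.8 0;
        5.0 5.2 4.9 5.1 1;
        4.8 5.0 5.1 4.9 1;
        5.2 4.9 5.0 5.0 1;
        4.9 5.1 5.2 4.8 1];

X = DATA(:, 1:4);
Y = DATA(:, 5);
n = size(X, 1);

% Hold out every second row for testing. This deterministic split is an
% assumption made for reproducibility; a shuffled split (for example with
% randperm) is the more usual choice.
test_mask = false(n, 1);
test_mask(2:2:n) = true;
Xtr = X(~test_mask, :);  Ytr = Y(~test_mask);
Xte = X(test_mask, :);   Yte = Y(test_mask);

% Centroids come from the training rows only, using all of them
% (no subsampling of the larger class).
mean0 = mean(Xtr(Ytr == 0, :), 1);
mean1 = mean(Xtr(Ytr == 1, :), 1);

% Squared Euclidean distance from every test row to each centroid.
% repmat avoids relying on implicit broadcasting.
m = size(Xte, 1);
d0 = sum((Xte - repmat(mean0, m, 1)).^2, 2);
d1 = sum((Xte - repmat(mean1, m, 1)).^2, 2);

% Predict class 1 when centroid 1 is strictly closer, otherwise class 0.
pred = double(d1 < d0);
accuracy = 100 * mean(pred == Yte);

% Dot-product rule from the question, evaluated on the same test rows.
pred_dot = double(Xte * mean0' < Xte * mean1');
accuracy_dot = 100 * mean(pred_dot == Yte);

disp(accuracy);
disp(accuracy_dot);
```

On this toy data every feature is positive and centroid 1 has the larger norm, so the dot-product rule assigns every test row to class 1, while the distance rule separates the two groups. The toy numbers say nothing about what accuracy to expect on the real dataset.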


2017-04-23 12:57:13 lejlot
