2009-12-25 29 views
6

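Visually separating data into two classes in MATLAB

I have two clusters of data. Each point has x, y coordinates and a value giving its class (1 for class 1, 2 for class 2). I have plotted the data, but I would like to separate the classes visually with a boundary. Is there a function that does this? I tried contour, but it did not help.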

Answers

11

Consider this classification problem (using the Iris dataset):

points scatter plot

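As you can see, finding the boundary is not a trivial task, except for easily separable clusters where you know the equation of the boundary beforehand.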

One idea is to use the discriminant analysis function classify to find the boundary (you can choose between a linear and a quadratic boundary).
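For reference, here is how I read the boundary that classify reports between a pair of classes i and j through its coeff output (fields const, linear, and, for the quadratic type, quadratic). The symbols K, L, and Q below are just shorthand for those three fields, and the point is written as (x, y). This is a sketch of the documented form, not a derivation:

```latex
0 = K + L^{\top}\mathbf{x} + \mathbf{x}^{\top} Q\,\mathbf{x}
  = K + L_1 x + L_2 y + Q_{11} x^2 + (Q_{12} + Q_{21})\,xy + Q_{22} y^2,
\qquad \mathbf{x} = (x, y)^{\top}
```

With the linear type the Q terms are absent, so the boundary is a straight line. Points where the right-hand side is positive are assigned to class i, the others to class j.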

Below is a complete example that illustrates the procedure. The code requires the Statistics Toolbox:

%# load Iris dataset (make it binary-class with 2 features) 
load fisheriris 
data = meas(:,1:2); 
labels = species; 
labels(~strcmp(labels,'versicolor')) = {'non-versicolor'}; 

NUM_K = numel(unique(labels));  %# number of classes 
numInst = size(data,1);    %# number of instances 

%# visualize data 
figure(1) 
gscatter(data(:,1), data(:,2), labels, 'rb', '*o', ... 
    10, 'on', 'sepal length', 'sepal width') 
title('Iris dataset'), box on, axis tight 

%# params 
classifierType = 'quadratic';  %# 'quadratic', 'linear' 
npoints = 100; 
clrLite = [1 0.6 0.6 ; 0.6 1 0.6 ; 0.6 0.6 1]; 
clrDark = [0.7 0 0 ; 0 0.7 0 ; 0 0 0.7]; 

%# discriminant analysis 
%# classify the grid space of these two dimensions 
mn = min(data); mx = max(data); 
[X,Y] = meshgrid(linspace(mn(1),mx(1),npoints) , linspace(mn(2),mx(2),npoints)); 
X = X(:); Y = Y(:); 
[C,err,P,logp,coeff] = classify([X Y], data, labels, classifierType); 

%# find incorrectly classified training data 
[CPred,err] = classify(data, data, labels, classifierType); 
bad = ~strcmp(CPred,labels); 

%# plot grid classification color-coded 
figure(2), hold on 
image(X, Y, reshape(grp2idx(C),npoints,npoints)) 
axis xy, colormap(clrLite) 

%# plot data points (correctly and incorrectly classified) 
gscatter(data(:,1), data(:,2), labels, clrDark, '.', 20, 'on'); 

%# mark incorrectly classified data 
plot(data(bad,1), data(bad,2), 'kx', 'MarkerSize',10) 
axis([mn(1) mx(1) mn(2) mx(2)]) 

%# draw decision boundaries between pairs of clusters 
for i=1:NUM_K 
    for j=i+1:NUM_K 
        if strcmp(coeff(i,j).type, 'quadratic') 
            K = coeff(i,j).const; 
            L = coeff(i,j).linear; 
            Q = coeff(i,j).quadratic; 
            f = sprintf('0 = %g + %g*x + %g*y + %g*x.^2 + %g*x.*y + %g*y.^2',... 
                K,L,Q(1,1),Q(1,2)+Q(2,1),Q(2,2)); 
        else 
            K = coeff(i,j).const; 
            L = coeff(i,j).linear; 
            f = sprintf('0 = %g + %g*x + %g*y', K,L(1),L(2)); 
        end 
        h2 = ezplot(f, [mn(1) mx(1) mn(2) mx(2)]); 
        set(h2, 'Color','k', 'LineWidth',2) 
    end 
end 

xlabel('sepal length'), ylabel('sepal width') 
title(sprintf('accuracy = %.2f%%', 100*(1-sum(bad)/numInst))) 

hold off 

classification boundaries with quadratic discriminant function
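Since the question mentions trying contour, here is a minimal sketch of how the same boundary could be drawn with contour instead of ezplot, which newer MATLAB releases list as not recommended. It assumes the two-class setup from the example, so coeff(1,2) holds the only boundary. The names gx, gy, pts, fval, and F are my own and do not appear in the original code.

```matlab
%# Sketch: draw the two-class decision boundary with contour instead of ezplot.
%# Requires the Statistics Toolbox (fisheriris, classify, gscatter).
load fisheriris
data = meas(:,1:2);
labels = species;
labels(~strcmp(labels,'versicolor')) = {'non-versicolor'};

%# coeff(1,2) describes the boundary between class 1 and class 2
[~,~,~,~,coeff] = classify(data, data, labels, 'quadratic');
K = coeff(1,2).const;
L = coeff(1,2).linear;
Q = coeff(1,2).quadratic;

%# evaluate f = K + [x y]*L + [x y]*Q*[x y]' at every grid point
npoints = 100;
mn = min(data); mx = max(data);
[gx,gy] = meshgrid(linspace(mn(1),mx(1),npoints), linspace(mn(2),mx(2),npoints));
pts = [gx(:) gy(:)];
fval = K + pts*L + sum((pts*Q).*pts, 2);
F = reshape(fval, size(gx));

%# the boundary is the zero level set of f
figure, hold on
gscatter(data(:,1), data(:,2), labels, 'rb', '*o')
contour(gx, gy, F, [0 0], 'k', 'LineWidth',2)
xlabel('sepal length'), ylabel('sepal width')
hold off
```

The point is that contour needs a function evaluated on a grid, not the class labels themselves: passing [0 0] as the level list asks for the single contour where the discriminant changes sign. For the linear type, drop the Q term.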

+3

+1 ... beautiful! – Jacob 2009-12-26 03:54:26

+0

@Amro - is it just me, or is the second screenshot missing? – Shai 2012-12-16 10:43:30

+1

@Shai: it's not just you, old images uploaded to ImageShack sometimes disappear for some reason... anyway, I updated the example with fresh images :) – Amro 2012-12-16 14:57:26