PAM clustering - use the results in another data set

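I have successfully run Partitioning Around Medoids using the pam function (cluster package in R), and now I would like to use the results to attribute new observations to the previously defined clusters/medoids.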

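Another way to put the problem: given the k clusters/medoids found by the pam function, which one is closest to an additional observation that was not in the initial dataset?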

library(cluster)
x<-matrix(c(1,1.2,0.9,2.3,2,1.8,
      3.2,4,3.1,3.9,3,4.4),6,2)
x 
    [,1] [,2] 
[1,] 1.0 3.2 
[2,] 1.2 4.0 
[3,] 0.9 3.1 
[4,] 2.3 3.9 
[5,] 2.0 3.0 
[6,] 1.8 4.4 
pam(x,2)

Observations 1, 3 and 5 are clustered together, as are observations 2, 4 and 6, and observations 1 and 6 are the medoids:

Medoids: 
    ID   
[1,] 1 1.0 3.2 
[2,] 6 1.8 4.4 
Clustering vector: 
[1] 1 2 1 2 1 2

Now, to which cluster/medoid should y be attributed/associated?

y<-c(1.5,4.5)

Oh, and in case you have several solutions: computation time matters for the big dataset I have.


2016-12-23 EdM

You can compute the distance from y to each medoid; y belongs to the cluster whose medoid is at the smallest distance.

You don't need a library for 'which.min' and the distance computation. Just write that one line of code yourself.

Try this for k clusters in general:

k <- 2 # pam with k clusters 
res <- pam(x,k) 

y <- c(1.5,4.5) # new point 

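# get the cluster medoid to which the new point is to be assigned
# (squared Euclidean distance; the argmin is the same as for Euclidean distance)
# break ties by taking the first medoid in case there are multiple ones

# non-vectorized function (uses the number of medoids in res instead of the global k)
get.cluster1 <- function(res, y) which.min(sapply(seq_len(nrow(res$medoids)), function(i) sum((res$medoids[i,]-y)^2)))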

# vectorized function, much faster 
get.cluster2 <- function(res, y) which.min(colSums((t(res$medoids)-y)^2)) 

get.cluster1(res, y) 
#[1] 2 
get.cluster2(res, y) 
#[1] 2 

# comparing the two implementations (the vectorized function takes much less time)
library(microbenchmark) 
microbenchmark(get.cluster1(res, y), get.cluster2(res, y)) 

#Unit: microseconds
#                 expr    min     lq     mean median     uq     max neval cld
# get.cluster1(res, y) 31.219 32.075 34.89718 32.930 33.358 135.995   100   b
# get.cluster2(res, y) 17.107 17.962 19.12527 18.817 19.245  41.483   100  a

Extending to an arbitrary distance function:

# distance function 
euclidean.func <- function(x, y) sqrt(sum((x-y)^2)) 
manhattan.func <- function(x, y) sum(abs(x-y)) 

get.cluster3 <- function(res, y, dist.func=euclidean.func) which.min(sapply(seq_len(nrow(res$medoids)), function(i) dist.func(res$medoids[i,], y)))
get.cluster3(res, y) # use Euclidean as default 
#[1] 2 
get.cluster3(res, y, manhattan.func) # use Manhattan distance 
#[1] 2
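
Since computation time matters for a large dataset, it may help to assign many new observations in one call rather than one point at a time. The following is only a minimal sketch, assuming squared Euclidean distance as in the functions above; `assign.clusters` and `newdata` are hypothetical names that do not appear in the original answer.

```r
library(cluster)

x <- matrix(c(1, 1.2, 0.9, 2.3, 2, 1.8,
              3.2, 4, 3.1, 3.9, 3, 4.4), 6, 2)
res <- pam(x, 2)

# hypothetical helper: newdata is a matrix with one new observation per row
assign.clusters <- function(res, newdata) {
  med <- res$medoids
  # n x k matrix of squared Euclidean distances to each medoid
  d2 <- sapply(seq_len(nrow(med)), function(i)
    rowSums(sweep(newdata, 2, med[i, ])^2))
  # sapply returns a plain vector when newdata has a single row
  d2 <- matrix(d2, nrow = nrow(newdata))
  # index of the nearest medoid per row; ties go to the first medoid
  apply(d2, 1, which.min)
}

newdata <- rbind(c(1.5, 4.5), c(1.0, 3.0))
assign.clusters(res, newdata)
#[1] 2 1
```

The loop here runs over the k medoids rather than over the new observations, so the cost per call grows with k while the work per medoid stays vectorized across all rows of `newdata`.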


2016-12-23 06:22:24

Note that this code is for Euclidean distance only, but you wouldn't use pam with Euclidean distance.

@Anony-Mousse We can use any distance function in place of Euclidean.
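
To illustrate the exchange above, here is a sketch that fits pam with a non-Euclidean metric and then assigns the new point with the same distance, so the assignment is consistent with how the medoids were found. It assumes the `metric` argument of `pam` in the cluster package; `manhattan.func` mirrors the answer's helper, and `res.man` and `get.cluster` are hypothetical names introduced for this example.

```r
library(cluster)

x <- matrix(c(1, 1.2, 0.9, 2.3, 2, 1.8,
              3.2, 4, 3.1, 3.9, 3, 4.4), 6, 2)

# fit with Manhattan distance instead of the default Euclidean
res.man <- pam(x, 2, metric = "manhattan")

manhattan.func <- function(x, y) sum(abs(x - y))

# hypothetical helper: nearest medoid under a user-supplied distance
get.cluster <- function(res, y, dist.func) {
  # apply passes each medoid (row) as the first argument of dist.func
  which.min(apply(res$medoids, 1, dist.func, y = y))
}

y <- c(1.5, 4.5)
get.cluster(res.man, y, manhattan.func)
```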
