2014-09-21 116 views
1

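I have a cluster plot made in R, and I want to optimize the clustering using the "elbow criterion" with a WSS plot, but I don't know how to draw a WSS plot for a given clustering. Can anyone help me? How do I plot the within-cluster sum of squares (WSS)?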

Here is my data:

Friendly<-c(0.467,0.175,0.004,0.025,0.083,0.004,0.042,0.038,0,0.008,0.008,0.05,0.096) 
Polite<-c(0.117,0.55,0,0,0.054,0.017,0.017,0.017,0,0.017,0.008,0.104,0.1) 
Praising<-c(0.079,0.046,0.563,0.029,0.092,0.025,0.004,0.004,0.129,0,0,0,0.029) 
Joking<-c(0.125,0.017,0.054,0.383,0.108,0.054,0.013,0.008,0.092,0.013,0.05,0.017,0.067) 
Sincere<-c(0.092,0.088,0.025,0.008,0.383,0.133,0.017,0.004,0,0.063,0,0,0.188) 
Serious<-c(0.033,0.021,0.054,0.013,0.2,0.358,0.017,0.004,0.025,0.004,0.142,0.021,0.108) 
Hostile<-c(0.029,0.004,0,0,0.013,0.033,0.371,0.363,0.075,0.038,0.025,0.004,0.046) 
Rude<-c(0,0.008,0,0.008,0.017,0.075,0.325,0.313,0.004,0.092,0.063,0.008,0.088) 
Blaming<-c(0.013,0,0.088,0.038,0.046,0.046,0.029,0.038,0.646,0.029,0.004,0,0.025) 
Insincere<-c(0.075,0.063,0,0.013,0.096,0.017,0.021,0,0.008,0.604,0.004,0,0.1) 
Commanding<-c(0,0,0,0,0,0.233,0.046,0.029,0.004,0.004,0.538,0,0.146) 
Suggesting<-c(0.038,0.15,0,0,0.083,0.058,0,0,0,0.017,0.079,0.133,0.442) 
Neutral<-c(0.021,0.075,0.017,0,0.033,0.042,0.017,0,0.033,0.017,0.021,0.008,0.717) 

data <- data.frame(Friendly,Polite,Praising,Joking,Sincere,Serious,Hostile,Rude,Blaming,Insincere,Commanding,Suggesting,Neutral) 

Here is the code for my clustering:

cor <- cor (data) 
dist<-dist(cor) 
hclust<-hclust(dist) 
plot(hclust) 

After running the code above I get a dendrogram. How can I draw a plot like this one?

[image: example WSS plot]

Answers

6

If I follow what you want, we need a function to compute the WSS:

wss <- function(d) { 
    ## centre each column on its mean, then sum the squared deviations 
    sum(scale(d, scale = FALSE)^2) 
} 

We also need a wrapper around this wss() function:

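wrap <- function(i, hc, x) { 
    cl <- cutree(hc, i)       # cluster membership when the tree is cut into i groups 
    spl <- split(x, cl)       # rows of x grouped by cluster 
    sum(sapply(spl, wss))     # WSS of each cluster, added up 
} 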

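This wrapper takes the following arguments as input:

  • i: the number of clusters to cut the data into
  • hc: the hierarchical cluster analysis object
  • x: the original data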

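wrap() then cuts the dendrogram into i clusters, splits the original data according to the cluster membership given by cl, and computes the WSS of each cluster. These WSS values are summed to give the WSS for that clustering.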
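As a sanity check on these two functions, here is a minimal sketch, assuming R's built-in iris data and Ward clustering as a stand-in for your own data. The names total_ss, by_hand and memb are only illustrative. With one cluster, wrap() should return the total sum of squares around the column means; with as many clusters as rows it should return 0; and for an intermediate cut it should agree with the same sum computed by hand.

```r
wss <- function(d) sum(scale(d, scale = FALSE)^2)
wrap <- function(i, hc, x) {
    cl <- cutree(hc, i)
    sum(sapply(split(x, cl), wss))
}

iris2 <- iris[, 1:4]                            # stand-in data, numeric columns only
hc <- hclust(dist(iris2), method = "ward.D")

## One cluster: total sum of squares, i.e. (n - 1) * sum of column variances
total_ss <- (nrow(iris2) - 1) * sum(sapply(iris2, var))
one_cluster <- wrap(1, hc, iris2)

## Three clusters, computed by hand from the cluster memberships
memb <- cutree(hc, 3)
by_hand <- 0
for (g in unique(memb)) {
    rows <- as.matrix(iris2[memb == g, , drop = FALSE])
    centred <- sweep(rows, 2, colMeans(rows))   # subtract each column's cluster mean
    by_hand <- by_hand + sum(centred^2)
}
three_clusters <- wrap(3, hc, iris2)

## Every observation in its own cluster: no within-cluster variation left
all_singletons <- wrap(nrow(iris2), hc, iris2)

all.equal(one_cluster, total_ss)
all.equal(three_clusters, by_hand)
```

Both all.equal() calls should return TRUE, and all_singletons should be 0.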

We run all of this with sapply() over the number of clusters 1, 2, ..., nrow(data):

res <- sapply(seq.int(1, nrow(data)), wrap, hc = cl, x = data)  # cl is the hclust object 

A scree plot can then be drawn using

plot(seq_along(res), res, type = "b", pch = 19) 
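One point worth checking when adapting this to the code in the question: there the tree is built from dist(cor), that is, from the rows of the correlation matrix, so presumably the x passed to wrap() should be those same rows rather than the raw data. The sketch below assumes that reading. It uses the built-in mtcars data as a stand-in for the question's data, and the names dat, cmat and rows are only illustrative.

```r
wss <- function(d) sum(scale(d, scale = FALSE)^2)
wrap <- function(i, hc, x) {
    cl <- cutree(hc, i)
    sum(sapply(split(x, cl), wss))
}

dat <- mtcars                    # stand-in for the question's `data` (an assumption)
cmat <- cor(dat)                 # variables-by-variables correlation matrix
hc <- hclust(dist(cmat))         # tree over the rows of cmat, i.e. over the variables

## split() divides a data frame by rows, so convert the matrix first
rows <- as.data.frame(cmat)
res <- sapply(seq_len(nrow(rows)), wrap, hc = hc, x = rows)

plot(seq_along(res), res, type = "b", pch = 19,
     xlab = "Number of clusters", ylab = "Within-cluster sum of squares")
```

Because each cut of the tree refines the previous one, the curve should never go up as the number of clusters grows, and it reaches 0 when every variable is its own cluster.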

Here is an example using the famous Edgar Anderson iris data set:

iris2 <- iris[, 1:4] # drop Species column 
cl <- hclust(dist(iris2), method = "ward.D") 

## Takes a little while as we evaluate all implied clusterings up to 150 groups 
res <- sapply(seq.int(1, nrow(iris2)), wrap, hc = cl, x = iris2) 
plot(seq_along(res), res, type = "b", pch = 19) 

This gives:

[image: scree plot of WSS against the number of clusters, 1 to 150]

We can zoom in by showing only the first 1:50 clusters:

plot(seq_along(res[1:50]), res[1:50], type = "o", pch = 19) 

This gives:

[image: the same scree plot restricted to the first 50 clusters]

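The main computation step can be sped up by running sapply() through a suitable parallel alternative, or by computing fewer than nrow(data) clusterings, for example: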

res <- sapply(seq.int(1, 50), wrap, hc = cl, x = iris2) ## 1st 50 groups 
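For the parallel route, one possible sketch uses mclapply() from the parallel package that ships with R. This is only one option and not necessarily the one meant above. mclapply() relies on forking, which is unavailable on Windows, so the sketch falls back to a single core there. The names n_cores, res_par and res_seq are illustrative.

```r
library(parallel)

wss <- function(d) sum(scale(d, scale = FALSE)^2)
wrap <- function(i, hc, x) {
    cl <- cutree(hc, i)
    sum(sapply(split(x, cl), wss))
}

iris2 <- iris[, 1:4]
hc <- hclust(dist(iris2), method = "ward.D")

## Forking is not supported on Windows, where mc.cores must be 1
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

## mclapply() returns a list, so flatten it to a numeric vector
res_par <- unlist(mclapply(seq_len(50), wrap, hc = hc, x = iris2,
                           mc.cores = n_cores))

## Sequential version for comparison
res_seq <- sapply(seq_len(50), wrap, hc = hc, x = iris2)
all.equal(res_par, res_seq)
```

For a data set this small the overhead of starting the worker processes may outweigh the gain, so it is worth timing both versions before switching.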

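Thanks! But why are the values on the y-axis so large when the values in my data are really small? Also, could you answer my other question about the WSS plot?: https://stackoverflow.com/questions/25977798/why-is-the-line-of-wss-plot-for-optimize-the- cluster-analysis-looks-so-volaua – 2014-09-22 15:30:00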


The values on the y-axis are determined by the scale of the variables in the data. I'll take a look at the other Q. – 2014-09-22 16:08:24
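To illustrate the point about scale, here is a small sketch using R's built-in iris data as a stand-in (not the data from the question). Multiplying every variable by 10 multiplies the WSS by 100, because the deviations are squared. Standardising each column to unit variance fixes the one-cluster WSS at (n - 1) times the number of columns.

```r
wss <- function(d) sum(scale(d, scale = FALSE)^2)

iris2 <- iris[, 1:4]

wss_raw <- wss(iris2)          # original units
wss_x10 <- wss(iris2 * 10)     # same data expressed in units ten times smaller
wss_std <- wss(scale(iris2))   # each column standardised to unit variance

wss_x10 / wss_raw                    # 100, up to floating-point rounding
(nrow(iris2) - 1) * ncol(iris2)      # 596, the value wss_std should match
```

So the size of the numbers on the y-axis says little by itself; the shape of the curve is what the elbow criterion looks at.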