How to draw the plot of within-cluster sum of squares for a cluster?

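I have a cluster plot made in R, and I want to use a WSS plot to choose the number of clusters with the "elbow criterion". However, I don't know how to draw the WSS plot for a given clustering. Can anyone help me?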

Here is my data:

Friendly<-c(0.467,0.175,0.004,0.025,0.083,0.004,0.042,0.038,0,0.008,0.008,0.05,0.096) 
Polite<-c(0.117,0.55,0,0,0.054,0.017,0.017,0.017,0,0.017,0.008,0.104,0.1) 
Praising<-c(0.079,0.046,0.563,0.029,0.092,0.025,0.004,0.004,0.129,0,0,0,0.029) 
Joking<-c(0.125,0.017,0.054,0.383,0.108,0.054,0.013,0.008,0.092,0.013,0.05,0.017,0.067) 
Sincere<-c(0.092,0.088,0.025,0.008,0.383,0.133,0.017,0.004,0,0.063,0,0,0.188) 
Serious<-c(0.033,0.021,0.054,0.013,0.2,0.358,0.017,0.004,0.025,0.004,0.142,0.021,0.108) 
Hostile<-c(0.029,0.004,0,0,0.013,0.033,0.371,0.363,0.075,0.038,0.025,0.004,0.046) 
Rude<-c(0,0.008,0,0.008,0.017,0.075,0.325,0.313,0.004,0.092,0.063,0.008,0.088) 
Blaming<-c(0.013,0,0.088,0.038,0.046,0.046,0.029,0.038,0.646,0.029,0.004,0,0.025) 
Insincere<-c(0.075,0.063,0,0.013,0.096,0.017,0.021,0,0.008,0.604,0.004,0,0.1) 
Commanding<-c(0,0,0,0,0,0.233,0.046,0.029,0.004,0.004,0.538,0,0.146) 
Suggesting<-c(0.038,0.15,0,0,0.083,0.058,0,0,0,0.017,0.079,0.133,0.442) 
Neutral<-c(0.021,0.075,0.017,0,0.033,0.042,0.017,0,0.033,0.017,0.021,0.008,0.717) 

data <- data.frame(Friendly,Polite,Praising,Joking,Sincere,Serious,Hostile,Rude,Blaming,Insincere,Commanding,Suggesting,Neutral)

Here is the code for my clustering:

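cor <- cor(data)        # 13 x 13 correlation matrix between the variables
dist <- dist(cor)       # Euclidean distances between its rows
hclust <- hclust(dist)  # complete-linkage clustering (the default method)
plot(hclust)            # draws the dendrogram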
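For reference: because hclust() is given dist(cor(data)), the objects being clustered are the rows of the correlation matrix. In other words, it clusters the variables, not the observations. The following is a small check that uses the built-in iris data purely as a stand-in. The names cor_iris, hc_vars and memb are only illustrative.

```r
# Correlation matrix of the four numeric iris columns (stand-in data)
cor_iris <- cor(iris[, 1:4])
hc_vars <- hclust(dist(cor_iris))

# cutree() returns one cluster label per clustered object; here the objects
# are the four variables, so the labels are named after the data's columns
memb <- cutree(hc_vars, k = 2)
print(memb)
```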

After running the code above I get a dendrogram, but how can I draw a plot like this:

[Image: example plot of WSS against the number of clusters]


2014-09-21 Ping Tang

If I follow what you want, then we need a function to compute the WSS:

wss <- function(d) {
    # centre each column on its mean, then sum the squared deviations
    sum(scale(d, scale = FALSE)^2)
}
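As a sanity check, wss() computes the usual total sum of squared deviations from the column means. That quantity also equals (n - 1) times the sum of the column variances. This is a sketch, with iris used only as example data.

```r
# wss() as defined above: centre each column, then sum the squared deviations
wss <- function(d) {
  sum(scale(d, scale = FALSE)^2)
}

x <- iris[, 1:4]
n <- nrow(x)

# The same quantity computed explicitly from the column means ...
explicit <- sum(sapply(x, function(col) sum((col - mean(col))^2)))
# ... and from the column variances, since var() is SS / (n - 1)
via_var <- (n - 1) * sum(sapply(x, var))

all.equal(wss(x), explicit)
all.equal(wss(x), via_var)
```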

We also need a wrapper around this wss() function:

wrap <- function(i, hc, x) {
    cl <- cutree(hc, i)             # cluster membership for i clusters
    spl <- split(x, cl)             # split the data by cluster
    total <- sum(sapply(spl, wss))  # add up the per-cluster WSS
    total
}

This wrapper takes the following arguments as input:

i: the number of clusters to cut the data into
hc: the hierarchical cluster analysis object
x: the original data

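wrap then cuts the dendrogram into i clusters, splits the original data according to the cluster memberships given by cl, and computes the WSS for each cluster. These WSS values are summed to give the WSS for that clustering.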
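To see what wrap() returns, here is a tiny hypothetical example with four one-dimensional points that form two obvious groups. The toy data and the names toy and hc_toy are made up for illustration. wss() and wrap() are redefined so that the snippet runs on its own.

```r
wss <- function(d) sum(scale(d, scale = FALSE)^2)
wrap <- function(i, hc, x) {
  cl <- cutree(hc, i)
  spl <- split(x, cl)
  sum(sapply(spl, wss))
}

# Four points: {1, 2} and {10, 11} are the natural groups
toy <- data.frame(v = c(1, 2, 10, 11))
hc_toy <- hclust(dist(toy))

wrap(1, hc_toy, toy)  # 82: one cluster, mean 6, so 25 + 16 + 16 + 25
wrap(2, hc_toy, toy)  # 1: {1, 2} and {10, 11} each contribute 0.5
wrap(4, hc_toy, toy)  # 0: every point is its own cluster
```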

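We run all of this using sapply over the numbers of clusters 1, 2, ..., n, where n is the number of objects that were clustered. In your code, hclust() clustered the rows of the correlation matrix cor (the 13 variables), so cor is what should be split. Converting it to a data frame makes split() divide it by rows:

res <- sapply(seq.int(1, nrow(cor)), wrap, hc = hclust, x = as.data.frame(cor))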

A scree plot can then be drawn using

plot(seq_along(res), res, type = "b", pch = 19)
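Putting the pieces together for the data in the question, here is a self-contained sketch. It assumes the goal is the WSS of the rows of the correlation matrix, since those rows are what hclust(dist(cor)) clustered. The names cor_mat, hc and cor_df are illustrative and were chosen to avoid masking the cor(), dist() and hclust() functions.

```r
wss <- function(d) sum(scale(d, scale = FALSE)^2)
wrap <- function(i, hc, x) {
  cl <- cutree(hc, i)
  sum(sapply(split(x, cl), wss))
}

# The question's data (13 observations of 13 variables)
data <- data.frame(
  Friendly = c(0.467,0.175,0.004,0.025,0.083,0.004,0.042,0.038,0,0.008,0.008,0.05,0.096),
  Polite = c(0.117,0.55,0,0,0.054,0.017,0.017,0.017,0,0.017,0.008,0.104,0.1),
  Praising = c(0.079,0.046,0.563,0.029,0.092,0.025,0.004,0.004,0.129,0,0,0,0.029),
  Joking = c(0.125,0.017,0.054,0.383,0.108,0.054,0.013,0.008,0.092,0.013,0.05,0.017,0.067),
  Sincere = c(0.092,0.088,0.025,0.008,0.383,0.133,0.017,0.004,0,0.063,0,0,0.188),
  Serious = c(0.033,0.021,0.054,0.013,0.2,0.358,0.017,0.004,0.025,0.004,0.142,0.021,0.108),
  Hostile = c(0.029,0.004,0,0,0.013,0.033,0.371,0.363,0.075,0.038,0.025,0.004,0.046),
  Rude = c(0,0.008,0,0.008,0.017,0.075,0.325,0.313,0.004,0.092,0.063,0.008,0.088),
  Blaming = c(0.013,0,0.088,0.038,0.046,0.046,0.029,0.038,0.646,0.029,0.004,0,0.025),
  Insincere = c(0.075,0.063,0,0.013,0.096,0.017,0.021,0,0.008,0.604,0.004,0,0.1),
  Commanding = c(0,0,0,0,0,0.233,0.046,0.029,0.004,0.004,0.538,0,0.146),
  Suggesting = c(0.038,0.15,0,0,0.083,0.058,0,0,0,0.017,0.079,0.133,0.442),
  Neutral = c(0.021,0.075,0.017,0,0.033,0.042,0.017,0,0.033,0.017,0.021,0.008,0.717)
)

cor_mat <- cor(data)
hc <- hclust(dist(cor_mat))

# Split the rows of the correlation matrix, as hclust() clustered them;
# a data frame is used so that split() works by rows, not element-wise
cor_df <- as.data.frame(cor_mat)
res <- sapply(seq_len(nrow(cor_df)), wrap, hc = hc, x = cor_df)

plot(seq_along(res), res, type = "b", pch = 19,
     xlab = "Number of clusters", ylab = "Total within-cluster SS")
```

Because cutree() gives nested partitions, the curve can only go down as the number of clusters grows. It reaches 0 when every variable is in its own cluster.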

Here is an example using Edgar Anderson's well-known iris data set:

iris2 <- iris[, 1:4] # drop Species column 
cl <- hclust(dist(iris2), method = "ward.D") 

## Takes a little while as we evaluate all implied clusterings up to 150 groups
res <- sapply(seq.int(1, nrow(iris2)), wrap, hc = cl, x = iris2)
plot(seq_along(res), res, type = "b", pch = 19)

This gives:

[Image: scree plot of WSS against the number of clusters, 1 to 150]

We can zoom in by showing only the first 50 clusters (1:50):

plot(seq_along(res[1:50]), res[1:50], type = "o", pch = 19)

which gives

[Image: the same scree plot for the first 50 clusters]

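You can speed up the main computation step in two ways. One is to run the sapply() through a suitable parallel alternative. The other is simply to compute the WSS for fewer clusters than, e.g., nrow(data):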

res <- sapply(seq.int(1, 50), wrap, hc = cl, x = iris2) ## 1st 50 groups
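Here is one possible parallel version, as a sketch. It uses mclapply() from the base parallel package. mclapply() forks worker processes, so it only runs in parallel on Unix-alikes; on Windows it must be given mc.cores = 1, which makes it run serially. The choice of 2 cores is an arbitrary assumption.

```r
library(parallel)

wss <- function(d) sum(scale(d, scale = FALSE)^2)
wrap <- function(i, hc, x) {
  cl <- cutree(hc, i)
  sum(sapply(split(x, cl), wss))
}

iris2 <- iris[, 1:4]
cl <- hclust(dist(iris2), method = "ward.D")
ks <- seq_len(50)

# Forking is unavailable on Windows, so fall back to a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
res_par <- unlist(mclapply(ks, wrap, hc = cl, x = iris2, mc.cores = n_cores))

# The parallel result should match the serial sapply() version
res_ser <- sapply(ks, wrap, hc = cl, x = iris2)
all.equal(res_par, res_ser)
```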


2014-09-21 15:43:36

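Thanks! But why are the values on the y-axis so large when the values in my data are really small? Also, could you answer my other question about the WSS plot? https://stackoverflow.com/questions/25977798/why-is-the-line-of-wss-plot-for-optimize-the-cluster-analysis-looks-so-volaua – 2014-09-22 15:30:00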

The values on the y-axis are determined by the scale of the variables in the data. I'll take a look at the other Q. – 2014-09-22 16:08:24
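Here is a sketch of the point about scale, with iris again as example data. Multiplying every variable by 10 multiplies every WSS value by 100, because the WSS is a sum of squared deviations. One optional way to get a unit-free y-axis that starts at 1 is to divide the curve by its first value (the total sum of squares). That rescaling is an assumption and does not come from the answer above.

```r
wss <- function(d) sum(scale(d, scale = FALSE)^2)
wrap <- function(i, hc, x) sum(sapply(split(x, cutree(hc, i)), wss))

x <- iris[, 1:4]
hc <- hclust(dist(x), method = "ward.D")

# WSS for 1 to 10 clusters on the original data and on the data times 10,
# using the same tree so that the cluster memberships are identical
res1 <- sapply(1:10, wrap, hc = hc, x = x)
res10 <- sapply(1:10, wrap, hc = hc, x = 10 * x)

all.equal(res10, 100 * res1)  # TRUE: squared deviations scale by 10^2

# Relative WSS: each value divided by the one-cluster (total) sum of squares
rel1 <- res1 / res1[1]
rel10 <- res10 / res10[1]
all.equal(rel1, rel10)        # TRUE: the relative curve ignores the scale
```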
