Sklearn: find the mean centroid location of clusters?

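import numpy as np
import sklearn.feature_extraction.text as text
from sklearn import decomposition

descs = ["You should not go there", "We may go home later", "Why should we do your chores", "What should we do"]

# Build the document-term matrix: one row per description, one column per word
vectorizer = text.CountVectorizer()
dtm = vectorizer.fit_transform(descs).toarray()

# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
vocab = np.array(vectorizer.get_feature_names_out())

# Factorize into 3 components; topic has one row per description and one column per component
nmf = decomposition.NMF(3, random_state=1)
topic = nmf.fit_transform(dtm)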

Printing topic gives me:

>>> print(topic) 
[0.  , 1.403 , 0.  ], 
[0.  , 0.  , 1.637 ], 
[1.257 , 0.  , 0.  ], 
[0.874 , 0.056 , 0.065 ]

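These are vectors of the likelihoods that each element in descs belongs to a certain cluster. How can I get the coordinates of the centroid of each cluster? Ultimately, I want to develop a function that calculates the distance between each element in descs and the centroid of the cluster it was assigned to.

Would it be best to just calculate the mean of each descs element's topic values for each cluster?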

Source

2016-07-27 blacksite

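The docs for sklearn.decomposition.NMF explain how to get the coordinates of the centroid of each cluster:

Attributes: components_ : array, [n_components, n_features]
Non-negative components of the data.

The basis vectors are arranged row-wise, as shown in the following interactive session: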

In [995]: np.set_printoptions(precision=2) 

In [996]: nmf.components_ 
Out[996]: 
array([[ 0.54, 0.91, 0. , 0. , 0. , 0. , 0. , 0.89, 0. , 0.89, 0.37, 0.54, 0. , 0.54], 
     [ 0. , 0.01, 0.71, 0. , 0. , 0. , 0.71, 0.72, 0.71, 0.01, 0.02, 0. , 0.71, 0. ], 
     [ 0. , 0.01, 0.61, 0.61, 0.61, 0.61, 0. , 0. , 0. , 0.62, 0.02, 0. , 0. , 0. ]])

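As for your second question, I don't see the point of "calculating the mean of each descs element's topic values for each cluster". In my opinion, it makes more sense to perform the classification through the calculated likelihoods.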
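A minimal sketch of that idea, under a few assumptions: each description is assigned to the component with the largest value in its row of topic, the matching row of nmf.components_ is treated as that cluster's "centroid", and the distance is plain Euclidean distance in word-count space. The helper name distances_to_assigned_component and the max_iter setting are made up for illustration. Note that the scale of NMF components is not unique, so these distances are only a rough measure.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer

descs = ["You should not go there", "We may go home later",
         "Why should we do your chores", "What should we do"]

dtm = CountVectorizer().fit_transform(descs).toarray()

# max_iter is raised only to reduce the chance of a convergence warning (assumption)
nmf = NMF(n_components=3, random_state=1, max_iter=1000)
topic = nmf.fit_transform(dtm)


def distances_to_assigned_component(dtm, topic, components):
    """Hypothetical helper: assign each row to its strongest component,
    then measure the Euclidean distance to that component's row."""
    labels = topic.argmax(axis=1)      # index of the largest weight per description
    centroids = components[labels]     # one component row per description
    dists = np.linalg.norm(dtm - centroids, axis=1)
    return labels, dists


labels, dists = distances_to_assigned_component(dtm, topic, nmf.components_)
for desc, label, dist in zip(descs, labels, dists):
    print(f"{desc!r}: component {label}, distance {dist:.2f}")
```

If direction matters more than magnitude, a cosine distance between each row of dtm and its component row may be a better fit than the Euclidean norm used here.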

Source

2016-07-28 02:16:27 Tonechas

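I assume you created three centroids. How does each element in nmf.components_ represent the coordinates of each centroid? The number of nonzero elements in that array seems to indicate a high dimensionality. – blacksite

The shape of nmf.components_ is 3 rows by 14 columns, which correspond to 3 clusters and 14 distinct words, i.e. the vectors representing the cluster centroids are linear combinations of the vocabulary basis. – Tonechas

So how can I find the x-y coordinates of the centroids themselves? Or is this a misguided question? – blacksite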
