How to save cluster seeds for future scoring

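I am building a k-means clustering model in Python, but I am not sure how to save the cluster centroids and how to use them for future scoring. When I use the model later, I always want the same cluster IDs to be assigned. I would appreciate it if someone could share clear code that demonstrates how to do this.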

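Update:

@HannounYassir Hi, sure, sorry, I should have done this earlier:

Imagine my dataset is named data_clean and all of the variables have been standardized and cleaned beforehand.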

from sklearn.cluster import KMeans

# Define the cluster variables
cluster_vars = data_clean[['A', 'B', 'C']]

# Fit a 4-cluster solution to the data
model_4 = KMeans(n_clusters=4, random_state=30)
model_4.fit(cluster_vars)
clusassign = model_4.predict(cluster_vars)

# Score last year's customers with the model created above.
# Imagine my new dataset is clustervars_new.
clusassign_new = model_4.fit_predict(clustervars_new)

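I am 100% sure I am missing something at the scoring stage, because I am not saving the centroid seeds. It may be using the same model, but I am afraid the cluster IDs it assigns will not be consistent with the ones from the original dataset.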

Source

2017-06-12 Cagdas Kanar

Can you post any attempts you have made?

Hi @HannounYassir, I edited my original post to include my attempt.

Why are you worried? And why use 'fit_predict' instead of 'predict'?

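Do not use fit_predict.

It first learns a new clustering from the data you pass in, and then "predicts" on that same data.

What you want is predict, which uses the old clustering.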
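As a minimal sketch of that workflow: fit once, persist the fitted model, and later score new data with predict. The random arrays below are hypothetical stand-ins for the standardized cluster_vars and clustervars_new from the question, n_init=10 is set explicitly only to keep behavior stable across scikit-learn versions, and pickle is one possible way to persist the model, not something the question prescribes.

```python
import pickle

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-ins for the standardized training data and the new data.
rng = np.random.default_rng(0)
cluster_vars = rng.normal(size=(200, 3))
clustervars_new = rng.normal(size=(50, 3))

# Fit once on the original data.
model_4 = KMeans(n_clusters=4, random_state=30, n_init=10)
model_4.fit(cluster_vars)

# Persist the fitted model; the centroids are stored in cluster_centers_.
saved = pickle.dumps(model_4)

# Later: reload the model and score new data with predict(), which does not refit.
loaded = pickle.loads(saved)
clusassign_new = loaded.predict(clustervars_new)

# Cross-check: predict() assigns each row to its nearest saved centroid.
dists = np.linalg.norm(
    clustervars_new[:, None, :] - loaded.cluster_centers_[None, :, :], axis=2
)
nearest = dists.argmin(axis=1)
```

Because the reloaded model carries the same cluster_centers_, a given point is always assigned to the same cluster ID. To keep the model between sessions, pickle.dump to a file and pickle.load it back in the same way.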

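I believe that reusing the classification API (fit/predict/fit_predict) was a rather poor design decision in sklearn. It is convenient for classification, but clustering is not classification, and most clustering algorithms cannot "predict" on new data at all.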
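A quick way to see this in scikit-learn is to check which clustering estimators expose a predict method. The two estimators compared against KMeans here are chosen only for illustration.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans

# Record whether each clustering estimator offers predict() for unseen data.
has_predict = {}
for estimator in (KMeans(n_clusters=4), DBSCAN(), AgglomerativeClustering()):
    name = type(estimator).__name__
    has_predict[name] = hasattr(estimator, "predict")
    print(name, "has predict:", has_predict[name])
```

KMeans can assign unseen points because it only needs the stored centroids, whereas DBSCAN and AgglomerativeClustering provide fit_predict but no predict.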

Source

2017-06-14 07:09:16
