DBSCAN evaluation - need true_labels


I am trying to follow this example with some of my own data: http://scikit-learn.org/stable/auto_examples/cluster/plot_dbscan.html#sphx-glr-auto-examples-cluster-plot-dbscan-py

I am having trouble figuring out how to obtain my "labels_true" variable for the evaluation of the DBSCAN predictions.

This is the first line that needs it:

print("Homogeneity: %0.3f" % metrics.homogeneity_score(labels_true, labels))

I have latitude and longitude columns, which I use as follows:

import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

# as_matrix() has been removed from pandas; select the columns and call to_numpy()
coords = X_train[['latitude', 'longitude']].to_numpy()

kms_per_radian = 6371.0088
epsilon = 1.5 / kms_per_radian  # 1.5 km expressed in radians
db = DBSCAN(eps=epsilon, min_samples=1, algorithm='ball_tree', metric='haversine').fit(np.radians(coords))
cluster_labels = db.labels_
num_clusters = len(set(cluster_labels))
clusters = pd.Series([coords[cluster_labels == n] for n in range(num_clusters)])
print(num_clusters)
# prints 60

and

print("Homogeneity: %0.3f" % metrics.homogeneity_score(coords, cluster_labels))

is not the right call for my case.

X_train.head():

        bathrooms  bedrooms  building_id                       description                                        features                                           interest_level  latitude  longitude  manager_id                        price
10      1.5        3.0       53a5b119ba8f7b61d4e010512e0dfc85  A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...  []                                                 medium          40.7145   -73.9425   5ba989232d0489da1b5f2c45f6688adc  3000.0
10000   1.0        2.0       c5c8a357cba207596b04d1afd1e4f130                                                     [Doorman, Elevator, Fitness Center, Cats Allow...  low             40.7947   -73.9667   7533621a882f71e25173b27e3139d83d  5465.0
100004  1.0        1.0       c3ba40552e2120b0acfc3cb5730bb2aa  Top Top West Village location, beautiful Pre-w...  [Laundry In Building, Dishwasher, Hardwood Flo...  high            40.7388   -74.0018   d9039c43983f6e564b1482b273bd7b01  2850.0
100007  1.0        1.0       28d9ad350afeaab8027513a3e52ac8d5  Building Amenities - Garage - Garden - fitness...  [Hardwood Floors, No Fee]                          low             40.7539   -73.9677   1067e078446a7897d2da493d2f741316  3275.0
100013  1.0        4.0       0                                 Beautifully renovated 3 bedroom flex 4 bedroom...  [Pre-War]                                          low             40.8241   -73.9493   98e13ad4b495b9613cef886d79a6291f  3350.0

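As far as I understand, db.labels_ holds the predicted cluster number for each point. What I want for the metrics is one array containing the 60 predicted cluster labels and another containing the 60 true cluster labels, rather than the old latitude/longitude of each point.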
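For reference, here is a minimal sketch of what `homogeneity_score` expects. It uses synthetic data from `make_blobs` (an assumption for illustration, not the rental data above), where the true class of every point is known because the generator returns it. Both label arguments are 1D arrays with one integer label per sample, which is likely why passing the 2D `coords` array in place of `labels_true` is rejected.

```python
import numpy as np
from sklearn import metrics
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Synthetic points; make_blobs also returns the blob each point was drawn from.
# These centres and parameters are illustrative assumptions only.
X, labels_true = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)

# Predicted cluster label for every point (-1 would mark noise).
labels = DBSCAN(eps=1.5, min_samples=5).fit(X).labels_

# X is 2D (n_samples, n_features); both label arrays are 1D (n_samples,).
print(X.shape, labels_true.shape, labels.shape)

homogeneity = metrics.homogeneity_score(labels_true, labels)
print("Homogeneity: %0.3f" % homogeneity)
```

With real data, `labels_true` has to come from an external source such as a hand-labelled column; DBSCAN itself cannot produce it.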

Source

2017-08-13 Frederic Bastiat

See [this page](http://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation) and look for metrics that do not require ground-truth data.
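One metric on that page that needs no ground truth is the silhouette coefficient. The sketch below is only an illustration under stated assumptions: the coordinates are synthetic points scattered around three hypothetical centres (not the question's data), and the haversine distances are precomputed with `haversine_distances` so that the score uses the same distance as the clustering.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score
from sklearn.metrics.pairwise import haversine_distances

rng = np.random.default_rng(0)

# Hypothetical (latitude, longitude) centres in degrees, for illustration only.
centres = np.array([[40.71, -73.94], [40.79, -73.97], [40.85, -73.88]])
coords = np.vstack([c + rng.normal(scale=0.002, size=(50, 2)) for c in centres])

kms_per_radian = 6371.0088
epsilon = 1.5 / kms_per_radian  # 1.5 km expressed in radians
coords_rad = np.radians(coords)

db = DBSCAN(eps=epsilon, min_samples=1, algorithm="ball_tree", metric="haversine")
cluster_labels = db.fit(coords_rad).labels_

# Pairwise great-circle distances (in radians) between all points.
distances = haversine_distances(coords_rad)

# Silhouette needs only the data and the predicted labels, no labels_true.
score = silhouette_score(distances, cluster_labels, metric="precomputed")
print("Number of clusters: %d" % len(set(cluster_labels)))
print("Silhouette Coefficient: %0.3f" % score)
```

The silhouette coefficient lies between -1 and 1, with higher values indicating tighter, better separated clusters. It requires at least two clusters and fewer clusters than samples. The precomputed matrix grows with the square of the number of points, so on a large dataset the `sample_size` argument of `silhouette_score` may be worth considering.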