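Get database attributes from KMeans clustering in WEKA

I have a function that uses WEKA.jar to run the k-means algorithm. The function works, and it prints the list of instances to my console. However, I want to display specific attributes from the k-means clustering.
Here is my code: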
    // Import required dependencies
    import java.io.File;

    import weka.clusterers.ClusterEvaluation;
    import weka.clusterers.SimpleKMeans;
    import weka.core.EuclideanDistance;
    import weka.core.Instance;
    import weka.core.Instances;
    import weka.core.Utils;
    import weka.experiment.InstanceQuery;

    public class KMeans {
        // Get the connection credentials from the database manager
        private DatabaseManager datman = new DatabaseManager();
        private String username = datman.getUsername();
        private String password = datman.getPassword();

        public void doProcess() {
            int n = 3; // number of clusters
            String queries = "SELECT idms_kodebarang, aksesoris, bahan, `QTY-SA-1`,`QTY-SA-2`,`QTY-SA-3`,`QTY-SA-4`,`harga` FROM mt_karakterproduk";
            try {
                // Load the query result from the database as WEKA instances
                InstanceQuery query = new InstanceQuery();
                File reader = new File("DatabaseUtils.props");
                query.setUsername(username);
                query.setPassword(password);
                query.setQuery(queries);
                query.initialize(reader);
                query.setSparseData(true);
                Instances data = query.retrieveInstances();

                SimpleKMeans kmeans = new SimpleKMeans();
                // Apply the option string first (-I 100 = at most 100 iterations);
                // setOptions may reset options it does not mention
                kmeans.setOptions(Utils.splitOptions("-I 100"));
                kmeans.setSeed(10);
                // This is the important parameter to set
                kmeans.setNumClusters(n);
                // Required for getAssignments() below
                kmeans.setPreserveInstancesOrder(true);
                kmeans.buildClusterer(data);

                EuclideanDistance dist = (EuclideanDistance) kmeans.getDistanceFunction();
                Instances centroids = kmeans.getClusterCentroids();

                // Evaluate the clusterer so that clusterResultsToString() has content
                ClusterEvaluation eval = new ClusterEvaluation();
                eval.setClusterer(kmeans);
                eval.evaluateClusterer(data);

                for (int i = 0; i < centroids.numInstances(); i++) {
                    // For each cluster center
                    Instance inst = centroids.instance(i);
                    // Distance between the first centroid and data instance i
                    double dist1 = dist.distance(centroids.firstInstance(), data.instance(i));
                    // Value of the first attribute (index 0) of this centroid;
                    // loop up to inst.numAttributes() to read the other attributes
                    double value = inst.value(0);
                    System.out.println("Value for centroid " + i + ": " + value + " ::: " + dist1);
                }
                // println rather than printf: the results contain '%' characters
                System.out.println("Cluster Results \n =================== \n " + eval.clusterResultsToString());

                // This array holds the cluster number of each instance;
                // it has as many elements as there are instances
                int[] assignments = kmeans.getAssignments();
                int i = 0;
                for (int clusternum : assignments) {
                    System.out.printf("Instance %d - > cluster %d \n", i, clusternum);
                    i++;
                }
            } catch (Exception e) {
                System.out.println("Error On KMeans Analysis Exception : " + e.toString());
            }
        }
    }
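The output is only a list like this:
- INFO: Instance 0 - > cluster 2
- INFO: Instance 2 - > cluster 2
- INFO: Instance 4 - > cluster 1
- INFO: Instance 6 - > cluster 2
- INFO: Instance 8 - > cluster 2
- INFO: Instance 10 - > cluster 1
- INFO: Instance 12 - > cluster 2
- INFO: Instance 14 - > cluster 0
- INFO: Instance 16 - > cluster 1
- INFO: Instance 18 - > cluster 1
- INFO: Instance 20 - > cluster 1
- INFO: Instance 22 - > cluster 1
- INFO: Instance 24 - > cluster 0
- INFO: Instance 26 - > cluster 0
- INFO: Instance 28 - > cluster 1
- INFO: Instance 30 - > cluster 1 ... etc.
The result I need is not just the instance index but specific attributes from the database, so that it looks like this (taken from my WEKA application):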
    Cluster centroids:
                                   Cluster#
    Attribute         Full Data    0            1            2
                      (32)         (8)          (15)         (9)
    ===================================================================
    idms_kodebarang   E501245FF3   E613104F     E501247FF3   E501245FF3
      E501245FF3      1 ( 3%)      0 ( 0%)      0 ( 0%)      1 (11%)
      E501247FF3      1 ( 3%)      0 ( 0%)      1 ( 6%)      0 ( 0%)
      E820707F$KB     1 ( 3%)      0 ( 0%)      0 ( 0%)      1 (11%)
      E820705F$KB     1 ( 3%)      0 ( 0%)      0 ( 0%)      1 (11%)
      E5016B57FF      1 ( 3%)      0 ( 0%)      1 ( 6%)      0 ( 0%)
      E5016B59FF      1 ( 3%)      0 ( 0%)      1 ( 6%)      0 ( 0%)
      E820701F$KB     1 ( 3%)      0 ( 0%)      0 ( 0%)      1 (11%)
      E613104F        1 ( 3%)      1 (12%)      0 ( 0%)      0 ( 0%)
      E820708F$KB     1 ( 3%)      0 ( 0%)      0 ( 0%)      1 (11%)
      E521210F6       1 ( 3%)      0 ( 0%)      1 ( 6%)      0 ( 0%)
      E5216B10F6      1 ( 3%)      0 ( 0%)      1 ( 6%)      0 ( 0%)
      E501245C$3KB    1 ( 3%)      0 ( 0%)      0 ( 0%)      1 (11%)
      E501247C$3KB    1 ( 3%)      0 ( 0%)      0 ( 0%)      1 (11%)
      E5FF3           1 ( 3%)      0 ( 0%)      1 ( 6%)      0 ( 0%)
      E701601F        1 ( 3%)      1 (12%)      0 ( 0%)      0 ( 0%)
      E613105F        1 ( 3%)      1 (12%)      0 ( 0%)      0 ( 0%)
      E600201FC       1 ( 3%)      0 ( 0%)      1 ( 6%)      0 ( 0%)
      E600105C        1 ( 3%)      0 ( 0%)      1 ( 6%)      0 ( 0%)
      E620201C        1 ( 3%)      0 ( 0%)      1 ( 6%)      0 ( 0%)
      E5016B57C$KB    1 ( 3%)      0 ( 0%)      0 ( 0%)      1 (11%)
      E620501H        1 ( 3%)      0 ( 0%)      1 ( 6%)      0 ( 0%)
      E5016B59C$KB    1 ( 3%)      0 ( 0%)      0 ( 0%)      1 (11%)
      E800601F        1 ( 3%)      0 ( 0%)      1 ( 6%)      0 ( 0%)
      E880201H        1 ( 3%)      1 (12%)      0 ( 0%)      0 ( 0%)
      E931301F        1 ( 3%)      1 (12%)      0 ( 0%)      0 ( 0%)
      G932201F$       1 ( 3%)      1 (12%)      0 ( 0%)      0 ( 0%)
      E840104FC       1 ( 3%)      1 (12%)      0 ( 0%)      0 ( 0%)
      E600300F        1 ( 3%)      0 ( 0%)      1 ( 6%)      0 ( 0%)
      E701104F        1 ( 3%)      0 ( 0%)      1 ( 6%)      0 ( 0%)
      E5016B50FF      1 ( 3%)      0 ( 0%)      1 ( 6%)      0 ( 0%)
      E702201F        1 ( 3%)      0 ( 0%)      1 ( 6%)      0 ( 0%)
      E502415H6       1 ( 3%)      1 (12%)      0 ( 0%)      0 ( 0%)
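How can I achieve this?
Thanks in advance.
Thanks, I really appreciate your answer ^_^ ... but I cannot post another question, so I will ask here: can I print the contents of each cluster by a specific attribute, for example the name? Or is this the only way to display the data?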
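One possible way to go from instance indices to database values is to look up the attribute of each row while walking the assignments array. The sketch below leaves WEKA out so that it runs on its own: `assignments` stands in for what `kmeans.getAssignments()` returns, and `codes` stands in for the `idms_kodebarang` value of each row. In WEKA that value could presumably be read with `data.instance(i).stringValue(0)`, assuming the column is loaded as a nominal or string attribute. The class name `ClusterLabels` and the method `label` are made up for illustration, and the three sample rows are taken from the centroid table above.

```java
import java.util.ArrayList;
import java.util.List;

final class ClusterLabels {
    /**
     * Pairs each row's attribute value with its cluster number.
     * assignments[i] is the cluster of row i; values[i] is the attribute of row i.
     */
    static List<String> label(int[] assignments, String[] values) {
        if (assignments.length != values.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < assignments.length; i++) {
            lines.add(values[i] + " -> cluster " + assignments[i]);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical sample: in WEKA these would come from
        // kmeans.getAssignments() and data.instance(i).stringValue(0).
        int[] assignments = {2, 1, 0};
        String[] codes = {"E501245FF3", "E501247FF3", "E613104F"};
        label(assignments, codes).forEach(System.out::println);
    }
}
```

This only works if the rows keep their original order, which is why `setPreserveInstancesOrder(true)` matters in the code above.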
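Printing the contents of each cluster by one attribute could be done by grouping the rows on their cluster number. This is a minimal sketch under the same assumptions as the question's code: `assignments` mimics the array returned by `kmeans.getAssignments()`, and `names` holds one attribute value per row (with WEKA, presumably `data.instance(i).stringValue(attributeIndex)`). `ClusterGrouping` and `groupByCluster` are hypothetical names, not part of WEKA.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ClusterGrouping {
    /**
     * Groups attribute values by cluster number.
     * The TreeMap keeps clusters sorted; rows keep their original order within a cluster.
     */
    static Map<Integer, List<String>> groupByCluster(int[] assignments, String[] names) {
        if (assignments.length != names.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (int i = 0; i < assignments.length; i++) {
            groups.computeIfAbsent(assignments[i], k -> new ArrayList<>()).add(names[i]);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows; replace with values read from the Instances object.
        int[] assignments = {2, 1, 0, 2};
        String[] names = {"E501245FF3", "E501247FF3", "E613104F", "E820707F$KB"};
        groupByCluster(assignments, names).forEach((cluster, members) ->
                System.out.println("Cluster " + cluster + ": " + members));
    }
}
```

A map from cluster number to a list is used here because it can be printed directly or passed on to other code. Any other attribute can be grouped the same way by filling `names` from a different column.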