我想分类使用Weka未标记的字符串,我不是数据挖掘方面的专家,所以我一直在努力与不同的条款。我在做什么是我所提供的训练数据和设置运行M5Rules分类后无标签的字符串,我真的开始输出,但我不知道是什么意思:Weka分类和预测类
run:
{17 1,35 1,64 1,135 1,205 1,214 1,215 1,284 1,288 1,309 1,343 1,461 1,493 1,500 1,552 1,806 -0.038168} | -0.03816793850062397
-0.03816793850062397 ->
Results
======
Correlation coefficient 0
Mean absolute error 0
Root mean squared error 0
Relative absolute error 0 %
Root relative squared error 0 %
Total Number of Instances 1
BUILD SUCCESSFUL (total time: 1 second)
的源代码如下:
public Categorizer(){
try{
//*** READ ARRF FILES *///////////////////////////////////////////////////////
//BufferedReader trainReader = new BufferedReader(new FileReader("c:/Users/Yehia A.Salam/Desktop/dd/training-data.arff"));//File with text examples
//BufferedReader classifyReader = new BufferedReader(new FileReader("c:/Users/Yehia A.Salam/Desktop/dd/test-data.arff"));//File with text to classify
// Create trainning data instance
TextDirectoryLoader loader = new TextDirectoryLoader();
loader.setDirectory(new File("c:/Users/Yehia A.Salam/Desktop/dd/training-data"));
Instances dataRaw = loader.getDataSet();
StringToWordVector filter = new StringToWordVector();
filter.setInputFormat(dataRaw);
Instances dataTraining = Filter.useFilter(dataRaw, filter);
dataTraining.setClassIndex(dataRaw.numAttributes() - 1);
// Create test data instances
loader.setDirectory(new File("c:/Users/Yehia A.Salam/Desktop/dd/test-data"));
dataRaw = loader.getDataSet();
Instances dataTest = Filter.useFilter(dataRaw, filter);
dataTest.setClassIndex(dataTest.numAttributes() - 1);
// Classify
FilteredClassifier model = new FilteredClassifier();
model.setFilter(new StringToWordVector());
model.setClassifier(new M5Rules());
model.buildClassifier(dataTraining);
for (int i = 0; i < dataTest.numInstances(); i++) {
dataTest.instance(i).setClassMissing();
double cls = model.classifyInstance(dataTest.instance(i));
dataTest.instance(i).setClassValue(cls);
System.out.println(dataTest.instance(i).toString() + " | " + cls);
System.out.println(cls + " -> " + dataTest.instance(i).classAttribute().value((int) cls));
// evaluate classifier and print some statistics
Evaluation eval = new Evaluation(dataTraining);
eval.evaluateModelOnce(cls, dataTest.instance(i));
System.out.println(eval.toSummaryString("\nResults\n======\n", false));
}
}
catch(FileNotFoundException e){
System.err.println(e.getMessage());
}
catch(IOException i){
System.err.println(i.getMessage());
}
catch(Exception o){
System.err.println(o.getMessage());
}
}
最后一对夫妇的情况下,屏幕截图的我做了什么错在文件夹层次结构:
我想帮助你,但缺少重要信息。你的数据到底是什么样的?什么是功能? (我只是猜测字符串。)可能的标签是什么?你的输出是暗示你只有一个实例。我也认为你不应该从测试数据中删除类标签,因为如果没有它,你将无法评估分类器的性能。 – Sentry 2013-03-08 15:44:59
@Sentry基本上,训练和测试数据都是描述交易的两三行句子,例如, travel-1.txt的内容为“购买我们的LE 1100电子优惠券(原始价值LE 1850),并在壮观的MinaMark度假村的HB基础上的双人标准房中获得3晚住宿,所以我认为这是唯一的功能。现在可能的标签是[旅行或健康]。我只是试图预测一笔交易,因此我只尝试一个实例。 – 2013-03-08 16:37:03