2012-10-21

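Weka: Problems with supervised discretization and the error "Not enough training instances" during attribute selection

I have been learning the Weka API for the past month or so (I'm a student). I am writing a program that filters a particular set of data and eventually builds a Bayesian network for it. A week ago I finished my discretization class and my attribute selection class. A few days ago I realized that I needed to change my discretization to be supervised and use the default Fayyad & Irani method. After making that change, I started getting this error in my attribute selection class: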

Exception in thread "main" weka.core.WekaException: 
weka.attributeSelection.CfsSubsetEval: Not enough training instances with class labels (required: 1, provided: 0)! 
at weka.core.Capabilities.test(Capabilities.java:1138) 
at weka.core.Capabilities.test(Capabilities.java:1023) 
at weka.core.Capabilities.testWithFail(Capabilities.java:1302) 
at weka.attributeSelection.CfsSubsetEval.buildEvaluator(CfsSubsetEval.java:331) 
at weka.attributeSelection.AttributeSelection.SelectAttributes(AttributeSelection.java:597) 
at weka.filters.supervised.attribute.AttributeSelection.batchFinished(AttributeSelection.java:456) 
at weka.filters.Filter.useFilter(Filter.java:663) 
at AttributeSelectionFilter.selectionFilter(AttributeSelectionFilter.java:29) 
at Runner.main(Runner.java:70) 

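My attribute selection worked fine before this change, so I think I may be doing something wrong in my discretization class. The other part of my question is related: I also noticed that my discretization class does not seem to actually discretize the data. It just puts all the numeric data into a single range instead of binning it strategically the way Fayyad & Irani should.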
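
For context on what "strategic" binning means here: the Fayyad & Irani method recursively picks the cut point that minimizes class entropy, and keeps a cut only if it passes a minimum-description-length (MDL) test. The following is a simplified, standalone Java sketch of that criterion for a single numeric attribute. It is not Weka's implementation; the class MdlDiscretizer and its methods are hypothetical, and class labels are assumed to be integers 0..numClasses-1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical standalone sketch of Fayyad & Irani MDL discretization (not Weka code). */
public class MdlDiscretizer {

    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    /** Class entropy (in bits) of labels[from..to). */
    static double entropy(int[] labels, int from, int to, int numClasses) {
        int[] counts = new int[numClasses];
        for (int i = from; i < to; i++) counts[labels[i]]++;
        double n = to - from;
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = c / n;
                h -= p * log2(p);
            }
        }
        return h;
    }

    /** Number of distinct classes present in labels[from..to). */
    static int distinctClasses(int[] labels, int from, int to, int numClasses) {
        boolean[] seen = new boolean[numClasses];
        int k = 0;
        for (int i = from; i < to; i++) {
            if (!seen[labels[i]]) {
                seen[labels[i]] = true;
                k++;
            }
        }
        return k;
    }

    /** Recursively splits the sorted range [from, to) and appends accepted cuts in ascending order. */
    static void split(double[] v, int[] y, int from, int to, int numClasses, List<Double> cuts) {
        int n = to - from;
        if (n < 2) return;
        double entS = entropy(y, from, to, numClasses);
        double bestE = Double.POSITIVE_INFINITY;
        int best = -1;
        for (int i = from + 1; i < to; i++) {
            if (v[i] == v[i - 1]) continue; // only cut between distinct values
            double e = ((i - from) * entropy(y, from, i, numClasses)
                    + (to - i) * entropy(y, i, to, numClasses)) / n;
            if (e < bestE) {
                bestE = e;
                best = i;
            }
        }
        if (best < 0) return; // all values in this range are identical
        double ent1 = entropy(y, from, best, numClasses);
        double ent2 = entropy(y, best, to, numClasses);
        int k = distinctClasses(y, from, to, numClasses);
        int k1 = distinctClasses(y, from, best, numClasses);
        int k2 = distinctClasses(y, best, to, numClasses);
        double gain = entS - bestE;
        double delta = log2(Math.pow(3, k) - 2) - (k * entS - k1 * ent1 - k2 * ent2);
        // MDL acceptance test: gain must exceed (log2(N - 1) + delta) / N
        if (gain <= (log2(n - 1) + delta) / n) return;
        split(v, y, from, best, numClasses, cuts);
        cuts.add((v[best - 1] + v[best]) / 2.0);
        split(v, y, best, to, numClasses, cuts);
    }

    /** Returns the accepted cut points, in ascending order, for one numeric attribute. */
    public static double[] cutPoints(double[] values, int[] labels, int numClasses) {
        // Sort instance indices by attribute value so labels stay aligned with values.
        Integer[] idx = new Integer[values.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Double.compare(values[a], values[b]));
        double[] v = new double[values.length];
        int[] y = new int[values.length];
        for (int i = 0; i < idx.length; i++) {
            v[i] = values[idx[i]];
            y[i] = labels[idx[i]];
        }
        List<Double> cuts = new ArrayList<>();
        split(v, y, 0, v.length, numClasses, cuts);
        double[] out = new double[cuts.size()];
        for (int i = 0; i < out.length; i++) out[i] = cuts.get(i);
        return out;
    }
}
```

For values 1 to 12 labelled 0 up to 6 and 1 from 7 onward, cutPoints returns {6.5}. If every label is the same, the MDL test rejects every candidate and it returns an empty array, so the attribute stays in a single interval.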

Here is my discretization class:

import weka.core.Instances; 
import weka.filters.Filter; 
import weka.filters.supervised.attribute.Discretize; 
import weka.filters.unsupervised.attribute.NumericToNominal; 

public class DiscretizeFilter 
{ 
    private Instances data; 
    private boolean sensitiveOption; 
    private Filter filter = new Discretize(); 

    public DiscretizeFilter(Instances data, boolean sensitiveOption) 
    { 
     this.data = data; 
     this.sensitiveOption = sensitiveOption; 
    } 

    public Instances discreteFilter() throws Exception 
    { 
     NumericToNominal nm = new NumericToNominal(); 
     nm.setInputFormat(data); 
     Filter.useFilter(data, nm); 
     Instances nominalData = nm.getOutputFormat(); 

     if(sensitiveOption)//if the user wants extra sensitivity 
     { 
      // "-E": use better encoding of the split point for MDL
      String[] options = { "-E" };
      ((Discretize) filter).setOptions(options); 
     } 
     filter.setInputFormat(nominalData); 
     Filter.useFilter(nominalData,filter); 
     return filter.getOutputFormat(); 
    } 
} 

Here is my attribute selection class:

import weka.attributeSelection.BestFirst; 
import weka.attributeSelection.CfsSubsetEval; 
import weka.core.Instances; 
import weka.filters.supervised.attribute.AttributeSelection; 

public class AttributeSelectionFilter 
{ 
    public Instances selectionFilter(Instances data) throws Exception 
    { 
     AttributeSelection filter = new AttributeSelection(); 

     for(int i = 0; i < data.numInstances(); i++) 
     { 
      filter.input(data.instance(i)); 
     } 
     CfsSubsetEval eval = new CfsSubsetEval(); 
     BestFirst search = new BestFirst(); 
     filter.setSearch(search); 
     filter.setEvaluator(eval); 

     filter.setInputFormat(data); 
     AttributeSelection.useFilter(data, filter); 

     return filter.getOutputFormat(); 
    } 

    public int attributeCounter(Instances data) 
    { 
     return data.numAttributes(); 
    } 
} 

Any help would be greatly appreciated!

Answers

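Internally, Weka stores attribute values as doubles. The exception seems to be thrown because every instance in the dataset (data) is "missing class", i.e. for whatever reason its internal class attribute value has been set to NaN ("not a number"). I suggest double-checking that data's class attribute is created and set correctly.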
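
The point about NaN can be illustrated without Weka: a missing value is represented as Double.NaN, so a check like the one in the stack trace effectively counts instances whose class value is not NaN. The class ClassLabelCheck here is a hypothetical simplification, not Weka's Capabilities code; it assumes each instance is a double[] with the class value at classIndex.

```java
/** Hypothetical simplification of a "labelled instances" check (not Weka's Capabilities code). */
public class ClassLabelCheck {

    /** Counts rows whose class value is present, i.e. not NaN (NaN marks a missing value). */
    public static int countLabeled(double[][] rows, int classIndex) {
        int labeled = 0;
        for (double[] row : rows) {
            if (!Double.isNaN(row[classIndex])) labeled++;
        }
        return labeled;
    }

    /** Throws if fewer than {@code required} rows carry a class label, mimicking the error text. */
    public static void requireLabeled(double[][] rows, int classIndex, int required) {
        int provided = countLabeled(rows, classIndex);
        if (provided < required) {
            throw new IllegalStateException("Not enough training instances with class labels (required: "
                    + required + ", provided: " + provided + ")!");
        }
    }
}
```

An empty dataset with zero rows also reports provided: 0, so the same message appears whether every class value is missing or there are no instances at all.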

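I figured it out: the mistake was that I had misunderstood the description of the getOutputFormat() method in the Discretize class. Instead, I took the filtered instances returned by useFilter(), and that solved my problem! I had simply been passing the wrong kind of data to the attribute selection filter.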