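I'm trying to do text classification in Weka, but I keep getting a "training and test set are not compatible" error and can't get the test set to work. This is my training set (it's short because I've only just started learning Weka):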
@relation sentiment
@attribute phrase string
@attribute value {pos, neg}
@data
'That was really unlucky', neg
'The car crashed horribly', neg
'The culpirit got away',neg
'Fortunally everyone made it out', pos
'She was glad noone was hurt',pos
'And the sun was at least shining',pos
I then apply the StringToWordVector filter to this set, followed by NumericToBinary. This is the resulting training set:
@relation 'sentiment-weka.filters.unsupervised.attribute.StringToWordVector-R1-W1000-prune-rate-1.0-N0-stemmerweka.core.stemmers.NullStemmer-M1-tokenizerweka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"-weka.filters.unsupervised.attribute.NumericToBinary'
@attribute value {pos,neg}
@attribute And_binarized {0,1}
@attribute Fortunally_binarized {0,1}
@attribute She_binarized {0,1}
@attribute at_binarized {0,1}
@attribute everyone_binarized {0,1}
@attribute glad_binarized {0,1}
@attribute hurt_binarized {0,1}
@attribute it_binarized {0,1}
@attribute least_binarized {0,1}
@attribute made_binarized {0,1}
@attribute noone_binarized {0,1}
@attribute out_binarized {0,1}
@attribute shining_binarized {0,1}
@attribute sun_binarized {0,1}
@attribute the_binarized {0,1}
@attribute was_binarized {0,1}
@attribute That_binarized {0,1}
@attribute The_binarized {0,1}
@attribute away_binarized {0,1}
@attribute car_binarized {0,1}
@attribute crashed_binarized {0,1}
@attribute culpirit_binarized {0,1}
@attribute got_binarized {0,1}
@attribute horribly_binarized {0,1}
@attribute really_binarized {0,1}
@attribute unlucky numeric
@data
{0 neg,16 1,17 1,25 1,26 1}
{0 neg,18 1,20 1,21 1,24 1}
{0 neg,18 1,19 1,22 1,23 1}
{2 1,5 1,8 1,10 1,12 1}
{3 1,6 1,7 1,11 1,16 1}
{1 1,4 1,9 1,13 1,14 1,15 1,16 1}
Next I created the test set, which looks like this:
@relation sentiment
@attribute phrase string
@data
'That was really unlucky'
'The car crashed horribly'
'The culpirit got away'
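My hope is that Weka will classify these phrases as "neg". To make the two sets compatible, I apply the same filters to the test set that I used on the training set (StringToWordVector and NumericToBinary). This is the resulting test set: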
@relation 'sentiment-weka.filters.unsupervised.attribute.StringToWordVector-R1-W1000-prune-rate-1.0-N0-stemmerweka.core.stemmers.NullStemmer-M1-O-tokenizerweka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"-weka.filters.unsupervised.attribute.NumericToBinary'
@attribute That_binarized {0,1}
@attribute The_binarized {0,1}
@attribute away_binarized {0,1}
@attribute car_binarized {0,1}
@attribute crashed_binarized {0,1}
@attribute culpirit_binarized {0,1}
@attribute got_binarized {0,1}
@attribute horribly_binarized {0,1}
@attribute really_binarized {0,1}
@attribute unlucky_binarized {0,1}
@attribute was numeric
@data
{0 1,8 1,9 1,10 1}
{1 1,3 1,4 1,7 1}
{1 1,2 1,5 1,6 1}
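However, Weka reports that the training set and the test set are not compatible, and I really can't figure out why. Intuitively, this seems like something Weka should be able to handle.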
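To see where the two filtered files diverge, here is a minimal sketch in plain Python (not Weka code). It mimics only the tokenization step of StringToWordVector, using the delimiters shown in the @relation lines above, and assumes each file is filtered on its own, as I did. The `vocabulary` helper and the variable names are hypothetical and exist only for this illustration.

```python
# Delimiters copied from the WordTokenizer options in the @relation lines.
DELIMITERS = " \r\n\t.,;:'\"()?!"

# Map every delimiter to a space so str.split() can do the tokenizing.
_TABLE = str.maketrans({ch: " " for ch in DELIMITERS})


def vocabulary(phrases):
    """Return the sorted list of distinct, case-sensitive tokens."""
    words = set()
    for phrase in phrases:
        words.update(phrase.translate(_TABLE).split())
    return sorted(words)


train_phrases = [
    "That was really unlucky",
    "The car crashed horribly",
    "The culpirit got away",
    "Fortunally everyone made it out",
    "She was glad noone was hurt",
    "And the sun was at least shining",
]
test_phrases = [
    "That was really unlucky",
    "The car crashed horribly",
    "The culpirit got away",
]

train_vocab = vocabulary(train_phrases)
test_vocab = vocabulary(test_phrases)

# Word attributes each file ends up with when filtered independently.
print(len(train_vocab), len(test_vocab))  # prints: 26 11

# Words that appear as attributes in the training file only.
print(sorted(set(train_vocab) - set(test_vocab)))
```

The counts match the listings above: 26 word attributes in the filtered training set versus 11 in the filtered test set, and the test set has no `value` attribute at all. So the two headers differ in both the number and the order of attributes, which may be what the compatibility check is objecting to.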
Thanks for your help!