2016-04-26

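GATE machine learning does not work

I want to do text classification in GATE with the Batch Learning PR. I first wrote the following configuration XML, and it works.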

<?xml version="1.0"?>
<ML-CONFIG>
    <VERBOSITY level="1"/>
    <SURROUND value="false"/>
    <PARAMETER name="thresholdProbabilityClassification" value="0.5"/>
    <multiClassification2Binary method="one-vs-others"/>
    <EVALUATION method="kfold" runs="5" ratio="0.66"/>
    <ENGINE nickname="PAUM" implementationName="PAUM"
            options=" -p 50 -n 5 -optB 0.0 "/>
    <DATASET>
        <INSTANCE-TYPE>emotion</INSTANCE-TYPE>
        <NGRAM>
            <NAME>ngram</NAME>
            <NUMBER>1</NUMBER>
            <CONSNUM>4</CONSNUM>
            <CONS-1>
                <TYPE>Token</TYPE>
                <FEATURE>string</FEATURE>
            </CONS-1>
            <CONS-2>
                <TYPE>word_bag</TYPE>
                <FEATURE>feature</FEATURE>
            </CONS-2>
            <CONS-3>
                <TYPE>hashtag</TYPE>
                <FEATURE>feature</FEATURE>
            </CONS-3>
            <CONS-4>
                <TYPE>Token</TYPE>
                <FEATURE>category</FEATURE>
            </CONS-4>
            <WEIGHT>2</WEIGHT>
        </NGRAM>
        <ATTRIBUTE>
            <NAME>Class</NAME>
            <SEMTYPE>NOMINAL</SEMTYPE>
            <TYPE>emotion</TYPE>
            <FEATURE>feature</FEATURE>
            <POSITION>0</POSITION>
            <CLASS/>
        </ATTRIBUTE>
    </DATASET>
</ML-CONFIG>

However, when I change the order of the CONS elements, as shown below, it does not work.

<?xml version="1.0"?>
<ML-CONFIG>
    <VERBOSITY level="1"/>
    <SURROUND value="false"/>
    <PARAMETER name="thresholdProbabilityClassification" value="0.5"/>
    <multiClassification2Binary method="one-vs-others"/>
    <EVALUATION method="kfold" runs="5" ratio="0.66"/>
    <ENGINE nickname="PAUM" implementationName="PAUM"
            options=" -p 50 -n 5 -optB 0.0 "/>
    <DATASET>
        <INSTANCE-TYPE>emotion</INSTANCE-TYPE>
        <NGRAM>
            <NAME>ngram</NAME>
            <NUMBER>1</NUMBER>
            <CONSNUM>4</CONSNUM>
            <CONS-1>
                <TYPE>word_bag</TYPE>
                <FEATURE>feature</FEATURE>
            </CONS-1>
            <CONS-2>
                <TYPE>hashtag</TYPE>
                <FEATURE>feature</FEATURE>
            </CONS-2>
            <CONS-3>
                <TYPE>Token</TYPE>
                <FEATURE>category</FEATURE>
            </CONS-3>
            <CONS-4>
                <TYPE>Token</TYPE>
                <FEATURE>string</FEATURE>
            </CONS-4>
            <WEIGHT>2</WEIGHT>
        </NGRAM>
        <ATTRIBUTE>
            <NAME>Class</NAME>
            <SEMTYPE>NOMINAL</SEMTYPE>
            <TYPE>emotion</TYPE>
            <FEATURE>feature</FEATURE>
            <POSITION>0</POSITION>
            <CLASS/>
        </ATTRIBUTE>
    </DATASET>
</ML-CONFIG>

This second configuration loads into GATE without complaint, but every time I run the Batch Learning PR I get the following error message:

java.lang.NullPointerException
    at gate.learning.NLPFeaturesOfDoc.writeNLPFeaturesToFile(NLPFeaturesOfDoc.java:818)
    at gate.learning.LightWeightLearningApi.annotations2NLPFeatures(LightWeightLearningApi.java:198)
    at gate.learning.EvaluationBasedOnDocs.oneRun(EvaluationBasedOnDocs.java:388)
    at gate.learning.EvaluationBasedOnDocs.kfoldEval(EvaluationBasedOnDocs.java:197)
    at gate.learning.EvaluationBasedOnDocs.evaluation(EvaluationBasedOnDocs.java:118)
    at gate.learning.LearningAPIMain.execute(LearningAPIMain.java:776)
    at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
    at gate.creole.ConditionalSerialController.runComponent(ConditionalSerialController.java:163)
    at gate.creole.SerialController.executeImpl(SerialController.java:157)
    at gate.creole.ConditionalSerialAnalyserController.executeImpl(ConditionalSerialAnalyserController.java:225)
    at gate.creole.ConditionalSerialAnalyserController.execute(ConditionalSerialAnalyserController.java:132)
    at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
    at gate.gui.SerialControllerEditor$RunAction$1.run(SerialControllerEditor.java:1728)
    at java.lang.Thread.run(Unknown Source)

Does anyone have an idea how to solve this problem?

Thank you very much!

Answer

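I suggest you make sure that the document causing this problem really contains the annotations and features defined in your configuration XML file. Since I see you are using Token, I suspect that document is empty.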
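One way to make that check less error-prone is to list every annotation type and feature the configuration refers to, then compare the list with what a given document actually holds. The following is only a minimal sketch in plain Python, not part of GATE: `required_pairs`, `missing_pairs` and the `present` mapping are hypothetical names, the sample configuration is a shortened stand-in for the one in the question, and collecting the per-document features (for example by inspecting the annotation sets in GATE Developer) is left to you.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the ML-CONFIG in the question (assumption: same layout).
SAMPLE_CONFIG = """<ML-CONFIG>
  <DATASET>
    <INSTANCE-TYPE>emotion</INSTANCE-TYPE>
    <NGRAM>
      <NAME>ngram</NAME>
      <NUMBER>1</NUMBER>
      <CONSNUM>2</CONSNUM>
      <CONS-1><TYPE>word_bag</TYPE><FEATURE>feature</FEATURE></CONS-1>
      <CONS-2><TYPE>Token</TYPE><FEATURE>string</FEATURE></CONS-2>
    </NGRAM>
    <ATTRIBUTE>
      <NAME>Class</NAME>
      <TYPE>emotion</TYPE>
      <FEATURE>feature</FEATURE>
      <CLASS/>
    </ATTRIBUTE>
  </DATASET>
</ML-CONFIG>"""


def required_pairs(config_xml):
    """Return the (annotation type, feature name) pairs named in the config,
    in document order and without duplicates."""
    root = ET.fromstring(config_xml)
    pairs = []
    for elem in root.iter():
        # Any element with both a TYPE and a FEATURE child counts,
        # which covers the CONS-n and ATTRIBUTE elements above.
        type_el = elem.find("TYPE")
        feat_el = elem.find("FEATURE")
        if type_el is not None and feat_el is not None:
            pair = ((type_el.text or "").strip(), (feat_el.text or "").strip())
            if pair not in pairs:
                pairs.append(pair)
    return pairs


def missing_pairs(required, present):
    """List the required pairs that one document lacks.

    `present` is a hypothetical mapping: annotation type -> set of feature
    names observed on annotations of that type in the document.
    """
    return [(t, f) for (t, f) in required if f not in present.get(t, set())]


# Hypothetical document that has Token and emotion annotations but no word_bag.
doc_features = {"Token": {"string", "category"}, "emotion": {"feature"}}
print(missing_pairs(required_pairs(SAMPLE_CONFIG), doc_features))
```

Running the same comparison with an empty mapping reports every pair as missing, which is the situation an empty document would produce.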