2014-05-18

Unable to train location.bin using OpenNLP with Java

I am trying to train an en-ner-location.bin file using OpenNLP in Java. I have a training text file in the following format:

<START:location> Fontana <END> <START:location> Palo Verde <END> <START:location> Picacho <END>

and I am using the following code:

    import java.io.BufferedOutputStream;
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.charset.Charset;
    import java.util.Collections;
    import opennlp.tools.namefind.NameFinderME;
    import opennlp.tools.namefind.NameSample;
    import opennlp.tools.namefind.NameSampleDataStream;
    import opennlp.tools.namefind.TokenNameFinderModel;
    import opennlp.tools.tokenize.Tokenizer;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;
    import opennlp.tools.util.Span;

    public class TrainNames {
        @SuppressWarnings("deprecation")
        public void TrainNames() throws IOException {
            File fileTrainer = new File("citytrain.txt");
            File output = new File("en-ner-location.bin");
            ObjectStream<String> lineStream = new PlainTextByLineStream(new FileInputStream(fileTrainer), "UTF-8");
            ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
            System.out.println("lineStream = " + lineStream);
            TokenNameFinderModel model = NameFinderME.train("en", "location", sampleStream, Collections.<String, Object>emptyMap(), 1, 0);

            BufferedOutputStream modelOut = null;
            try {
                modelOut = new BufferedOutputStream(new FileOutputStream(output));
                model.serialize(modelOut);
            } finally {
                if (modelOut != null)
                    modelOut.close();
            }
        }
    }

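I get no errors or warnings, but when I use the trained file to extract a city name from a string like this:

    String cnt = "John is planning to specialize in Electrical Engineering in UC Fontana and pursue a career with IBM.";

it returns the whole string. Can anyone tell me why?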

Answers

0

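Welcome to SO! It looks like you need more context around each location annotation. I believe OpenNLP thinks you are training it to find words (any words), because each of your training samples consists of nothing but the name itself. You need to annotate the locations inside whole sentences, and you will need at least a few hundred samples to see good results.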

See also this answer: How I train an Named Entity Recognizer identifier in OpenNLP?
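To illustrate what "annotating locations inside whole sentences" could look like, here is a minimal sketch that turns plain sentences into training lines in the same `<START:location> ... <END>` format the question uses. It is plain Java with no OpenNLP calls. The class `LocationAnnotator` and its `annotate` method are hypothetical names made up for this example. It assumes you already have a list of known location names and that each sentence is whitespace-tokenized (punctuation separated by spaces), so that the tags end up as standalone tokens. The output would still need to be checked by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP: wraps known location names
// found in a sentence with <START:location> ... <END> tags.
class LocationAnnotator {

    // Assumption: the sentence is already whitespace-tokenized,
    // e.g. "He moved to Fontana ." with a space before the period.
    static String annotate(String sentence, List<String> locations) {
        if (locations.isEmpty()) {
            return sentence;
        }
        // Try longer names first so "Palo Verde" wins over "Palo".
        List<String> sorted = new ArrayList<>(locations);
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));

        StringBuilder alternation = new StringBuilder();
        for (String location : sorted) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(location));
        }
        Pattern pattern = Pattern.compile("\\b(?:" + alternation + ")\\b");

        // Single pass over the sentence, so inserted tags are never re-matched.
        Matcher matcher = pattern.matcher(sentence);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String tagged = "<START:location> " + matcher.group() + " <END>";
            matcher.appendReplacement(out, Matcher.quoteReplacement(tagged));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> locations = Arrays.asList("Fontana", "Palo Verde", "Picacho");
        String sentence = "He moved from Palo Verde to Fontana last year .";
        // Each annotated sentence would go on its own line of the training file.
        System.out.println(annotate(sentence, locations));
    }
}
```

Each line produced this way keeps the surrounding words, which is the context the answer above says the bare-name training file lacks.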

+0

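Hi, I changed the training file as you suggested. I included 100 sentences containing city names and tagged them, but it still did not work. Where do you think I am going wrong? – user3649086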

+0

Try changing your call to .train above to this: TokenNameFinderModel model = NameFinderME.train("en", "location", sampleStream, null); – markg