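Training your own model in OpenNLP

I'm finding it very difficult to create my own model with OpenNLP. Can anyone tell me how to build my own model, and how the training should be done?

What should the input be, and where will the output model file be stored?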

2012-06-26 user1482228

Which tool are you trying to create a model for? – wcolen

Maybe this article will help you. It describes how to train a TokenNameFinder using data extracted from Wikipedia...

nuxeo - blog - Mining Wikipedia with Hadoop and Pig for Natural Language Processing

2012-06-27 14:16:12

That link no longer works. – Ruthwik

@Ruthwik Thanks for your comment. The link has been updated. –

https://opennlp.apache.org/docs/1.5.3/manual/opennlp.html

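This site is very useful: it shows both the code and the OpenNLP command-line application for training all the different types of models, such as entity extraction and part-of-speech tagging.

I could give you some code examples here, but the page is very clear to use.

In theory:

Basically, you create a file that lists the things you want to train on, for example:

Sport [whitespace] this is a page about football, rugby and stuff

Politics [whitespace] this is a page about Blair being prime minister

The format is described on the page above (each model type needs a different format). Once you have created this file, you run it through either the API or the opennlp application (via the command line), which generates a .bin file. Once you have this .bin file, you can load it into a model and start using it (as per the API on the site above).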
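The file layout described above can be sketched in plain Java. This is only an illustration using the standard library, not OpenNLP code: the class name `DoccatTrainingFile`, its helpers, and the file name `doccat-train.txt` are made up here, and it assumes the one-document-per-line layout (category, whitespace, text) shown in the example. Other model types need other formats.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: writes labelled documents as "category text", one per line. */
final class DoccatTrainingFile {

    /** Builds one training line: the category label, a single space, then the text. */
    static String formatLine(String category, String text) {
        // The category must be a single token, otherwise the first word of the
        // text could not be told apart from the label.
        if (category.isEmpty() || category.chars().anyMatch(Character::isWhitespace)) {
            throw new IllegalArgumentException("category must be one non-empty token: " + category);
        }
        // A document has to stay on one line, so collapse line breaks and runs of whitespace.
        String singleLine = text.trim().replaceAll("\\s+", " ");
        return category + " " + singleLine;
    }

    /** Writes every {category, text} pair to the target file as UTF-8. */
    static void write(Path target, List<String[]> labelledDocs) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String[] doc : labelledDocs) {
            lines.add(formatLine(doc[0], doc[1]));
        }
        Files.write(target, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        List<String[]> docs = new ArrayList<>();
        docs.add(new String[] {"Sport", "this is a page about football, rugby and stuff"});
        docs.add(new String[] {"Politics", "this is a page about Blair being prime minister"});
        Path target = Paths.get("doccat-train.txt"); // hypothetical file name
        write(target, docs);
        System.out.println("Wrote " + docs.size() + " lines to " + target);
    }
}
```

The resulting text file is what you would hand to the trainer (through the API or the command line) to produce the .bin model.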

2013-10-28 15:57:38

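Or you could just say RTFM and save yourself some typing. – demongolem

Let me point you to the latest documentation: http://opennlp.apache.org/docs/1.8.1/manual/opennlp.html –

First you need to annotate the training data with the required entities.

Sentences should be separated by a newline character (\n). Tokens, including the <START:...> and <END> tags, should be separated by a space character.
Say you want to create a model for medicine entities; the data should then look like this: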

<START:medicine> Augmentin-Duo <END> is a penicillin antibiotic that contains two medicines - <START:medicine> amoxicillin trihydrate <END> and <START:medicine> potassium clavulanate <END> .
They work together to kill certain types of bacteria and are used to treat certain types of bacterial infections.

You can refer to the sample dataset as an example. The training data should contain at least 15,000 sentences to get good results.
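Before training on thousands of lines, it can help to check that each line follows the tagging rules above. The following is a minimal sketch in plain Java, not OpenNLP's own parser: the class `AnnotatedLineCheck` and its method are hypothetical, and reporting an untyped `<START>` tag under the label "default" is an assumption made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical checker for one line of <START:type> ... <END> annotated text. */
final class AnnotatedLineCheck {

    /** Returns the entities as "type=text" strings; throws if the markup is malformed. */
    static List<String> entities(String line) {
        List<String> found = new ArrayList<>();
        String openType = null;                 // type of the span currently open, if any
        StringBuilder span = new StringBuilder();
        for (String token : line.trim().split("\\s+")) {
            boolean typedStart = token.startsWith("<START:") && token.endsWith(">");
            if (typedStart || token.equals("<START>")) {
                if (openType != null) {
                    throw new IllegalArgumentException("<START> found before previous <END>");
                }
                // Assumption: untyped tags are reported under the label "default".
                openType = typedStart
                        ? token.substring("<START:".length(), token.length() - 1)
                        : "default";
                span.setLength(0);
            } else if (token.equals("<END>")) {
                if (openType == null) {
                    throw new IllegalArgumentException("<END> without a matching <START>");
                }
                found.add(openType + "=" + span);
                openType = null;
            } else if (token.contains("<START") || token.contains("<END>")) {
                // Catches tags glued to other characters, e.g. "<END>." or "word<END>".
                throw new IllegalArgumentException("tag not separated by spaces: " + token);
            } else if (openType != null) {
                if (span.length() > 0) {
                    span.append(' ');
                }
                span.append(token);
            }
        }
        if (openType != null) {
            throw new IllegalArgumentException("unclosed <START:" + openType + ">");
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "<START:medicine> Augmentin-Duo <END> is a penicillin antibiotic that "
                + "contains two medicines - <START:medicine> amoxicillin trihydrate <END> and "
                + "<START:medicine> potassium clavulanate <END> .";
        System.out.println(entities(line));
    }
}
```

A line such as `... potassium clavulanate <END>.` (no space before the period) is rejected by this check, which is the kind of slip that is easy to miss when annotating by hand.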

Alternatively, you can use the OpenNLP TokenNameFinderTrainer. The output file will be in .bin format.

Here is an example: Writing a custom NameFinder model in OpenNLP

For more details, refer to the OpenNLP documentation.

2016-06-08 07:27:13

Copy the training data into data.txt and run the code below to get your own mymodel.bin.

Data you can refer to: https://github.com/mccraigmccraig/opennlp/blob/master/src/test/resources/opennlp/tools/namefind/AnnotatedSentencesWithTypes.txt

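import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.util.Collections;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;

// Written against the OpenNLP 1.5.x API; newer releases use different signatures.
public class Training {
     // where the trained model is written
     static String onlpModelPath = "mymodel.bin";
     // training data set
     static String trainingDataFilePath = "data.txt";

     public static void main(String[] args) throws IOException {
         Charset charset = Charset.forName("UTF-8");
         // read the training file line by line; each line is one annotated sentence
         ObjectStream<String> lineStream = new PlainTextByLineStream(
                 new FileInputStream(trainingDataFilePath), charset);
         ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
         TokenNameFinderModel model;
         try {
             // alternative overload with explicit iterations (100) and cutoff (4):
             // model = NameFinderME.train("en", "drugs", sampleStream, Collections.<String, Object>emptyMap(), 100, 4);
             model = NameFinderME.train("en", "drugs", sampleStream,
                     Collections.<String, Object>emptyMap());
         } finally {
             sampleStream.close();
         }
         // serialize the trained model; try-with-resources closes the stream
         try (OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(onlpModelPath))) {
             model.serialize(modelOut);
         }
     }
}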

2016-09-21 13:33:28 user6858643

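Welcome to Stack Overflow! While this code may help solve the problem, it doesn't explain _why_ and/or _how_ it answers the question. Providing this additional context would significantly improve its long-term educational value. Please [edit] your answer to add an explanation, including what limitations and assumptions apply. –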
