Stanford NLP pipeline - sequential processing (Java)

How do I correctly use the Stanford NLP pipeline for two-phase annotation?

In the first phase I only need tokenization and sentence splitting, so I use this code:

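import java.util.Properties;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

private Annotation annotatedDocument = null;
private StanfordCoreNLP pipeline = null;

...

public void firstPhase() {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit");

    pipeline = new StanfordCoreNLP(props);
    annotatedDocument = new Annotation(textDocument);
    // Run the tokenizer and sentence splitter on the document
    pipeline.annotate(annotatedDocument);
}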

The second phase is optional, so I don't want to run all the annotators in the first phase. Code for the second phase:

public void secondPhase() {
    // Both annotators live in edu.stanford.nlp.pipeline
    // POS tagging
    POSTaggerAnnotator posTaggerAnot = new POSTaggerAnnotator();
    posTaggerAnot.annotate(annotatedDocument);

    // Lemmatization
    MorphaAnnotator morphaAnot = new MorphaAnnotator();
    morphaAnot.annotate(annotatedDocument);
}

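First question: is it correct to annotate with "standalone" annotators in the second phase like this? Or is there a way to reuse the existing pipeline?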

Second question: I have a problem with the coreference annotator. I would like to use it in the second phase as follows:

CorefAnnotator coref = new CorefAnnotator(new Properties());

But this constructor never seems to finish. A constructor without properties does not exist, right? Are some property settings required?

2016-10-22 David

There are (at least) 3 ways you can do this:

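1. The way you described. Calling the individual annotators and chaining them together is perfectly valid. The coref annotator should work with empty properties; perhaps you need more memory? It is a bit slow to load, and the models are not small.
2. If you want to keep using a pipeline, you can create a partial pipeline and set the property enforceRequirements=false. This runs the listed annotators for you without checking that their requirements are satisfied. That is, if you know certain annotations already exist, you don't have to rerun the annotators that produce them.
3. This is a bigger change, but the simple API does this kind of lazy evaluation automatically. You can create a Document object, and when you ask for the various annotations it will lazily compute them on demand.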
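
The second and third options above could look roughly like the sketch below. It assumes the CoreNLP jar and its English models are on the classpath. The class name TwoPhaseSketch and its method names are made up for illustration and do not come from the question or from CoreNLP itself.

```java
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.simple.Document;
import edu.stanford.nlp.simple.Sentence;

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper class, for illustration only.
public class TwoPhaseSketch {

    // Phase 1: tokenize and split sentences with a regular pipeline.
    static Annotation firstPhase(String text) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        Annotation doc = new Annotation(text);
        pipeline.annotate(doc);
        return doc;
    }

    // Phase 2 (option 2): a partial pipeline that contains only pos and lemma.
    // enforceRequirements=false skips the check that tokenize/ssplit are part of
    // this pipeline; we rely on phase 1 having already added those annotations.
    static void secondPhase(Annotation doc) {
        Properties props = new Properties();
        props.setProperty("annotators", "pos, lemma");
        props.setProperty("enforceRequirements", "false");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        pipeline.annotate(doc);
    }

    // Option 3: the simple API runs the annotators it needs on first access.
    static List<String> lazyLemmas(String text) {
        List<String> lemmas = new ArrayList<>();
        for (Sentence sentence : new Document(text).sentences()) {
            lemmas.addAll(sentence.lemmas());
        }
        return lemmas;
    }

    public static void main(String[] args) {
        Annotation doc = firstPhase("The dogs were barking. They stopped.");
        secondPhase(doc);
        for (CoreLabel token : doc.get(CoreAnnotations.TokensAnnotation.class)) {
            System.out.println(token.word() + "\t" + token.tag() + "\t" + token.lemma());
        }
        System.out.println(lazyLemmas("The dogs were barking."));
    }
}
```

With the partial pipeline, nothing verifies that the earlier annotations are actually present, so the second phase should only be called on a document that has gone through the first.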

2016-10-23 01:11:05

You are right, the problem with the coref annotator was a java.lang.OutOfMemoryError exception. – David
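
Since the cause turned out to be heap exhaustion, one way to catch it early is to check the JVM's maximum heap before constructing the coref annotator. The sketch below uses only the standard library. The class HeapCheck and the 4 GB threshold are illustrative assumptions, not figures from the answer or from the CoreNLP documentation; the real requirement depends on the models and the document size.

```java
// Hypothetical helper class, for illustration only.
public class HeapCheck {

    // Assumed threshold, not an official number.
    static final long ASSUMED_MIN_HEAP_BYTES = 4L * 1024 * 1024 * 1024;

    // True if the given maximum heap is at least the required size.
    static boolean heapLooksSufficient(long maxHeapBytes, long requiredBytes) {
        return maxHeapBytes >= requiredBytes;
    }

    public static void main(String[] args) {
        // Upper bound on the heap this JVM will try to use (controlled by -Xmx).
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap: %d MB%n", maxHeap / (1024 * 1024));
        if (!heapLooksSufficient(maxHeap, ASSUMED_MIN_HEAP_BYTES)) {
            System.out.println("Heap may be too small for the coref models; consider raising -Xmx.");
        }
    }
}
```

The heap limit is raised with the standard -Xmx JVM option when starting the program, for example java -Xmx4g followed by the usual classpath and main class.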
