2017-04-05

Resolving coreference with the Stanford parser in .NET

Basically, what I want is to replace every pronoun in a text with the actual entity it refers to.

     // Path to the folder with models extracted from `stanford-corenlp-3.7.0-models.jar`
     var jarRoot = ... 

     // Text for processing 
     var text = "Kosgi Santosh sent an email to Stanford University. He didn't get a reply."; 

     // Annotation pipeline configuration 
     var props = new Properties(); 
     props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref"); 
     props.setProperty("ner.useSUTime", "0"); 

     // We should change current directory, so StanfordCoreNLP could find all the model files automatically 
     var curDir = Environment.CurrentDirectory; 
     Directory.SetCurrentDirectory(jarRoot); 
     var pipeline = new StanfordCoreNLP(props); 
     Directory.SetCurrentDirectory(curDir); 

     // Annotation 
     var annotation = new Annotation(text); 
     pipeline.annotate(annotation); 

     var graph = annotation.get(new CorefChainAnnotation().getClass()); 
     Console.WriteLine(graph); 

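So far all I can do is "pretty print" the result. I would like to process what ends up in graph further, but I don't know how to actually work with the value returned by annotation.get(new CorefChainAnnotation().getClass()). In Java it is said to return a Map<Integer, CorefChain>, but I don't know how that is supposed to work in C#.

Do you have any ideas?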

Answer


Once you have the annotation, you can get the graph by casting.

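// Java generics are erased, so the result comes back untyped and has to be cast to java.util.Map.
// `annotation` is the annotated Annotation object from the question.
Map graph = (Map)annotation.get(new CorefCoreAnnotations.CorefChainAnnotation().getClass());
var entrySetValues = graph.entrySet();
Iterator it = entrySetValues.iterator();
while (it.hasNext())
{
    // Each entry pairs a chain ID with its CorefChain.
    Map.Entry kvpair = (Map.Entry)it.next();
    CorefChain corefChain = (CorefChain)kvpair.getValue();
    // Treat the returned list as an ArrayList so it can be walked with foreach.
    var mentionsList = corefChain.getMentionsInTextualOrder() as ArrayList;
    foreach (CorefMention mention in mentionsList)
    {
      string noun = mention.mentionSpan;
      // do other stuff
    }
}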

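For C#, the idea is to cast to the right type first, take the list from the cast object as an ArrayList, loop over that ArrayList, and cast each element to the right type again.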
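To get from the mentions to the pronoun replacement the question asks for, here is a minimal sketch of the substitution step. It has not been run against Stanford.NLP.NET itself. It assumes you first copy each mention's sentNum, startIndex, endIndex and mentionSpan into a plain class (MentionInfo and PronounReplacer are made-up names), that those indices are 1-based with an exclusive end as in the Java CorefMention, and that you decide yourself which mentions count as pronouns (for example by looking at mentionType). The token lists in Main are typed in by hand for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical plain-data copy of the fields read from a CorefMention.
public sealed class MentionInfo
{
    public int SentNum;     // 1-based sentence number (assumed)
    public int StartIndex;  // 1-based index of the first token (assumed)
    public int EndIndex;    // 1-based index one past the last token (assumed)
    public string Text;     // the mention's surface text
    public bool IsPronoun;  // set by the caller

    public MentionInfo(int sentNum, int startIndex, int endIndex, string text, bool isPronoun)
    {
        SentNum = sentNum; StartIndex = startIndex; EndIndex = endIndex;
        Text = text; IsPronoun = isPronoun;
    }
}

public static class PronounReplacer
{
    // Replaces every pronoun mention with the text of its chain's representative mention.
    public static string Resolve(
        IEnumerable<IEnumerable<string>> sentences,
        IEnumerable<(MentionInfo Representative, IEnumerable<MentionInfo> Mentions)> chains)
    {
        // Work on a copy so the caller's token lists stay untouched.
        var tokens = sentences.Select(s => s.ToList()).ToList();

        // Apply edits from the end of the text backwards so that
        // token indices of earlier mentions remain valid.
        var edits = chains
            .SelectMany(c => c.Mentions
                .Where(m => m.IsPronoun)
                .Select(m => (Mention: m, Replacement: c.Representative.Text)))
            .OrderByDescending(e => e.Mention.SentNum)
            .ThenByDescending(e => e.Mention.StartIndex)
            .ToList();

        foreach (var edit in edits)
        {
            var m = edit.Mention;
            var sentence = tokens[m.SentNum - 1];
            sentence.RemoveRange(m.StartIndex - 1, m.EndIndex - m.StartIndex);
            sentence.Insert(m.StartIndex - 1, edit.Replacement);
        }

        // Tokens are joined with single spaces; original spacing is not restored.
        return string.Join(" ", tokens.Select(s => string.Join(" ", s)));
    }
}

public static class Demo
{
    public static void Main()
    {
        var sentences = new List<List<string>>
        {
            new List<string> { "Kosgi", "Santosh", "sent", "an", "email", "to", "Stanford", "University", "." },
            new List<string> { "He", "did", "n't", "get", "a", "reply", "." }
        };
        var representative = new MentionInfo(1, 1, 3, "Kosgi Santosh", false);
        var pronoun = new MentionInfo(2, 1, 2, "He", true);
        var chains = new List<(MentionInfo, IEnumerable<MentionInfo>)>
        {
            (representative, new List<MentionInfo> { representative, pronoun })
        };

        Console.WriteLine(PronounReplacer.Resolve(sentences, chains));
        // Kosgi Santosh sent an email to Stanford University . Kosgi Santosh did n't get a reply .
    }
}
```

Inside the foreach loop of the answer you would build one MentionInfo per mention, and take the representative from corefChain.getRepresentativeMention(). Because the output is rebuilt from tokens, punctuation ends up separated by spaces; restoring the original spacing would need the character offsets of each token.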