2016-07-02

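Stanford NLP OpenIE fails to identify triples for some sentences

I have tried both the core library and the simple wrapper around it, and both fail to find triples for the same trivial sentences.

Simple wrapper code: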

for (final Quadruple<String, String, String, Double> triple : sentence.openie()) {
    System.out.println(triple);
}

The core library code is their own example usage:

package edu.stanford.nlp.naturalli; 

import edu.stanford.nlp.ie.util.RelationTriple; 
import edu.stanford.nlp.io.IOUtils; 
import edu.stanford.nlp.ling.CoreAnnotations; 
import edu.stanford.nlp.pipeline.Annotation; 
import edu.stanford.nlp.pipeline.StanfordCoreNLP; 
import edu.stanford.nlp.semgraph.SemanticGraph; 
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations; 
import edu.stanford.nlp.util.CoreMap; 
import edu.stanford.nlp.util.PropertiesUtils; 

import java.util.Collection; 
import java.util.List; 
import java.util.Properties; 

/** 
* A demo illustrating how to call the OpenIE system programmatically. 
*/ 
public class OpenIEDemo { 

    private OpenIEDemo() {} // static main 

    public static void main(String[] args) throws Exception { 
        // Create the Stanford CoreNLP pipeline
        Properties props = PropertiesUtils.asProperties(
            "annotators", "tokenize,ssplit,pos,lemma,depparse,natlog,openie"
            // , "depparse.model", "edu/stanford/nlp/models/parser/nndep/english_SD.gz"
            // "annotators", "tokenize,ssplit,pos,lemma,parse,natlog,openie"
            // , "parse.originalDependencies", "true"
        );
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // Annotate an example document.
        String text;
        if (args.length > 0) {
            text = IOUtils.slurpFile(args[0]);
        } else {
            text = "Obama was born in Hawaii. He is our president.";
        }
        Annotation doc = new Annotation(text);
        pipeline.annotate(doc);

        // Loop over sentences in the document
        int sentNo = 0;
        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            System.out.println("Sentence #" + ++sentNo + ": " + sentence.get(CoreAnnotations.TextAnnotation.class));

            // Print SemanticGraph
            System.out.println(sentence.get(SemanticGraphCoreAnnotations.EnhancedDependenciesAnnotation.class).toString(SemanticGraph.OutputFormat.LIST));

            // Get the OpenIE triples for the sentence
            Collection<RelationTriple> triples = sentence.get(NaturalLogicAnnotations.RelationTriplesAnnotation.class);

            // Print the triples
            for (RelationTriple triple : triples) {
                System.out.println(triple.confidence + "\t" +
                    triple.subjectLemmaGloss() + "\t" +
                    triple.relationLemmaGloss() + "\t" +
                    triple.objectLemmaGloss());
            }

            // Alternately, to only run e.g., the clause splitter:
            List<SentenceFragment> clauses = new OpenIE(props).clausesInSentence(sentence);
            for (SentenceFragment clause : clauses) {
                System.out.println(clause.parseTree.toString(SemanticGraph.OutputFormat.LIST));
            }
            System.out.println();
        }
    }

} 

Both approaches succeed and fail on the same test sentences:

The cat jumped over the fence.: (cat,jumped over,fence,1.0) 
The brown dog barked.: FAIL 
The apple was eaten by John.: (apple,was eaten by,John,1.0) 
Joe ate the ripe apple.: (Joe,ate,ripe apple,1.0) 
They named their daughter Natasha.: (They,named,their daughter Natasha,1.0) 
Bob sold me her boat.: FAIL 
Grandfather left Rosalita and Raoul all his money.: FAIL 
Who killed the cat?: FAIL 
How many astronauts have walked on the moon?: (astronauts,have walked on,moon,1.0) 

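FAIL means that an empty collection was returned.

Has anyone run into a similar problem and found a workaround, or can anyone suggest an alternative solution?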

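What kind of triple would you expect for _The brown dog barked._? It has no direct object. I am also not sure what kind of triple you would expect for _Who killed the cat?_. _Bob sold me her boat._ works for me, though it also returns the incorrect triple '(Bob, sold, me)'.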

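@SebastianSchuster For _The brown dog barked._, I would want it to return whatever it found: the subject is dog and the relation is barked. I suppose it makes sense, though, that it returns nothing if it cannot find all three parts. I thought _Who killed the cat?_ would return (Who, killed, cat), since Who is the subject. I am not sure why you are able to get _Bob sold me her boat._ to work. Which models are you using? – 64test1234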

Answer

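OpenIE only returns triples, so if you also want extractions from sentences that have no object or other verbal complement, you will have to add your own rules. The same goes for questions: OpenIE is intended for large-scale relation extraction from text such as Wikipedia, and in that setting it makes no sense to consider questions.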
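As a rough illustration of what such a hand-written rule could look like, the sketch below emits a (subject, verb) pair for any verb that has an `nsubj` dependent but no object or other complement. It is only a sketch under stated assumptions: `IntransitiveFallback`, `Edge`, and `SubjectVerb` are hypothetical names, not CoreNLP classes, and the edges are typed in by hand rather than read from the parser. In a real pipeline they would have to be filled from the dependency graph that the demo above prints. Words are matched by their string form, so a sentence that repeats a word would need token indices instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical fallback rule for sentences where OpenIE returns no triple. */
class IntransitiveFallback {

    /** One dependency edge, governor --relation--> dependent (illustrative type, not a CoreNLP class). */
    static final class Edge {
        final String governor;
        final String relation;
        final String dependent;

        Edge(String governor, String relation, String dependent) {
            this.governor = governor;
            this.relation = relation;
            this.dependent = dependent;
        }
    }

    /** A partial extraction with no object slot. */
    static final class SubjectVerb {
        final String subject;
        final String verb;

        SubjectVerb(String subject, String verb) {
            this.subject = subject;
            this.verb = verb;
        }

        @Override
        public String toString() {
            return "(" + subject + ", " + verb + ")";
        }
    }

    // Assumption: relation labels that count as an object or complement of the verb.
    // Prefix matching also covers labels with a suffix such as "nmod:over".
    private static final List<String> COMPLEMENT_PREFIXES =
        Arrays.asList("dobj", "obj", "iobj", "xcomp", "ccomp", "nmod", "obl");

    private static boolean isComplement(String relation) {
        for (String prefix : COMPLEMENT_PREFIXES) {
            if (relation.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** Returns a (subject, verb) pair for every nsubj edge whose governor has no complement. */
    static List<SubjectVerb> extract(List<Edge> edges) {
        List<SubjectVerb> pairs = new ArrayList<>();
        for (Edge subj : edges) {
            if (!subj.relation.equals("nsubj")) {
                continue;
            }
            boolean hasComplement = false;
            for (Edge other : edges) {
                if (other.governor.equals(subj.governor) && isComplement(other.relation)) {
                    hasComplement = true;
                    break;
                }
            }
            if (!hasComplement) {
                pairs.add(new SubjectVerb(subj.dependent, subj.governor));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hand-written edges for "The brown dog barked."
        List<Edge> barked = Arrays.asList(
            new Edge("dog", "det", "The"),
            new Edge("dog", "amod", "brown"),
            new Edge("barked", "nsubj", "dog"));
        System.out.println(extract(barked)); // prints [(dog, barked)]

        // Hand-written edges for "Joe ate the ripe apple."
        List<Edge> ate = Arrays.asList(
            new Edge("ate", "nsubj", "Joe"),
            new Edge("ate", "dobj", "apple"),
            new Edge("apple", "det", "the"),
            new Edge("apple", "amod", "ripe"));
        System.out.println(extract(ate)); // prints []
    }
}
```

The rule deliberately skips verbs that already have an object or complement, so it would only fill in sentences such as _The brown dog barked._ and would stay silent where OpenIE is expected to produce a full triple.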

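As for the Bob example, it seems this only works with the version currently on GitHub, so you either need to clone the repository and compile it yourself, or wait for the next release for this sentence to work.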