NLTK: getting dependencies from raw text

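I need to get sentence dependencies from raw text using NLTK. As far as I understand, the Stanford parser lets us build a tree, but I have not found out how to get the dependencies from that tree (maybe it is possible, maybe not), so I started using MaltParser. Here is the piece of code I am using: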

import os 
from nltk.parse.stanford import StanfordParser 
from nltk.tokenize import sent_tokenize 
from nltk.parse.dependencygraph import DependencyGraph 
from nltk.parse.malt import MaltParser 


os.environ['JAVAHOME'] = r"C:\Program Files (x86)\Java\jre1.8.0_45\bin\java.exe" 
os.environ['MALT_PARSER'] = r"C:\maltparser-1.8.1" 

maltParser = MaltParser(r"C:\maltparser-1.8.1\engmalt.poly-1.7.mco") 

class Parser(object): 
    @staticmethod 
    def Parse (text): 
     rawSentences = sent_tokenize(text) 
     treeSentencesStanford = stanfordParser.raw_parse_sents(rawSentences) 

     a=maltParser.raw_parse(rawSentences[0])

But the last line throws the exception "'str' object has no attribute 'tag'".

Changing the code above like this:
rawSentences = sent_tokenize(text) 
     treeSentencesStanford = stanfordParser.raw_parse_sents(rawSentences) 

     splitedSentences = [] 
     for sentence in rawSentences: 
      splitedSentence = word_tokenize(sentence) 
      splitedSentences.append(splitedSentence) 


     a=maltParser.parse_sents(splitedSentences)

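throws the same exception.

So, what am I doing wrong? And more generally: am I on the right track to get dependencies like these: http://www.nltk.org/images/depgraph0.png (but I need to access these dependencies from code)?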

Traceback (most recent call last): 
    File "E:\Google drive\Python multi tries\Python multi tries\Parser.py", line 51, in <module> 
    Parser.Parse("Some random sentence. Hopefully it will be parsed.") 
    File "E:\Google drive\Python multi tries\Python multi tries\Parser.py", line 32, in Parse 
    a=maltParser.parse_sents(splitedSentences) 
    File "C:\Python27\lib\site-packages\nltk-3.0.1-py2.7.egg\nltk\parse\malt.py", line 113, in parse_sents 
    tagged_sentences = [self.tagger.tag(sentence) for sentence in sentences] 
AttributeError: 'str' object has no attribute 'tag'

Source

2015-05-26 MisterMe

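Can you paste the traceback of the exception? – lenz

Sure. Added. – MisterMe

You are instantiating MaltParser with an unsuitable argument.

Running help(MaltParser) gives the following information: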

Help on class MaltParser in module nltk.parse.malt: 

class MaltParser(nltk.parse.api.ParserI) 
| Method resolution order: 
|  MaltParser 
|  nltk.parse.api.ParserI 
|  __builtin__.object 
| 
| Methods defined here: 
| 
| __init__(self, tagger=None, mco=None, working_dir=None, additional_java_args=None) 
|  An interface for parsing with the Malt Parser. 
|  
|  :param mco: The name of the pre-trained model. If provided, training 
|   will not be required, and MaltParser will use the model file in 
|   ${working_dir}/${mco}.mco. 
|  :type mco: str 
...

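So when you call maltParser = MaltParser(r"C:\maltparser-1.8.1\engmalt.poly-1.7.mco"), the single positional argument is bound to the first parameter, so the keyword argument tagger is set to the path of the pre-trained model. Unfortunately, this argument is not documented, but it is evidently a PoS tagger, as inspecting the source code shows.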
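To make the binding explicit, here is a minimal sketch. FakeMaltParser is a hypothetical stand-in, not the NLTK class: it only copies the parameter order shown by help() and the tagging line from the traceback, so it runs without MaltParser or Java installed.

```python
class FakeMaltParser(object):
    """Hypothetical stand-in: mirrors only the parameter order from help()."""

    def __init__(self, tagger=None, mco=None, working_dir=None,
                 additional_java_args=None):
        self.tagger = tagger
        self.mco = mco

    def parse_sents(self, sentences):
        # Same shape as the failing line in the traceback:
        # the tagger is expected to be an object with a .tag() method.
        return [self.tagger.tag(sentence) for sentence in sentences]


model_path = r"C:\maltparser-1.8.1\engmalt.poly-1.7.mco"

# Positional call: the path lands in `tagger`, and `mco` stays None.
positional = FakeMaltParser(model_path)
print(type(positional.tagger).__name__, positional.mco)

try:
    positional.parse_sents([["Some", "random", "sentence", "."]])
except AttributeError as error:
    message = str(error)
    print(message)

# Keyword call: the path lands in `mco`, and `tagger` keeps its default.
keyword = FakeMaltParser(mco=model_path)
print(keyword.tagger, keyword.mco == model_path)
```

The stand-in does not fall back to a default tagger the way the real class is described as doing, so only the positional case is exercised through parse_sents here.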

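(You do not have to specify a PoS tagger; there is a default regex-based tagger for English hard-coded in the class.)

So change your code to maltParser = MaltParser(mco=r"C:\maltparser-1.8.1\engmalt.poly-1.7.mco") and you should be fine (at least until you find the next bug).

As for your other question: I think you are on the right track. If you are interested in dependencies, it is best to actually use dependency parsing, as you are doing now. It is indeed possible to convert a constituency parse into dependencies (this has been shown), but it is probably more work.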
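For illustration only, a regex-based tagger can be sketched in a few lines. SimpleRegexTagger and the pattern list below are hypothetical and are not the patterns NLTK ships; the point is just that a tagger is an object exposing a .tag(tokens) method, which is what the failing line in the traceback expects.

```python
import re


class SimpleRegexTagger(object):
    """Hypothetical regex-based tagger: the first matching pattern wins."""

    def __init__(self, patterns):
        self._patterns = [(re.compile(regex), tag) for regex, tag in patterns]

    def tag(self, tokens):
        tagged = []
        for token in tokens:
            for regex, tag in self._patterns:
                if regex.match(token):
                    tagged.append((token, tag))
                    break
        return tagged


# Illustrative patterns only; the final catch-all makes every token a noun.
patterns = [
    (r"^[.?!,;:]+$", "."),
    (r"^-?[0-9]+(\.[0-9]+)?$", "CD"),
    (r"(?i)^(a|an|the|some)$", "DT"),
    (r".*ly$", "RB"),
    (r".*ing$", "VBG"),
    (r".*ed$", "VBD"),
    (r".*s$", "NNS"),
    (r".*", "NN"),
]

tagger = SimpleRegexTagger(patterns)
tagged = tagger.tag(["Hopefully", "it", "will", "be", "parsed", "."])
print(tagged)
```

Such a tagger is crude (here "will" and "be" end up as nouns), which is one reason to pass a better tagger explicitly when accuracy matters.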

Source

2015-05-26 20:34:52 lenz

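Thank you very much for your answer. I am now working out how to extract these dependencies from the tree (later I will use them for statistics and (possibly) use those statistics to implement a self-learning system), but the examples in the NLTK book are "hard-coded" for specific examples. As we all know, we work with raw human text. If you have any sources on how to extract these dependencies or how to process raw text, that would greatly help my work and further research. Anyway, thank you very much for your feedback. – MisterMe

I don't quite understand: 'MaltParser.parse()' returns a 'DependencyGraph', in which each node is a dict with "head" and "deps" entries. Basically, those are the dependencies. What exactly are you looking for? Maybe you should post a separate question with an example of what you are struggling with. – lenz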
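As a rough sketch of what reading dependencies off such nodes could look like: the dictionary below is hand-built and hypothetical, shaped after the description above ("head" and "deps" entries), with "word" and "rel" keys added as an assumption. It is not parser output, and the labels (det, amod, ROOT) are illustrative.

```python
def extract_dependencies(nodes):
    """Return (head_word, relation, dependent_word) triples in token order."""
    triples = []
    for address in sorted(nodes):
        node = nodes[address]
        head = node["head"]
        if head is None:
            # The artificial root node has no head of its own.
            continue
        head_word = nodes[head]["word"] or "ROOT"
        triples.append((head_word, node["rel"], node["word"]))
    return triples


# Hand-built nodes for "Some random sentence" (assumed layout, not real output).
nodes = {
    0: {"word": None, "head": None, "rel": None, "deps": {"ROOT": [3]}},
    1: {"word": "Some", "head": 3, "rel": "det", "deps": {}},
    2: {"word": "random", "head": 3, "rel": "amod", "deps": {}},
    3: {"word": "sentence", "head": 0, "rel": "ROOT",
        "deps": {"det": [1], "amod": [2]}},
}

for triple in extract_dependencies(nodes):
    print(triple)
```

With a real graph the same loop would run over the node dictionaries of the returned DependencyGraph, assuming they carry equivalent keys.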
