Rule-based entity matching with spaCy

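I want to use the Python library spaCy to match tokens in text (adding labels as semantic references). Then I want to use these matches to extract relationships between the tokens. My first attempt uses spaCy's matcher.add and matcher.add_pattern. matcher.add works fine and I can find the tokens. My code so far: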

import spacy

nlp = spacy.load('en')

def merge_phrases(matcher, doc, i, matches):
    # Act only on the last match, so all merging happens in one pass.
    if i != len(matches) - 1:
        return None
    spans = [(ent_id, label, doc[start:end]) for ent_id, label, start, end in matches]
    for ent_id, label, span in spans:
        # Merge the matched span into a single token carrying the label as its entity type.
        span.merge('NNP' if label else span.root.tag_, span.text, nlp.vocab.strings[label])

matcher = spacy.matcher.Matcher(nlp.vocab)

matcher.add(entity_key='1', label='FINANCE', attrs={}, specs=[[{spacy.attrs.ORTH: 'financial'}, {spacy.attrs.ORTH: 'instrument'}]], on_match=merge_phrases)
matcher.add(entity_key='2', label='BUYER', attrs={}, specs=[[{spacy.attrs.ORTH: 'acquirer'}]], on_match=merge_phrases)
matcher.add(entity_key='3', label='CODE', attrs={}, specs=[[{spacy.attrs.ORTH: 'Code'}]], on_match=merge_phrases)

This works well and gives fairly good results:

doc = nlp(u'Code used to identify the acquirer of the financial instrument.') 

# Output 
['Code|CODE', 'used|', 'to|', 'identify|', 'the|', 'acquirer|BUYER', 'of|', 'the|', 'financial instrument|FINANCE', '.|']

My question is: how can I use matcher.add_pattern to match relationships between tokens, something like

matcher.add_pattern("IS_OF", [{BUYER}, {'of'}, {FINANCE}])

with the output:

doc = nlp(u'Code used to identify the acquirer of the financial instrument.') 

# Output 
[acquirer of financial instrument]

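I have tried different ways to make this work, but without success. I suspect there is something wrong with my understanding of matcher.add_pattern.

Could someone point me in the right direction on how to do this with spaCy?
Is it possible to add regular expressions here to find patterns, and if so, how?
How can I add multiple tokens with the same label, or somehow create a list of tokens for the same label, e.g. "FINANCE"?

I would appreciate any comments.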


2017-04-13 El_Patrón

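Your matcher identifies the tokens, but to find the relationships between them you have to do dependency parsing. spaCy's documentation includes a visual example of such a parse.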

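You can then traverse the tree to find the relationships between tokens: https://spacy.io/docs/usage/dependency-parse#navigating

The dep (enum) and dep_ (verbose name) attributes of each token give the relation that links the token to its head, that is, its parent in the tree.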
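
A minimal sketch of that traversal, assuming spaCy v3 (whose API differs from the one used in the question). The parse here is hand-annotated: the `heads` and `deps` lists are illustrative assumptions, not the output of a trained model, so the example runs without downloading one. With a loaded pipeline, `nlp(text)` would supply these attributes instead.

```python
import spacy
from spacy.tokens import Doc

# Blank English pipeline: provides a vocab, but no trained parser.
nlp = spacy.blank("en")

words = ["Code", "used", "to", "identify", "the", "acquirer",
         "of", "the", "financial", "instrument", "."]
# ASSUMPTION: hand-written parse for illustration only.
# Each entry in heads is the absolute index of that token's head.
heads = [0, 0, 3, 1, 5, 3, 5, 9, 9, 6, 0]
deps = ["ROOT", "acl", "aux", "xcomp", "det", "dobj",
        "prep", "det", "amod", "pobj", "punct"]
doc = Doc(nlp.vocab, words=words, heads=heads, deps=deps)

# dep_ names the relation that links each token to its head.
rows = [(token.text, token.dep_, token.head.text) for token in doc]
for row in rows:
    print(row)

# children are the tokens attached directly below a given token.
acquirer = doc[5]
children_of_acquirer = [child.text for child in acquirer.children]
print(children_of_acquirer)
```

Walking from "acquirer" down through its "of" child to "instrument" is what recovers a relation such as the IS_OF one asked about.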


2017-04-14 19:35:08 DhruvPathak

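Thank you for your answer, it helped a lot. I would like to know whether it would be more convenient to train a custom entity model to find new relevant entities in my source text and then find the relations between those entities. There is some documentation on doing this with NLTK, but how would you handle it with spaCy, specifically the relation extraction part?

Could you provide an example of dependency parsing that is compatible with the spaCy matcher, or am I getting the wrong idea here?

@El_Patrón The link provided in the answer has examples, and yes, it is compatible with the spaCy matcher, because the dependency parse results are available as the dep and dep_ attributes on the spaCy tokens themselves. – DhruvPathak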
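
One way the two could be combined, as a sketch under assumptions rather than a definitive recipe: it uses the spaCy v3 `Matcher` (where `add` takes a label and a list of patterns, so several patterns can share one label), a hand-annotated parse in place of a trained model, and a hypothetical relation name `IS_OF`. The extra patterns for "buyer" and "security" are made up to show a label with more than one pattern.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

nlp = spacy.blank("en")

words = ["Code", "used", "to", "identify", "the", "acquirer",
         "of", "the", "financial", "instrument", "."]
# ASSUMPTION: hand-written parse for illustration; a trained pipeline would supply it.
heads = [0, 0, 3, 1, 5, 3, 5, 9, 9, 6, 0]
deps = ["ROOT", "acl", "aux", "xcomp", "det", "dobj",
        "prep", "det", "amod", "pobj", "punct"]
doc = Doc(nlp.vocab, words=words, heads=heads, deps=deps)

matcher = Matcher(nlp.vocab)
# Several patterns may be registered under the same label.
matcher.add("FINANCE", [[{"LOWER": "financial"}, {"LOWER": "instrument"}],
                        [{"LOWER": "security"}]])
matcher.add("BUYER", [[{"LOWER": "acquirer"}], [{"LOWER": "buyer"}]])

# Index each matched span by the position of its syntactic root token.
labelled = {}
for match_id, start, end in matcher(doc):
    span = doc[start:end]
    labelled[span.root.i] = (nlp.vocab.strings[match_id], span)

# A FINANCE span that is the object of "of", where "of" attaches to a BUYER span.
relations = []
for label, span in labelled.values():
    prep = span.root.head
    if label == "FINANCE" and span.root.dep_ == "pobj" and prep.lower_ == "of":
        owner = labelled.get(prep.head.i)
        if owner is not None and owner[0] == "BUYER":
            relations.append((owner[1].text, "IS_OF", span.text))

print(relations)
```

The matcher only decides which spans carry which label; the relation itself comes from following `head` links between the matched spans.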
