2017-05-25

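Python spaCy error: RuntimeError: Language not supported

I'm trying to add new entities to my own spaCy model, "mymodel". Everything worked fine before I installed "mymodel" following this tutorial, but now, when I try to use "mymodel" to add new entities, I get an error. Please help me.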

Here is my code:

```python
import plac

from spacy.en import English
from spacy.gold import GoldParse
import spacy
nlp = spacy.load('mymodel')

def main(out_loc):
    nlp = English(parser=False) # Avoid loading the parser, for quick load times
    # Run the tokenizer and tagger (but not the entity recognizer)
    doc = nlp.tokenizer(u'Lions and tigers and grizzly bears!')
    nlp.tagger(doc)

    nlp.entity.add_label('ANIMAL') # <-- New in v0.100

    # Create a GoldParse object. This should have a better API...
    indices = tuple(range(len(doc)))
    words = [w.text for w in doc]
    tags = [w.tag_ for w in doc]
    heads = [0 for _ in doc]
    deps = ['' for _ in doc]
    # This is the only part we care about. We want BILOU format
    ner = ['U-ANIMAL', 'O', 'U-ANIMAL', 'O', 'B-ANIMAL', 'L-ANIMAL', 'O']

    # Create the GoldParse
    annot = GoldParse(doc, (indices, words, tags, heads, deps, ner))

    # Update the weights with the example
    # Here we iterate until we get it entirely correct. In practice this is probably a bad idea.
    # Note that we've added a class to the existing model here! We "resume"
    # training the previous model. Whether this is good or not I can't say, you'll have to
    # experiment.
    loss = nlp.entity.train(doc, annot)
    i = 0
    while loss != 0 and i < 1000:
        loss = nlp.entity.train(doc, annot)
        i += 1
    print("Used %d iterations" % i)

    nlp.entity(doc)
    for ent in doc.ents:
        print(ent.text, ent.label_)
    nlp.entity.model.dump(out_loc)

if __name__ == '__main__':
    plac.call(main)
```

**Error output:**

```
Traceback (most recent call last):
  File "/home/vv/webapp/dic_model.py", line 7, in <module>
    nlp = spacy.load('mymodel')
  File "/usr/local/lib/python3.5/dist-packages/spacy/__init__.py", line 26, in load
    lang_name = util.get_lang_class(name).lang
  File "/usr/local/lib/python3.5/dist-packages/spacy/util.py", line 27, in get_lang_class
    raise RuntimeError('Language not supported: %s' % name)
RuntimeError: Language not supported: mymodel
```
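The `ner` list in the question's code uses BILOU tags (Begin, Inside, Last, Unit, Out). As an aside, here is a minimal sketch of how such a list can be derived from token spans. The helper `spans_to_bilou` is hypothetical (it is not part of spaCy), and the token list assumes the tokenizer splits the final `!` off `bears!`, which matches the seven tags in the question.

```python
from typing import List, Tuple

def spans_to_bilou(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end, label) token spans, end exclusive, to BILOU tags.

    Hypothetical helper for illustration; not a spaCy API.
    """
    tags = ["O"] * n_tokens  # every token starts as Outside
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = "U-" + label  # single-token (Unit) entity
        else:
            tags[start] = "B-" + label  # first token of a multi-token entity
            for i in range(start + 1, end - 1):
                tags[i] = "I-" + label  # tokens strictly inside the entity
            tags[end - 1] = "L-" + label  # last token of the entity
    return tags

# Assumed tokenization of u'Lions and tigers and grizzly bears!'
tokens = ["Lions", "and", "tigers", "and", "grizzly", "bears", "!"]
print(spans_to_bilou(len(tokens), [(0, 1, "ANIMAL"), (2, 3, "ANIMAL"), (4, 6, "ANIMAL")]))
# → ['U-ANIMAL', 'O', 'U-ANIMAL', 'O', 'B-ANIMAL', 'L-ANIMAL', 'O']
```

Producing the tags from spans avoids hand-counting token positions, which is easy to get wrong once a sentence has punctuation.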

Answer


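The problem here is that spacy.load() currently expects either a language ID (like 'en') or a shortcut link that tells spaCy where to find the model data. Since spaCy can't find a shortcut link, it assumes 'my_model' is a language, which obviously doesn't exist.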
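To make that fallback concrete, here is a simplified, hypothetical sketch of the lookup order described above, modeled on the traceback: first look for a shortcut link with the given name in the data directory, and only if none exists treat the name as a language ID. `resolve_model`, `SUPPORTED_LANGUAGES` and the `data_dir` parameter are illustrative assumptions, not spaCy's actual code or API.

```python
import os

# Hypothetical stand-in for the language IDs spaCy knows about.
SUPPORTED_LANGUAGES = {"en", "de", "fr"}

def resolve_model(name, data_dir):
    """Simplified illustration of how a spaCy 1.x-style load(name) picks a model."""
    link_path = os.path.join(data_dir, name)
    if os.path.exists(link_path):
        # A shortcut link with this name exists: load the model data it points to.
        return ("link", os.path.realpath(link_path))
    # No link found: fall back to treating the name as a language ID.
    if name not in SUPPORTED_LANGUAGES:
        # Same message as in the traceback above.
        raise RuntimeError("Language not supported: %s" % name)
    return ("language", name)
```

With an empty `data_dir`, `resolve_model("mymodel", data_dir)` raises `RuntimeError: Language not supported: mymodel`; once a `mymodel` link exists in that directory, the same call returns the link's target instead.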

You can set up a link to your model like this:

```
python -m spacy link my_model my_model  # if it's installed via pip, or:
python -m spacy link /path/to/my_model/data my_model
```

This will create a symlink in the /spacy/data directory, so you should run it with admin privileges.
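The effect of the link command can be approximated with a plain symlink. This is a rough sketch under stated assumptions: `create_shortcut_link` is a hypothetical helper, `data_dir` stands in for spaCy's own data directory, the error messages are made up, and unlike the real command it only accepts a path (not a pip package name) and does not check what is inside the model directory.

```python
import os

def create_shortcut_link(origin, link_name, data_dir):
    """Approximate `python -m spacy link ORIGIN LINK_NAME` (hypothetical helper)."""
    origin = os.path.abspath(origin)
    if not os.path.isdir(origin):
        # Illustrative message; the real CLI's wording may differ.
        raise IOError("Can't find model directory: %s" % origin)
    link_path = os.path.join(data_dir, link_name)
    if os.path.lexists(link_path):
        raise IOError("Link already exists: %s" % link_path)
    # Writing into the data directory is why admin rights may be needed.
    os.symlink(origin, link_path)
    return link_path
```

After such a link exists, a loader that looks up `link_name` in `data_dir` finds the model data instead of falling back to language IDs.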

Alternatively, if you've created a model package that can be installed via pip, you can simply install and import it, then call its load() method without any arguments:

```python
import my_model
nlp = my_model.load()
```

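In some cases this way of loading models is actually more convenient, because it's cleaner and makes it easier to debug your code. For example, if the model doesn't exist, Python will raise an ImportError immediately. Similarly, if loading fails, you know the problem is likely in the model's own load() or meta.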
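A minimal sketch of the import-based approach, assuming a hypothetical helper `load_model_package` and a model package that exposes a module-level `load()` as described above. It shows where each kind of failure surfaces.

```python
import importlib

def load_model_package(package_name):
    """Import an installed model package and call its load() (hypothetical helper)."""
    # Fails fast with ImportError (ModuleNotFoundError on Python 3.6+)
    # if the package isn't installed.
    module = importlib.import_module(package_name)
    # Any error from here on comes from the package's own load() and meta.
    return module.load()
```

For example, `load_model_package("my_model")` would be equivalent to the two-line snippet above, while a misspelled package name fails at the import step before any model data is read.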


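By the way: I'm one of the spaCy maintainers, and I agree that the way spacy.load() currently works is definitely not ideal and is confusing. We're looking forward to finally changing this in the next major version. We're very close to releasing the first alpha of v2.0, which will solve this much more elegantly and also include lots of improvements to the training process and the documentation.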


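I'm facing the same problem and would be glad to get some clarification. I followed this script https://github.com/explosion/spaCy/blob/master/examples/training/train_ner.py and its 'save_model' method, which created a folder 'spaCy_NER' containing a 'config.json' file and a 'model' file, plus a subfolder named 'vocab'. I tried passing both 'path/to/spaCy_NER/model' and 'path/to/spaCy_NER/' as the first argument to 'python -m spacy link', but in both cases I get the same RuntimeError. Do you have any suggestions?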