Which settings should be used for TokensRegexNER?

When I try regexner, it works as expected with the following settings and data:

props.setProperty("annotators", "tokenize, cleanxml, ssplit, pos, lemma, regexner");

Bachelor of Laws	DEGREE
Bachelor of (Arts|Laws|Science|Engineering|Divinity)	DEGREE

What I would like to do is use TokensRegex instead. For example:

Bachelor of Laws	DEGREE
Bachelor of ([{tag:NNS}] [{tag:NNP}])	DEGREE

I read that to do this I should use TokensRegexNERAnnotator.

I tried to use it as follows, but it did not work.

Pipeline.addAnnotator(new TokensRegexNERAnnotator("expressions.txt", true));

I also tried setting up the annotator another way:

props.setProperty("annotators", "tokenize, cleanxml, ssplit, pos, lemma, tokenregexner");  
props.setProperty("customAnnotatorClass.tokenregexner", "edu.stanford.nlp.pipeline.TokensRegexNERAnnotator");

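I tried different TokensRegex formats, but either the annotator could not find the expression or I got a SyntaxException.

What is the proper way to use TokensRegex (queries on tokens with tags) in an NER data file?

By the way, I just saw a comment in the TokensRegexNERAnnotator.java file. I am not sure whether it is related to POS tags not working with RegexNerAnnotator.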

if (entry.tokensRegex != null) {
    // TODO: posTagPatterns...
    pattern = TokenSequencePattern.compile(env, entry.tokensRegex);
}


2017-02-26 mehmetilker

First you need to make a TokensRegex rule file (sample_degree.rules). Here is an example:

ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" } 

{ pattern: (/Bachelor/ /of/ [{tag:NNP}]), action: Annotate($0, ner, "DEGREE") }

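To explain the rule a bit: the pattern field specifies which token pattern to match. The action field says to annotate every token in the overall match (that is what $0 represents), setting the ner field (note that ner = ... is defined at the top of the rule file) to the string "DEGREE", which is the third argument.

Then make this .props file for the command (degree_example.props):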

customAnnotatorClass.tokensregex = edu.stanford.nlp.pipeline.TokensRegexAnnotator 

tokensregex.rules = sample_degree.rules 

annotators = tokenize,ssplit,pos,lemma,ner,tokensregex

Then run this command:

java -Xmx8g edu.stanford.nlp.pipeline.StanfordCoreNLP -props degree_example.props -file sample-degree-sentence.txt -outputFormat text

You should see that the three tokens you wanted tagged as "DEGREE" are tagged.
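If you would rather configure the pipeline from Java, as in the question, the same three settings can be put on a java.util.Properties object. This is only a sketch: the class name DegreePropsSketch and the method buildProps are made up for illustration, and the code only builds and prints the properties. With CoreNLP on the classpath, the resulting object would be passed to the StanfordCoreNLP constructor instead of using the -props file.

```java
import java.util.Properties;

// Hypothetical helper class, not part of CoreNLP.
class DegreePropsSketch {

    // Mirrors degree_example.props line by line.
    static Properties buildProps() {
        Properties props = new Properties();
        // Register TokensRegexAnnotator under the annotator name "tokensregex".
        props.setProperty("customAnnotatorClass.tokensregex",
                "edu.stanford.nlp.pipeline.TokensRegexAnnotator");
        // Path to the rule file shown above (relative to the working directory).
        props.setProperty("tokensregex.rules", "sample_degree.rules");
        // tokensregex runs last so the rule can read the POS tags and overwrite the NER tags.
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,tokensregex");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings so they can be compared with the .props file.
        buildProps().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Note that the annotator name in the annotators list has to match the suffix after customAnnotatorClass. exactly.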

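I think I will push a change to the code so that tokensregex maps to TokensRegexAnnotator and you will not have to declare it as a custom annotator. For now, though, you need that line in the .props file.

This example should help you implement this. Here are some more resources if you want to learn more: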

http://nlp.stanford.edu/software/tokensregex.shtml#TokensRegexRules

http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ling/tokensregex/SequenceMatchRules.html

http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ling/tokensregex/types/Expressions.html


2017-03-05 03:35:26 StanfordNLPHelp

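Does it make sense to specify around 7000 rules this way, or only some of them? – mehmetilker

I guess regexner is only for specifying alternatives such as (Arts|Laws|Science|Engineering|Divinity), and that would be easier. (I read your answer again, and if I have understood the change you are planning correctly, I guess we will be able to specify tag queries in the regexner text file.) Thank you for your help. – mehmetilker

Thank you very much @StanfordNLPHelp. Very nice explanation! –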
