Stanford Parser: frenchFactored.ser.gz

I am using the Stanford Parser for French (version 3.6.0). My command line is:

java -cp stanford-parser.jar:* edu.stanford.nlp.parser.lexparser.LexicalizedParser -maxlength 30 -outputFormat conll2007 frenchFactored.ser.gz test_french.txt > test_french.conll10

But the output does not contain the grammatical functions I expected. See:

1 Je _ CLS CLS _ 2 NULL _ _
2 mange _ V V _ 0 root _ _
3 des _ P P _ 2 NULL _ _
4 pommes _ N N _ 3 NULL _ _
5 . _ PUNC PUNC _ 2 NULL _ _

Am I missing something in the command line?

Source

2016-03-04 starckman

Stanford CoreNLP 3.6.0 includes a deep-learning dependency parser for French.

Download Stanford CoreNLP 3.6.0 here:

http://stanfordnlp.github.io/CoreNLP/download.html

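Make sure you also get the French models jar, which is available on the same page.

Then run this command to use the French dependency parser, making sure the French models jar is on your CLASSPATH: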

java -Xmx6g -cp "*:stanford-corenlp-full-2015-12-09/*" edu.stanford.nlp.pipeline.StanfordCoreNLP -props StanfordCoreNLP-french.properties -file sample-french-document.txt -outputFormat text

Source

2016-03-06 10:37:48 StanfordNLPHelp

Thanks for your reply! – starckman

I ran this command: java -mx1g -cp stanford-corenlp-3.7.0.jar:stanford-french-corenlp-2016-10-31-models.jar edu.stanford.nlp.pipeline.StanfordCoreNLP -props StanfordCoreNLP-french.properties -annotators tokenize,ssplit,pos,depparse -file /Users/Rafael/Desktop/LANGAGES/CORPUS/Sentences_FR/3aube_schtrouFR30.txt -outputFormat sortie.txt but I get this error message: Unable to open "edu/stanford/nlp/models/pos-tagger/french/french.tagger" as class path, filename or URL – starckman

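Are those jar files present in the directory where you run this command? You get this error because, for some reason, the French models jar is not on your CLASSPATH. If you run jar -tf on the French models jar, you will see that the tagger file is there. – StanfordNLPHelp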
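The classpath check suggested in the comment above can also be done from Java. This is only a minimal sketch: the class name ModelOnClasspath and the helper isOnClasspath are made up for illustration, and the resource path is copied from the error message. Run it with the same -cp value as the failing command.

```java
import java.net.URL;

public class ModelOnClasspath {
    // Resource path copied from the error message quoted in the comment.
    static final String FRENCH_TAGGER =
        "edu/stanford/nlp/models/pos-tagger/french/french.tagger";

    /** Returns true if the named resource is visible to the system class loader. */
    static boolean isOnClasspath(String resource) {
        URL url = ClassLoader.getSystemResource(resource);
        return url != null;
    }

    public static void main(String[] args) {
        // If this reports false, the French models jar is not on the classpath in use.
        System.out.println(FRENCH_TAGGER + " on classpath: " + isOnClasspath(FRENCH_TAGGER));
    }
}
```

If it reports false, the -cp argument probably does not point at the French models jar (wrong directory, wrong file name, or a wrong path separator).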

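About your command, the documentation says:

Known formats are: oneline, penn, latexTree, xmlTree, words, wordsAndTags, rootSymbolOnly, dependencies, typedDependencies, typedDependenciesCollapsed, collocations, semanticGraph, conllStyleDependencies, conll2007. The last two are both tab-separated values formats. The latter has a lot more columns filled with underscores. [...]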

Source: http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/TreePrint.html

You can try another -outputFormat.
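To see at a glance which tokens came back without a relation, the conll2007 rows can be checked with a few lines of plain Java. This is a rough sketch, not part of the Stanford Parser: the class Conll2007Check and its method are hypothetical, the sample rows are modeled on the output in the question, and the column order (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) is the standard CoNLL 2007 layout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Conll2007Check {
    // Zero-based column positions in a CoNLL 2007 row.
    static final int FORM = 1;
    static final int DEPREL = 7;

    /** Returns the word forms whose DEPREL column is "NULL" or "_". */
    static List<String> tokensWithoutRelation(List<String> rows) {
        List<String> missing = new ArrayList<>();
        for (String row : rows) {
            String trimmed = row.trim();
            if (trimmed.isEmpty()) continue;          // blank line between sentences
            String[] cols = trimmed.split("\\s+");    // accepts tabs or spaces
            if (cols.length <= DEPREL) continue;      // not a full row, skip it
            String deprel = cols[DEPREL];
            if (deprel.equalsIgnoreCase("NULL") || deprel.equals("_")) {
                missing.add(cols[FORM]);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "1 Je _ CLS CLS _ 2 NULL _ _",
            "2 mange _ V V _ 0 root _ _",
            "3 des _ P P _ 2 NULL _ _",
            "4 pommes _ N N _ 3 NULL _ _",
            "5 . _ PUNC PUNC _ 2 NULL _ _");
        System.out.println(tokensWithoutRelation(rows)); // prints [Je, des, pommes, .]
    }
}
```

Splitting on whitespace is a simplification that breaks if a token itself contains a space; for real tab-separated output, split on "\t" instead.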

Source

2016-03-04 15:04:09 mejem

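Thanks. With the Chinese parser (xinhuaFactored.ser.gz) I do get grammatical functions such as nsubj, auxpass, etc., but with the French one, as you can see, I only get "NULL". Does that mean function annotation is not available for French in the Stanford Parser? – starckman

It also works for English (I just tried it). It does not seem to be implemented for French. So your command is fine, but the parser does not work the way you expect. – mejem

OK, that matches what I read here https://mailman.stanford.edu/pipermail/parser-user/2014-June/002937.html: "We don't (yet) have a (direct) dependency parser; rather, we parse to constituency and then convert, for English and Chinese. You would need to convert French parse trees to dependencies in a similar way, or use some other group's dependency parser. This is not impossible, but it would be a bunch of work." But since that is dated June 2014, I was not sure whether it is still the case. Thanks! – starckman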

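What you are asking for is reasonable, but the Stanford Parser does not support it yet (as of version 3.6.0).

The following code prints "false" when the French model is used. The command you are running checks this internally and silently skips the dependency analysis when it is false.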

System.out.println(
    LexicalizedParser 
    .loadModel("frenchFactored.ser.gz") 
    .treebankLanguagePack() 
    .supportsGrammaticalStructures() 
);

That is why I use MaltParser (http://www.maltparser.org/).

If you are happy with the following output:

1 Je  Je  C CLS  null 2 suj  _ _ 
2 mange mange V V  null 0 root _ _ 
3 des  des  P P  null 2 mod  _ _ 
4 pommes pommes N N  null 3 obj  _ _ 
5 .  .  P PUNC null 2 mod  _ _

then use the following code to generate it (this cannot be done from the command line alone). I use both Stanford and Malt to achieve it:

LexicalizedParser lexParser = LexicalizedParser.loadModel("frenchFactored.ser.gz"); 
TokenizerFactory<CoreLabel> tokenizerFactory = PTBTokenizer.factory(new CoreLabelTokenFactory(), ""); 
ConcurrentMaltParserModel parserModel = ConcurrentMaltParserService.initializeParserModel(new File("fremalt-1.7.mco")); 

Tokenizer<CoreLabel> tok = tokenizerFactory.getTokenizer(new StringReader("Je mange des pommes.")); 
List<CoreLabel> rawWords2 = tok.tokenize(); 
Tree parse = lexParser.apply(rawWords2); 

// MaltParser requires tokens in the MaltTab (CoNLL-style) format.
// Instead of using the Stanford tagger, we could have used MElt or another tagger.
String[] tokens = parse.taggedLabeledYield().stream() 
    .map(word -> { 
     CoreLabel w = (CoreLabel)word; 
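     // Note: Morphology implements English lemmatization rules, so for French the lemma may just be the word form.
     String lemma = Morphology.lemmatizeStatic(new WordTag(w.word(), w.tag())).word();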
     String tag = w.value(); 

     return String.join("\t", new String[]{ 
      String.valueOf(w.index()+1), 
      w.word(), 
      lemma != null ? lemma : w.word(), 
      tag != null ? String.valueOf(tag.charAt(0)) : "_", 
      tag != null ? tag : "_" 
     }); 
    }) 
    .toArray(String[]::new); 

ConcurrentDependencyGraph graph = parserModel.parse(tokens); 
System.out.println(graph);

From there, you can traverse the graph programmatically, starting with:

graph.nTokenNodes()
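The tab-separated token rows that the code above passes to parserModel.parse(tokens) can be looked at in isolation. The sketch below reproduces only the String.join step with plain Java, so it runs without Stanford or MaltParser on the classpath. The class MaltTabLine and the method toLine are hypothetical names, and the guard against an empty tag is an addition that the original snippet does not have.

```java
public class MaltTabLine {
    /**
     * Builds one input row: index, word form, lemma, coarse tag, fine tag.
     * The coarse tag is the first character of the fine tag, as in the code above.
     */
    static String toLine(int index, String word, String lemma, String tag) {
        boolean hasTag = tag != null && !tag.isEmpty();
        return String.join("\t",
            String.valueOf(index),
            word,
            lemma != null ? lemma : word,                 // fall back to the word form
            hasTag ? String.valueOf(tag.charAt(0)) : "_",
            hasTag ? tag : "_");
    }

    public static void main(String[] args) {
        // Tokens and tags taken from the example sentence "Je mange des pommes."
        System.out.println(toLine(1, "Je", null, "CLS"));
        System.out.println(toLine(2, "mange", "mange", "V"));
    }
}
```

The first row comes out as 1, Je, Je, C, CLS separated by tabs, which matches the first five columns of the sample output shown earlier in this answer.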

If you use Maven, just add the following dependencies to your POM:

<dependency> 
    <groupId>org.maltparser</groupId> 
    <artifactId>maltparser</artifactId> 
    <version>1.8.1</version> 
</dependency> 
<dependency> 
    <groupId>edu.stanford.nlp</groupId> 
    <artifactId>stanford-corenlp</artifactId> 
    <version>3.6.0</version> 
</dependency>

Bonus: imports

import java.io.File;
import java.io.StringReader;
import java.util.List;

import org.maltparser.concurrent.ConcurrentMaltParserModel; 
import org.maltparser.concurrent.ConcurrentMaltParserService; 
import org.maltparser.concurrent.graph.ConcurrentDependencyGraph; 
import org.maltparser.concurrent.graph.ConcurrentDependencyNode; 
import org.maltparser.core.exception.MaltChainedException; 

import edu.stanford.nlp.ling.CoreLabel; 
import edu.stanford.nlp.ling.WordTag; 
import edu.stanford.nlp.parser.lexparser.LexicalizedParser; 
import edu.stanford.nlp.process.CoreLabelTokenFactory; 
import edu.stanford.nlp.process.Morphology; 
import edu.stanford.nlp.process.PTBTokenizer; 
import edu.stanford.nlp.process.Tokenizer; 
import edu.stanford.nlp.process.TokenizerFactory; 
import edu.stanford.nlp.trees.Tree;

Extra: the fremalt-1.7.mco file

http://www.maltparser.org/mco/french_parser/fremalt.html

Source

2016-03-10 04:38:37 antoine

Sorry for not responding, I had not logged in for a long time. Thank you very much. I now use the Mate parser for French, which I recommend: https://code.google.com/archive/p/mate-tools/downloads – starckman
