2017-01-17

FileNotFoundException on tmp/roth_sentences.ser when training a Stanford Relation Extractor model

I'm trying to train my own relation extraction model as described here, but I keep getting a strange error.

My properties file:

#Below are some basic options. See edu.stanford.nlp.ie.machinereading.MachineReadingProperties class for more options. 

# Pipeline options 
annotators = pos, lemma, parse 
parse.maxlen = 100 

# MachineReading properties. You need one class to read the dataset into correct format. See edu.stanford.nlp.ie.machinereading.domains.ace.AceReader for another example. 
datasetReaderClass = edu.stanford.nlp.ie.machinereading.domains.roth.RothCONLL04Reader 

readerLogLevel = INFO 
#Data directory for training. The datasetReaderClass reads data from this path and makes corresponding sentences and annotations. 
trainPath = ../re-training-data.corp 

#Whether to crossValidate, that is evaluate, or just train. 
crossValidate = false 
kfold = 10 

#Change this to true if you want to use CoreNLP pipeline generated NER tags. The default model generated with the relation extractor release uses the CoreNLP pipeline provided tags (option set to true$ 
trainUsePipelineNER=true 

# where to save training sentences. uses the file if it exists, otherwise creates it. 
serializedTrainingSentencesPath = tmp/roth_sentences.ser 

serializedEntityExtractorPath = tmp/roth_entity_model.ser 

# where to store the output of the extractor (sentence objects with relations generated by the model). This is what you will use as the model when using 'relation' annotator in the CoreNLP pipeline. 
serializedRelationExtractorPath = tmp/kpl-relation-model-pipeline.ser 

# uncomment to load a serialized model instead of retraining 
# loadModel = true 

#relationResultsPrinters = edu.stanford.nlp.ie.machinereading.RelationExtractorResultsPrinter,edu.stanford.nlp.ie.machinereading.domains.roth.RothResultsByRelation. For printing output of the model. 
relationResultsPrinters = edu.stanford.nlp.ie.machinereading.RelationExtractorResultsPrinter 

#In this domain, this is trivial since all the entities are given (or set using CoreNLP NER tagger). 
entityClassifier = edu.stanford.nlp.ie.machinereading.domains.roth.RothEntityExtractor 

extractRelations = true 
extractEvents = false 

#We are setting the entities beforehand so the model does not learn how to extract entities etc. 
extractEntities = false 

#Opposite of crossValidate. 
trainOnly=true 

# The set chosen by feature selection using RothCONLL04: 
relationFeatures = arg_words,arg_type,dependency_path_lowlevel,dependency_path_words,surface_path_POS,entities_between_args,full_tree_path 

Here is what I run in the terminal:

sudo java -cp stanford-corenlp-3.7.0.jar:stanford-corenlp-3.7.0-models.jar edu.stanford.nlp.ie.machinereading.MachineReading --arguments kpl-re-model.properties 

And the result:

PERCENTAGE OF TRAIN: 1.0 
The reader log level is set to INFO 
Adding annotator pos 
Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.8 sec]. 
Adding annotator lemma 
Adding annotator parse 
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.6 sec]. 
Jan 17, 2017 4:55:06 PM edu.stanford.nlp.ie.machinereading.MachineReading makeResultsPrinters 
INFO: Making result printers from 
Jan 17, 2017 4:55:06 PM edu.stanford.nlp.ie.machinereading.MachineReading makeResultsPrinters 
INFO: Making result printers from edu.stanford.nlp.ie.machinereading.RelationExtractorResultsPrinter 
Jan 17, 2017 4:55:06 PM edu.stanford.nlp.ie.machinereading.MachineReading makeResultsPrinters 
INFO: Making result printers from 
Jan 17, 2017 4:55:06 PM edu.stanford.nlp.ie.machinereading.MachineReading loadOrMakeSerializedSentences 
INFO: Parsing corpus sentences... 
Jan 17, 2017 4:55:06 PM edu.stanford.nlp.ie.machinereading.MachineReading loadOrMakeSerializedSentences 
INFO: These sentences will be serialized to /home/ubuntu/stanford-corenlp-full-2016-10-31/tmp/roth_sentences.ser 
Jan 17, 2017 4:55:06 PM edu.stanford.nlp.ie.machinereading.domains.roth.RothCONLL04Reader read 
INFO: Reading file: ../re-training-data.corp 
Jan 17, 2017 4:55:07 PM edu.stanford.nlp.ie.machinereading.GenericDataSetReader preProcessSentences 
SEVERE: GenericDataSetReader: Started pre-processing the corpus... 
Jan 17, 2017 4:55:07 PM edu.stanford.nlp.ie.machinereading.GenericDataSetReader preProcessSentences 
INFO: Annotating dataset with [email protected] 
Jan 17, 2017 4:58:32 PM edu.stanford.nlp.ie.machinereading.GenericDataSetReader preProcessSentences 
SEVERE: GenericDataSetReader: Pre-processing complete. 
Jan 17, 2017 4:58:32 PM edu.stanford.nlp.ie.machinereading.GenericDataSetReader parse 
SEVERE: Changing NER tags using the CoreNLP pipeline. 
Replacing old annotator "parse" with signature [edu.stanford.nlp.pipeline.ParserAnnotator#parse.maxlen:100;#] with new annotator with signature [edu.stanford.nlp.pipeline.ParserAnnotator##] 
Adding annotator pos 
Adding annotator lemma 
Adding annotator ner 
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.4 sec]. 
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.5 sec]. 
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.5 sec]. 
Jan 17, 2017 4:58:45 PM edu.stanford.nlp.ie.machinereading.MachineReading loadOrMakeSerializedSentences 
INFO: Done. Parsed 1183 sentences. 
Jan 17, 2017 4:58:45 PM edu.stanford.nlp.ie.machinereading.MachineReading loadOrMakeSerializedSentences 
INFO: Serializing parsed sentences to /home/ubuntu/stanford-corenlp-full-2016-10-31/tmp/roth_sentences.ser... 
Exception in thread "main" java.io.FileNotFoundException: tmp/roth_sentences.ser (No such file or directory) 
    at java.io.FileOutputStream.open0(Native Method) 
    at java.io.FileOutputStream.open(FileOutputStream.java:270) 
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213) 
    at edu.stanford.nlp.io.IOUtils.writeObjectToFile(IOUtils.java:77) 
    at edu.stanford.nlp.io.IOUtils.writeObjectToFile(IOUtils.java:63) 
    at edu.stanford.nlp.ie.machinereading.MachineReading.loadOrMakeSerializedSentences(MachineReading.java:914) 
    at edu.stanford.nlp.ie.machinereading.MachineReading.run(MachineReading.java:270) 
    at edu.stanford.nlp.ie.machinereading.MachineReading.main(MachineReading.java:111)

The error says it cannot find 'tmp/roth_sentences.ser', but that makes no sense to me, because it is supposed to create that file.

Any ideas?

Thanks! Simon.

Answer


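I think it should work if you change tmp/roth_sentences.ser to roth_sentences.ser. My guess is that the directory /home/ubuntu/stanford-corenlp-full-2016-10-31/tmp does not exist, so the program crashes when it tries to write the file there.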
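The behaviour guessed at here can be reproduced outside CoreNLP. java.io.FileOutputStream creates a missing file but not a missing parent directory, and it reports that case as a FileNotFoundException. The following is only a minimal sketch of that behaviour and of one way around it, not CoreNLP's own code: the class SerializeWithParentDirs and both helper methods are hypothetical names made up for illustration, and the file is written under a temporary directory rather than the real CoreNLP folder.

```java
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; not part of CoreNLP.
public class SerializeWithParentDirs {

    /** Creates any missing parent directories, then serializes obj to path. */
    public static void writeObject(Serializable obj, Path path) throws IOException {
        Path parent = path.toAbsolutePath().getParent();
        if (parent != null) {
            // Does nothing if the directory already exists.
            Files.createDirectories(parent);
        }
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(path.toFile()))) {
            out.writeObject(obj);
        }
    }

    /**
     * Returns true if opening path for writing fails with FileNotFoundException.
     * If the open succeeds, an empty file is left at path.
     */
    public static boolean failsToOpen(Path path) throws IOException {
        try (FileOutputStream out = new FileOutputStream(path.toFile())) {
            return false;
        } catch (FileNotFoundException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the working directory; "tmp" does not exist inside it yet.
        Path base = Files.createTempDirectory("re-demo");
        Path target = base.resolve("tmp").resolve("roth_sentences.ser");

        System.out.println("Resolved path: " + target.toAbsolutePath());
        System.out.println("Open fails while tmp is missing: " + failsToOpen(target));

        writeObject("placeholder", target);
        System.out.println("File exists after creating tmp: " + Files.exists(target));
    }
}
```

For the setup in the question, the equivalent step would be to make sure a tmp directory exists inside the directory the java command is launched from (the log resolves the path to /home/ubuntu/stanford-corenlp-full-2016-10-31/tmp/roth_sentences.ser) before starting training.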

It worked! Oddly, I had the same suspicion before posting this question, so I tried _creating_ a tmp directory, but that did not help. Thanks anyway! – Simon