Fatal error with Lucene
I have a large number of text messages, and I run the following lines of code on each of them:
// tokenize the term
TokenStream tokenStream = new ClassicTokenizer(LUCENE_VERSION, new StringReader(term));
// stem the tokens
tokenStream = new PorterStemFilter(tokenStream);
Sometimes I get the following error, and sometimes I don't:
# A fatal error has been detected by the Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00000000025f8360, pid=1688, tid=7492
#
# JRE version: 7.0-b147
# Java VM: Java HotSpot(TM) 64-Bit Server VM (21.0-b17 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# J org.apache.lucene.analysis.PorterStemmer.stem(I)Z
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
#
What should I do?
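For reference, here is a minimal, self-contained version of the tokenize-and-stem snippet in the question that also consumes the stream and collects the terms. It is a sketch under assumptions: Lucene 4.7 (the version of the docs linked in the comments) with lucene-core and lucene-analyzers-common on the classpath; `StemChainDemo` and `stem` are hypothetical names. It adds a `LowerCaseFilter` that the original snippet lacks, because the `PorterStemFilter` javadoc states that its input must already be lowercase. The crash frame `org.apache.lucene.analysis.PorterStemmer` suggests an older 3.x build, where some of these classes live in other packages, so the imports may need adjusting.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.standard.ClassicTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical demo class; assumes Lucene 4.7 (lucene-core + lucene-analyzers-common).
public class StemChainDemo {
    private static final Version LUCENE_VERSION = Version.LUCENE_47;

    /** Tokenizes, lowercases and Porter-stems one message; returns the stemmed terms. */
    public static List<String> stem(String term) {
        // tokenize the term, lowercase it (PorterStemFilter expects lowercase input), then stem
        TokenStream chain = new PorterStemFilter(
                new LowerCaseFilter(LUCENE_VERSION,
                        new ClassicTokenizer(LUCENE_VERSION, new StringReader(term))));

        List<String> terms = new ArrayList<>();
        // closing the outermost filter also closes the filters and tokenizer it wraps
        try (TokenStream ts = chain) {
            CharTermAttribute termAttr = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                       // required by the TokenStream contract
            while (ts.incrementToken()) {
                terms.add(termAttr.toString());
            }
            ts.end();                         // finish the stream before it is closed
        } catch (IOException e) {
            // a StringReader does not fail in practice; rethrow unchecked (Java 7 compatible)
            throw new RuntimeException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(stem("Dogs Running")); // [dog, run]
    }
}
```

A new chain is built for each message and closed afterwards; the `reset()`, `incrementToken()`, `end()`, `close()` sequence is the consumer workflow described in the `TokenStream` javadoc.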
Have you tried using one of the analyzers, such as EnglishAnalyzer (http://lucene.apache.org/core/4_7_0/analyzers-common/org/apache/lucene/analysis/en/EnglishAnalyzer.html), which both tokenizes and stems? Does that work for you? – nbz
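A minimal sketch of the suggestion above: let `EnglishAnalyzer` run the whole chain instead of assembling filters by hand. In 4.x, `EnglishAnalyzer` lowercases, removes English stop words and applies `PorterStemFilter`. Assumes Lucene 4.7; `EnglishAnalyzerDemo`, `analyze` and the field name `"body"` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical demo class; assumes Lucene 4.7.
public class EnglishAnalyzerDemo {
    // built once and reused for every message
    private static final Analyzer ANALYZER = new EnglishAnalyzer(Version.LUCENE_47);

    /** Runs one message through EnglishAnalyzer and returns the resulting terms. */
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // "body" is an arbitrary field name; EnglishAnalyzer treats every field the same way
        try (TokenStream ts = ANALYZER.tokenStream("body", new StringReader(text))) {
            CharTermAttribute termAttr = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                terms.add(termAttr.toString());
            }
            ts.end();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The dogs are running")); // [dog, run]
    }
}
```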
Before the code above I have this line: tokenStream = new StopFilter(LUCENE_VERSION, tokenStream, EnglishAnalyzer.getDefaultStopSet()); but when I print the terms, they are not stemmed! That's why I use the code above for stemming. – user3582044
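The behaviour described in that reply is expected: `StopFilter` only removes stop words, and `EnglishAnalyzer.getDefaultStopSet()` merely supplies the word list, so nothing in that line stems. A sketch that keeps the stop-word step and adds stemming explicitly, packaged as a reusable `Analyzer` subclass, could look like this. Assumes Lucene 4.7; `StopAndStemDemo`, `StopAndStemAnalyzer` and the field name `"body"` are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.standard.ClassicTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical demo class; assumes Lucene 4.7.
public class StopAndStemDemo {
    private static final Version LUCENE_VERSION = Version.LUCENE_47;

    /** ClassicTokenizer -> LowerCaseFilter -> StopFilter -> PorterStemFilter. */
    static final class StopAndStemAnalyzer extends Analyzer {
        @Override
        protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            Tokenizer source = new ClassicTokenizer(LUCENE_VERSION, reader);
            TokenStream result = new LowerCaseFilter(LUCENE_VERSION, source);
            // StopFilter only drops stop words; the remaining terms pass through unchanged
            result = new StopFilter(LUCENE_VERSION, result, EnglishAnalyzer.getDefaultStopSet());
            // stemming has to be added explicitly, after stop-word removal
            result = new PorterStemFilter(result);
            return new TokenStreamComponents(source, result);
        }
    }

    // one instance for all messages; Analyzer reuses its components per thread
    private static final Analyzer ANALYZER = new StopAndStemAnalyzer();

    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        try (TokenStream ts = ANALYZER.tokenStream("body", new StringReader(text))) {
            CharTermAttribute termAttr = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                terms.add(termAttr.toString());
            }
            ts.end();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The cats jumped")); // [cat, jump]
    }
}
```

Stop words are removed before stemming because the stop set holds unstemmed forms; this is also the order `EnglishAnalyzer` uses.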