2014-04-02

How to make Lucene index fields case-insensitive

I mean, is there any way to lowercase the index field names in a query, but not the values?

I cannot convert the whole query to lowercase, because that would affect other queries that use the whitespace analyzer.

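The Query.extractTerms() method gives me the array of terms, but it does not work if the input contains wildcards, e.g. *

I need this because I have lowercased the index field names. For example, if my field is indexed as "actor", I should be able to get results for a query containing "Actor:abc" as well as "ACTOR:abc".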

Any ideas?

Answers


The solution is to create your own analyzer and add a LowerCaseFilter to its filter chain.

Here is an example of a custom French analyzer that is case-insensitive:

import org.apache.lucene.analysis.Analyzer; 
import org.apache.lucene.analysis.TokenStream; 
import org.apache.lucene.analysis.Tokenizer; 
import org.apache.lucene.analysis.core.LowerCaseFilter; 
import org.apache.lucene.analysis.core.StopFilter; 
import org.apache.lucene.analysis.fr.FrenchAnalyzer; 
import org.apache.lucene.analysis.fr.FrenchLightStemFilter; 
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter; 
import org.apache.lucene.analysis.standard.StandardFilter; 
import org.apache.lucene.analysis.standard.StandardTokenizer; 
import org.apache.lucene.analysis.util.ElisionFilter; 
import org.apache.lucene.util.Version; 

import java.io.Reader; 

/** 
* Completes {@link org.apache.lucene.analysis.fr.FrenchAnalyzer} with accent management 
*/ 
public class CustomFrenchAnalyzer extends Analyzer { 

    /** 
    * Lucene version 
    */ 
    private final Version matchVersion; 

    /** 
    * Constructs a new analyzer 
    * @param matchVersion compatibility version 
    */ 
    public CustomFrenchAnalyzer(final Version matchVersion) { 
     this.matchVersion = matchVersion; 
    } 

    @Override
    protected final TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
        final Tokenizer source = new StandardTokenizer(matchVersion, reader);
        TokenStream result = new StandardFilter(matchVersion, source);
        // Strip elided articles (l', d', ...) from the start of tokens
        result = new ElisionFilter(result, FrenchAnalyzer.DEFAULT_ARTICLES);
        // Lowercase before stop-word removal, so stop words are matched regardless of case
        result = new LowerCaseFilter(matchVersion, result);
        result = new StopFilter(matchVersion, result, FrenchAnalyzer.getDefaultStopSet());
        // Fold accented characters to their ASCII equivalents
        result = new ASCIIFoldingFilter(result);
        result = new FrenchLightStemFilter(result);

        // One LowerCaseFilter in the chain is enough; no need to wrap the result again
        return new TokenStreamComponents(source, result);
    }
}
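
Note that an analyzer only processes the text of a field, not the field name written in the query. As a rough sketch of the other half of the problem, the field-name prefixes in the raw query string could be lowercased before the string is handed to the query parser, leaving the values untouched. The class and method names here (`FieldNameLowercaser`, `lowercaseFieldNames`) are hypothetical, and the regular expression is only an assumption about what a field prefix looks like (ASCII word characters followed by a colon). It does not understand quoted phrases, so a phrase that itself contains something like ` Word:` would be rewritten too.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: lowercases field names in a raw query string.
public class FieldNameLowercaser {

    // Assumed shape of a field prefix: a run of word characters followed by ':',
    // located at the start of the string or right after whitespace, '(', '+', '-' or '!'.
    private static final Pattern FIELD_PREFIX =
            Pattern.compile("(^|[\\s(+\\-!])(\\w+):");

    public static String lowercaseFieldNames(String query) {
        Matcher m = FIELD_PREFIX.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Keep the leading delimiter, lowercase only the field name
            String replacement = m.group(1) + m.group(2).toLowerCase(Locale.ROOT) + ":";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: actor:Ab* AND title:"Star Wars"
        System.out.println(lowercaseFieldNames("ACTOR:Ab* AND Title:\"Star Wars\""));
    }
}
```

The rewritten string can then be parsed as usual, so wildcard values such as `Ab*` keep their original case. Normalizing the field name inside the query parser itself, rather than on the raw string, would probably be sturdier.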