2014-03-25 16 views
0

Workaround for NEST not supporting the pattern replace char filter

NEST does not appear to support the pattern replace char filter described here:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-pattern-replace-charfilter.html

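I've created an issue at https://github.com/elasticsearch/elasticsearch-net/issues/543.

Most of my indexing is working, so I would like to continue using NEST. Is there a way to work around this by injecting some manual JSON at some point during index configuration? I'm new to NEST, so I'm not sure whether this is doable.

Specifically, I want to use the pattern replace char filter to remove unit numbers from street addresses before they are run through a custom analyzer (i.e. "#205 - 1260 Broadway" becomes "1260 Broadway"). Because of the custom analyzer, I believe I need this char filter to accomplish that.

My current configuration looks like this: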

elasticClient.CreateIndex("geocoding", c => c 
      .Analysis(ad => ad 
       .Analyzers(ab => ab 
        .Add("address-index", new CustomAnalyzer() 
        { 
         Tokenizer = "whitespace", 
         Filter = new List<string>() { "lowercase", "synonym" } 
        }) 
        .Add("address-search", new CustomAnalyzer() 
        { 
         Tokenizer = "whitespace", 
         Filter = new List<string>() { "lowercase" }, 
         CharFilter = new List<string>() { "drop-unit" } 
        }) 
       ) 
       .CharFilters(cfb => cfb 
        .Add("drop-unit", new CharFilter()) //missing char filter here 
       ) 
       .TokenFilters(tfb => tfb 
        .Add("synonym", new SynonymTokenFilter() 
        { 
         Expand = true, 
         SynonymsPath = "analysis/synonym.txt" 
        }) 
       ) 
      )
);

UPDATE

As of May 2014, NEST now supports the pattern replace char filter: https://github.com/elasticsearch/elasticsearch-net/pull/637

Answers

1

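Instead of using the fluent Analysis settings when creating your index, you can use the Settings.Add approach to add entries to the FluentDictionary more manually, with complete control over which settings are passed in. An example of this is shown in the Create Index section of the NEST documentation. I use this approach for a very similar reason.

Your configuration would look something like the following: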

elasticClient.CreateIndex("geocoding", c => c
     .Settings(s => s
      // Index-time analyzer
      .Add("analysis.analyzer.address-index.type", "custom")
      .Add("analysis.analyzer.address-index.tokenizer", "whitespace")
      .Add("analysis.analyzer.address-index.filter.0", "lowercase")
      .Add("analysis.analyzer.address-index.filter.1", "synonym")
      // Search-time analyzer, which runs the drop-unit char filter first
      .Add("analysis.analyzer.address-search.type", "custom")
      .Add("analysis.analyzer.address-search.tokenizer", "whitespace")
      .Add("analysis.analyzer.address-search.filter.0", "lowercase")
      .Add("analysis.analyzer.address-search.char_filter.0", "drop-unit")
      // Char filter definition
      .Add("analysis.char_filter.drop-unit.type", "mapping")
      .Add("analysis.char_filter.drop-unit.mappings.0", "<mapping1>")
      .Add("analysis.char_filter.drop-unit.mappings.1", "<mapping2>")
      // ...
     )
);

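You will need to replace <mapping1> and <mapping2> above with the actual char_filter mappings you want to use. Please note that I have not used a char_filter before, so the setting values may be a little off, but this should get you going in the right direction.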

0

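Just to follow up on Paige's very helpful answer: it looks like you can combine the fluent approach with the manual Settings.Add approach. The following worked for me: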

 elasticClient.CreateIndex("geocoding", c => c 
      .Settings(s => s 
       .Add("analysis.char_filter.drop_unit.type", "pattern_replace") 
       .Add("analysis.char_filter.drop_unit.pattern", @"#\d+\s-\s") 
       .Add("analysis.char_filter.drop_unit.replacement", "") 
      ) 
      .Analysis(ad => ad 
       .Analyzers(ab => ab 
        .Add("address_index", new CustomAnalyzer() 
        { 
         Tokenizer = "whitespace", 
         Filter = new List<string>() { "lowercase", "synonym" } 
        }) 
        .Add("address_search", new CustomAnalyzer() 
        { 
         CharFilter = new List<string> { "drop_unit" }, 
         Tokenizer = "whitespace", 
         Filter = new List<string>() { "lowercase" } 
        }) 
       ) 
       .TokenFilters(tfb => tfb 
        .Add("synonym", new SynonymTokenFilter() 
        { 
         Expand = true, 
         SynonymsPath = "analysis/synonym.txt" 
        }) 
       ) 
      )
);
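
To see what the drop_unit pattern above strips, here is a minimal sketch that applies the same pattern and replacement with .NET's Regex class. This is only an approximation: Elasticsearch evaluates the pattern with Java's regex engine, and although this particular pattern should behave the same in both for plain ASCII input, that equivalence is an assumption. The class and method names (DropUnitPreview, Apply) are hypothetical and not part of NEST.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DropUnitPreview
{
    // Same pattern and replacement as the drop_unit char filter settings.
    private const string Pattern = @"#\d+\s-\s";
    private const string Replacement = "";

    // Hypothetical helper: approximates what the char filter does to the
    // raw text before the whitespace tokenizer sees it.
    public static string Apply(string address)
    {
        return Regex.Replace(address, Pattern, Replacement);
    }

    public static void Main()
    {
        // Unit prefix with spaces around the hyphen is removed.
        Console.WriteLine(Apply("#205 - 1260 Broadway")); // 1260 Broadway
        // No unit prefix: the text passes through untouched.
        Console.WriteLine(Apply("1260 Broadway"));        // 1260 Broadway
        // No spaces around the hyphen: the pattern does not match.
        Console.WriteLine(Apply("#205-1260 Broadway"));   // #205-1260 Broadway
    }
}
```

The last case suggests the pattern may need loosening (for example, making the whitespace optional) if addresses are not consistently formatted.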
+0

Nice! Good to know these can be combined. Thanks for the follow-up.

+0

How is this used in an actual query? I haven't seen a good example yet. – bigerock

0
 EsClient.CreateIndex("universal_de", c => c 
     .NumberOfReplicas(1) 
     .NumberOfShards(5) 
     .Settings(s => s //just as an example 
      .Add("merge.policy.merge_factor", "10") 
      .Add("search.slowlog.threshold.fetch.warn", "1s") 
      .Add("analysis.char_filter.drop_chars.type", "pattern_replace") 
      .Add("analysis.char_filter.drop_chars.pattern", @"[^0-9]") 
      .Add("analysis.char_filter.drop_chars.replacement", "") 
      .Add("analysis.char_filter.drop_specChars.type", "pattern_replace") 
      .Add("analysis.char_filter.drop_specChars.pattern", @"[^0-9a-zA-Z]") 
      .Add("analysis.char_filter.drop_specChars.replacement", "") 
     ) 
     .Analysis(descriptor => descriptor 
      .Analyzers(bases => bases 
       .Add("folded_word", new CustomAnalyzer() 
       { 
        Filter = new List<string> { "lowercase", "asciifolding", "trim" }, 
        Tokenizer = "standard" 
       } 
       ) 
       .Add("trimmed_number", new CustomAnalyzer() 
       { 
        CharFilter = new List<string> { "drop_chars" }, 
        Tokenizer = "standard", 
        Filter = new List<string>() { "lowercase" } 
       }) 
       .Add("trimmed_specChars", new CustomAnalyzer() 
       { 
        CharFilter = new List<string> { "drop_specChars" }, 
        Tokenizer = "standard", 
        Filter = new List<string>() { "lowercase" } 
       }) 
      ) 
     ) 
      .AddMapping<Business>(m => m 
       //.MapFromAttributes() 
       .Properties(props => props 
        .MultiField(mf => mf 
         .Name(t => t.DirectoryName) 
         .Fields(fs => fs 
          .String(s => s.Name(t => t.DirectoryName).Analyzer("standard")) 
          .String(s => s.Name(t => t.DirectoryName.Suffix("folded")).Analyzer("folded_word")) 
          ) 
        ) 
        .MultiField(mf => mf 
         .Name(t => t.Phone) 
         .Fields(fs => fs 
          .String(s => s.Name(t => t.Phone).Analyzer("trimmed_number")) 
          ) 
        )
       )
      )
);

This is how you create the index and add the mapping. For searching, I have something like this:

var result = _Instance.Search<Business>(q => q 
       .TrackScores(true) 
       .Query(qq => 
       { 
        QueryContainer termQuery = null; 
       if (!string.IsNullOrWhiteSpace(input.searchTerm)) 
       { 
        var toLowSearchTerm = input.searchTerm.ToLower(); 
        termQuery |= qq.QueryString(qs => qs 
         .OnFieldsWithBoost(f => f 
          .Add("directoryName.folded", 5.0) 
         ) 
         .Query(toLowSearchTerm)); 
         termQuery |= qq.Fuzzy(fz => fz.OnField("directoryName.folded").Value(toLowSearchTerm).MaxExpansions(2)); 
         termQuery |= qq.Term("phone", Regex.Replace(toLowSearchTerm, @"[^0-9]", "")); 


       } 

       return termQuery; 
       }) 
       .Skip(input.skip) 
       .Take(input.take) 
      ); 
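
The Term clause above strips non-digits from the search term on the client side. A term query is not analyzed, so the trimmed_number analyzer (and its drop_chars char filter) is not applied to the query text, and the client has to normalize the value to match what was indexed. The following is a minimal sketch of that normalization on its own; the class and method names are hypothetical, and the sample inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneTermPreview
{
    // Mirrors the drop_chars pattern: keep digits only.
    public static string Normalize(string searchTerm)
    {
        return Regex.Replace(searchTerm.ToLower(), @"[^0-9]", "");
    }

    public static void Main()
    {
        // Punctuation and spaces are dropped, leaving only digits.
        Console.WriteLine(Normalize("(604) 555-0199")); // 6045550199

        // A term with no digits normalizes to an empty string.
        Console.WriteLine(Normalize("Broadway").Length); // 0
    }
}
```

Since a search term without digits normalizes to an empty string, it may be worth adding the Term clause only when the normalized value is non-empty.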

New: I managed to use pattern replace in a better way, like this:

.Analysis(descriptor => descriptor 
     .Analyzers(bases => bases 
      .Add("folded_word", new CustomAnalyzer() 
      { 
       Filter = new List<string> { "lowercase", "asciifolding", "trim" }, 
       Tokenizer = "standard" 
      } 
      ) 
      .Add("trimmed_number", new CustomAnalyzer() 
      { 
       CharFilter = new List<string> { "drop_chars" }, 
       Tokenizer = "standard", 
       Filter = new List<string>() { "lowercase" } 
      }) 
      .Add("trimmed_specChars", new CustomAnalyzer() 
      { 
       CharFilter = new List<string> { "drop_specChars" }, 
       Tokenizer = "standard", 
       Filter = new List<string>() { "lowercase" } 
      }) 
      .Add("autocomplete", new CustomAnalyzer() 
      { 
       Tokenizer = new WhitespaceTokenizer().Type, 
       Filter = new List<string>() { "lowercase", "asciifolding", "trim", "engram" } 
      } 
      ) 
    ) 
    .TokenFilters(i => i 
       .Add("engram", new EdgeNGramTokenFilter 
        { 
         MinGram = 3, 
         MaxGram = 15 
        } 
       ) 
    ) 
    .CharFilters(cf => cf 
       .Add("drop_chars", new PatternReplaceCharFilter 
        { 
         Pattern = @"[^0-9]", 
         Replacement = "" 
        } 
       ) 
       .Add("drop_specChars", new PatternReplaceCharFilter 
       { 
        Pattern = @"[^0-9a-zA-Z]", 
        Replacement = "" 
       } 
       ) 
    ) 
    ) 
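
As a rough illustration of what the engram token filter in the autocomplete analyzer emits, here is a minimal sketch that generates edge n-grams for a single token using the same MinGram and MaxGram values. It is a hand-written approximation, not Elasticsearch's implementation; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class EdgeNGramPreview
{
    // Returns the prefixes of the token whose length is between minGram and
    // maxGram (inclusive), capped at the token's own length.
    public static List<string> EdgeNGrams(string token, int minGram, int maxGram)
    {
        var grams = new List<string>();
        int longest = Math.Min(maxGram, token.Length);
        for (int length = minGram; length <= longest; length++)
        {
            grams.Add(token.Substring(0, length));
        }
        return grams;
    }

    public static void Main()
    {
        // Same bounds as the engram filter: MinGram = 3, MaxGram = 15.
        Console.WriteLine(string.Join(", ", EdgeNGrams("broadway", 3, 15)));
        // bro, broa, broad, broadw, broadwa, broadway
    }
}
```

Indexing these prefixes is what lets a partially typed word such as "broad" match the full token at search time.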
+2

Can you explain how this works? – rene

+0

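This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post - you can always comment on your own posts, and once you have sufficient [reputation](http://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](http://stackoverflow.com/help/privileges/comment). – AstroCB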

+0

Which part don't you understand? – danvasiloiu