2016-08-16 44 views
Elasticsearch synonyms not working properly

The text I want to search for is "2 marina blvd", and the top 3 results returned by Elasticsearch are:

2 MARINA GREEN, SINGAPORE 019800 
MARINA BAYFRONT 2 RAFFLES LINK, SINGAPORE 039392 
THE SAIL @ MARINA BAY 2 MARINA BOULEVARD, SINGAPORE 018987 

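In my synonym list, blvd is equivalent to boulevard.

When I search for "2 marina blvd", I expect THE SAIL @ MARINA BAY 2 MARINA BOULEVARD, SINGAPORE 018987 to come out on top with the highest score, because "2 marina blvd" is equivalent to "2 marina boulevard". Instead, 2 MARINA GREEN, SINGAPORE 019800 currently ranks highest.

What is going wrong, and how can I improve the search results?

The full settings are: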

{
  "geolocation": {
    "settings": {
      "index": {
        "creation_date": "1471322099847",
        "analysis": {
          "filter": {
            "my_synonym_filter": {
              "type": "synonym",
              "synonyms": [
                "rd,road",
                "ave,avenue",
                "blvd,boulevard",
                "st,street",
                "lor,lorong",
                "ter,terminal",
                "blk,block",
                "apt,apartment",
                "condo,condominium"
              ]
            }
          },
          "analyzer": {
            "my_synonyms": {
              "filter": [
                "lowercase",
                "my_synonym_filter"
              ],
              "tokenizer": "standard"
            },
            "stopwords_analyzer": {
              "type": "standard",
              "stopwords": [
                "the"
              ]
            },
            "my_ngram_analyzer": {
              "tokenizer": "my_ngram_tokenizer"
            }
          },
          "tokenizer": {
            "my_ngram_tokenizer": {
              "token_chars": [
                "letter",
                "digit"
              ],
              "min_gram": "2",
              "type": "nGram",
              "max_gram": "5"
            }
          }
        },
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "mPfZmWHFQZOHqfAi471nGQ",
        "version": {
          "created": "2030599"
        }
      }
    }
  }
}

This is the query:

body: {
  from: 0,
  size: 10,
  query: {
    bool: {
      should: [
        {
          match: {
            text: q
          }
        },
        {
          match: {
            text: {
              query: q,
              fuzziness: 1,
              prefix_length: 0,
              max_expansions: 100
            }
          }
        },
        {
          match: {
            text: {
              query: q,
              max_expansions: 300,
              type: "phrase_prefix"
            }
          }
        }
      ]
    }
  }
}

And the mapping is:

{
  "geolocation": {
    "mappings": {
      "location": {
        "properties": {
          "address": {
            "type": "string"
          },
          "blk": {
            "type": "string"
          },
          "building": {
            "type": "string"
          },
          "location": {
            "type": "geo_point"
          },
          "postalCode": {
            "type": "string"
          },
          "road": {
            "type": "string"
          },
          "searchText": {
            "type": "string"
          },
          "x": {
            "type": "string"
          },
          "y": {
            "type": "string"
          }
        }
      }
    }
  }
}

What is the query?

And the mapping for the 'text' field, please.

@AndreiStefan Updated. – Timeless

Answer

You have defined analyzers, but you have not assigned any of them to your fields. The most basic setup is:

"searchText": {
    "type": "string",
    "analyzer": "my_synonyms"
}
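
To illustrate why the assignment matters, here is a rough simulation in plain JavaScript. It is not Elasticsearch code: analyzeDefault and analyzeWithSynonyms are hypothetical stand-ins for the default standard analyzer and for my_synonyms, the tokenizer is a crude split on non-alphanumeric characters, and the synonym list is abbreviated from the question's settings. It only counts shared tokens and does not reproduce real scoring.

```javascript
// Synonym groups, abbreviated from my_synonym_filter in the question.
const SYNONYMS = [
  ["rd", "road"],
  ["ave", "avenue"],
  ["blvd", "boulevard"],
  ["st", "street"]
];

// Assumption: a crude stand-in for the standard tokenizer plus lowercasing.
// Lowercases the text and splits on anything that is not a letter or digit.
function analyzeDefault(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Assumption: a stand-in for my_synonyms. Every token that belongs to a
// synonym group is expanded to all members of that group.
function analyzeWithSynonyms(text) {
  const out = new Set();
  for (const token of analyzeDefault(text)) {
    out.add(token);
    const group = SYNONYMS.find((g) => g.includes(token));
    if (group) group.forEach((t) => out.add(t));
  }
  return [...out];
}

// Returns the query tokens that also appear among the document's tokens.
function sharedTokens(analyze, query, doc) {
  const docTokens = new Set(analyze(doc));
  return analyze(query).filter((t) => docTokens.has(t));
}

const query = "2 marina blvd";
const sail = "THE SAIL @ MARINA BAY 2 MARINA BOULEVARD, SINGAPORE 018987";
const green = "2 MARINA GREEN, SINGAPORE 019800";

// Without the synonym filter, both documents share only "2" and "marina".
console.log(sharedTokens(analyzeDefault, query, sail));
console.log(sharedTokens(analyzeDefault, query, green));

// With the synonym filter, THE SAIL additionally matches "blvd" and "boulevard".
console.log(sharedTokens(analyzeWithSynonyms, query, sail));
console.log(sharedTokens(analyzeWithSynonyms, query, green));
```

In this simplified view, a field without the synonym analyzer gives both addresses the same two matching terms, so the shorter 2 MARINA GREEN entry can plausibly outrank the longer one; once the synonyms are applied, only THE SAIL matches the boulevard term.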

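A field can have one analyzer at index time and another at search time. Most use cases use the same analyzer for both indexing and searching. By default (when you specify "analyzer": "whatever_analyzer"), the same analyzer is used at both search time and index time.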
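
As a minimal sketch of the two-analyzer case, the mapping fragment below (written as a JavaScript object, like the query in the question) sets analyzer for index time and search_analyzer for search time. Using my_synonyms at index time and standard at search time is only one possible choice, assumed here for illustration; it expands the synonyms once when documents are indexed instead of on every query.

```javascript
// Sketch of a field mapping that uses different analyzers at index time
// and at search time. The analyzer pairing is an illustrative assumption.
const searchTextMapping = {
  searchText: {
    type: "string",
    analyzer: "my_synonyms",     // applied when documents are indexed
    search_analyzer: "standard"  // applied to the query text at search time
  }
};

console.log(JSON.stringify(searchTextMapping, null, 2));
```

If search_analyzer is omitted, the analyzer value is used for both, which is the default behavior described above.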

For more insight into analysis and what you can do with it, see https://www.elastic.co/guide/en/elasticsearch/guide/2.x/analysis-intro.html
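
Putting it together, one way to apply the change might look like the sketch below. The analyzer of an existing field cannot be changed in place, so this assumes a new index (the name geolocation_v2 is hypothetical) into which the documents are reindexed. The synonym list is abbreviated, and the second body is meant for the _analyze API so you can check which tokens my_synonyms produces. The code only builds and prints the request bodies; sending them is left to whatever client you use.

```javascript
// Hypothetical name for a new index; the documents must be reindexed into it.
const newIndex = "geolocation_v2";

// Body for: PUT /geolocation_v2
const createIndexBody = {
  settings: {
    analysis: {
      filter: {
        my_synonym_filter: {
          type: "synonym",
          // Abbreviated; use the full list from the original settings.
          synonyms: ["rd,road", "ave,avenue", "blvd,boulevard", "st,street"]
        }
      },
      analyzer: {
        my_synonyms: {
          type: "custom",
          tokenizer: "standard",
          filter: ["lowercase", "my_synonym_filter"]
        }
      }
    }
  },
  mappings: {
    location: {
      properties: {
        // The analyzer is now actually assigned to the field.
        searchText: { type: "string", analyzer: "my_synonyms" }
      }
    }
  }
};

// Body for: GET /geolocation_v2/_analyze
// The response lists the tokens the analyzer emits for the given text.
const analyzeBody = { analyzer: "my_synonyms", text: "2 marina blvd" };

console.log(`PUT /${newIndex}`);
console.log(JSON.stringify(createIndexBody, null, 2));
console.log(`GET /${newIndex}/_analyze`);
console.log(JSON.stringify(analyzeBody));
```

Note that the query in the question targets a field called text, while the mapping only defines searchText; this sketch assumes the query is pointed at the field that carries the analyzer.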

How can I apply synonym, pinyin, stopword, and ngram filters together to one field, and restrict another field to just one filter? – Timeless