2014-04-09 41 views

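Elasticsearch: span_near and substring matching

I am new to Elasticsearch. I want span-style matching that ranks exact phrase matches first, then exact term-sequence matches, and after those also takes substring matches into account.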

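For example:

Documents in my index:

  1. men's cream
  2. men's wrinkle cream
  3. men's advanced wrinkle cream
  4. women's cream
  5. women's wrinkle cream
  6. women's advanced wrinkle cream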

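If I search for "men's cream", I want the results in the same order as listed above. Expected search results:

  1. men's cream -> exact phrase match
  2. men's wrinkle cream -> search term sequence with slop 1
  3. men's advanced wrinkle cream -> search term sequence with slop 2
  4. women's cream -> substring exact phrase match
  5. women's wrinkle cream -> substring search term sequence with slop 1
  6. women's advanced wrinkle cream -> substring search term sequence with slop 2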

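I can get the first 3 results with span_near containing nested span_terms, with slop = 2 and in_order = true.
I cannot get the remaining results, 4 to 6, because span_near with nested span_terms does not support wildcards, in this example "men's cream" OR "men". Is there any way to achieve this with Elasticsearch?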
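One direction that may be worth trying is the `span_multi` query, which wraps a multi-term query such as `wildcard` so that it can be used as a clause inside `span_near`. The sketch below only builds the request body as a Python dict. The helper name `span_near_with_wildcard` and the pattern `*men's` are illustrative assumptions, the field name `genre` is taken from the mapping in the update, and the resulting ranking has not been verified against this index. Leading wildcards can also be slow on large indices.

```python
import json


def span_near_with_wildcard(field, first_pattern, last_term, slop=2):
    """Build a span_near body whose first clause is a wildcard wrapped in span_multi.

    Hypothetical helper for illustration; it only assembles the JSON structure.
    """
    return {
        "span_near": {
            "clauses": [
                # span_multi lets a multi-term query (here a wildcard) act as a span clause.
                {"span_multi": {"match": {"wildcard": {field: {"value": first_pattern}}}}},
                # The second clause stays an ordinary span_term.
                {"span_term": {field: last_term}},
            ],
            "slop": slop,
            "in_order": True,
        }
    }


# "*men's" is meant to cover both the "men's" and "women's" tokens (assumed pattern).
body = {"query": span_near_with_wildcard("genre", "*men's", "cream")}
print(json.dumps(body, indent=2))
```

To keep exact matches ahead of substring matches, a clause like this could sit in a `bool` `should` next to a plain `span_term` version, so that documents matching both clauses score higher.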

Update
My index:

{
  "bluray": {
    "settings": {
      "index": {
        "uuid": "4jofvNfuQdqbhfaF2ibyhQ",
        "number_of_replicas": "1",
        "number_of_shards": "5",
        "version": {
          "created": "1000199"
        }
      }
    }
  }
}

Mapping:

{
  "bluray": {
    "mappings": {
      "movies": {
        "properties": {
          "genre": {
            "type": "string"
          }
        }
      }
    }
  }
}

I ran the following query:

POST /bluray/movies/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "span_near": {
            "clauses": [
              { "span_term": { "genre": "women" } },
              { "span_term": { "genre": "cream" } }
            ],
            "collect_payloads": false,
            "slop": 12,
            "in_order": true
          }
        },
        {
          "custom_boost_factor": {
            "query": {
              "match_phrase": { "genre": "women cream" }
            },
            "boost_factor": 4.1
          }
        },
        {
          "match": {
            "genre": {
              "query": "women cream",
              "analyzer": "standard",
              "minimum_should_match": "99%"
            }
          }
        }
      ]
    }
  }
}

This gives me the following results:

{
    "took": 3, 
    "timed_out": false, 
    "_shards": { 
     "total": 5, 
     "successful": 5, 
     "failed": 0 
    }, 
    "hits": { 
     "total": 6, 
     "max_score": 0.011612939, 
     "hits": [ 
     { 
      "_index": "bluray", 
      "_type": "movies", 
      "_id": "u9aNkZAoR86uAiW9SX8szQ", 
      "_score": 0.011612939, 
      "_source": { 
       "genre": "men's cream" 
      } 
     }, 
     { 
      "_index": "bluray", 
      "_type": "movies", 
      "_id": "cpTyKrL6TWuJkXvliibVBQ", 
      "_score": 0.009290351, 
      "_source": { 
       "genre": "men's wrinkle cream" 
      } 
     }, 
     { 
      "_index": "bluray", 
      "_type": "movies", 
      "_id": "rn_SFvD4QBO6TJQJNuOh5A", 
      "_score": 0.009290351, 
      "_source": { 
       "genre": "men's advanced wrinkle cream" 
      } 
     }, 
     { 
      "_index": "bluray", 
      "_type": "movies", 
      "_id": "9a31_bRpR2WfWh_4fgsi_g", 
      "_score": 0.004618556, 
      "_source": { 
       "genre": "women's cream" 
      } 
     }, 
     { 
      "_index": "bluray", 
      "_type": "movies", 
      "_id": "q-DoBBl2RsON_qwLRSoh9Q", 
      "_score": 0.0036948444, 
      "_source": { 
       "genre": "women's advanced wrinkle cream" 
      } 
     }, 
     { 
      "_index": "bluray", 
      "_type": "movies", 
      "_id": "TxzCP8B_Q8epXtIcfgEw3Q", 
      "_score": 0.0036948444, 
      "_source": { 
       "genre": "women's wrinkle cream" 
      } 
     } 
     ] 
    } 
} 

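This is not correct. Why do the men's products come first when I search for women?

Note: searching for "men's cream" still returns better results, but they do not follow the search term sequence.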
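One possible explanation, offered as an assumption rather than a confirmed diagnosis: the mapping sets no analyzer on `genre`, so the default standard analyzer applies, and it appears to keep "women's" as a single lowercase token. If so, the term `women` never matches an indexed token, and only `cream` contributes to the score. The sketch below uses a rough whitespace tokenizer as a stand-in for the real analyzer (`approx_standard_tokens` is a hypothetical helper) and also works out what "99%" of two terms rounds down to.

```python
import math


def approx_standard_tokens(text):
    """Rough stand-in for the standard analyzer: lowercase and split on whitespace.

    Assumption: in-word apostrophes are kept, so "women's" stays one token.
    This is not the real tokenizer.
    """
    return text.lower().split()


indexed = approx_standard_tokens("women's cream")
searched = ["women", "cream"]

# Which searched terms correspond to a token that was actually indexed?
matches = {term: term in indexed for term in searched}
print(indexed)
print(matches)

# "99%" of 2 optional terms, rounded down, leaves 1 required term,
# so a document containing only "cream" still satisfies the match clause.
required = math.floor(0.99 * len(searched))
print(required)
```

Under these assumptions every document matches on `cream` alone, so the order is decided by scoring details such as field length and per-shard term statistics rather than by the `women` term.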

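I tried applying the index settings described here: http://stackoverflow.com/questions/9421358/filename-search-with-elasticsearch, but it still does not return substring results in search term order. I also used the gist provided here -> http://sense.qbox.io/gist/db82c3fca956c8bffae19559b1fe3108c101e851, and that did not give me the results I want either.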

Did you find a solution? I have the same problem. – letalumil

Answer

POST /bluray/movies/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "span_near": {
            "clauses": [
              { "span_term": { "genre": "women's" } },
              { "span_term": { "genre": "cream" } }
            ],
            "collect_payloads": false,
            "slop": 12,
            "in_order": true
          }
        },
        {
          "match": {
            "genre": {
              "query": "women's cream",
              "analyzer": "standard",
              "minimum_should_match": "99%"
            }
          }
        }
      ]
    }
  }
}

This gives the following output, which is what you expected:

{ 
    "took": 2, 
    "timed_out": false, 
    "_shards": { 
    "total": 5, 
    "successful": 5, 
    "failed": 0 
    }, 
    "hits": { 
    "total": 6, 
    "max_score": 0.7841132, 
    "hits": [ 
     { 
     "_index": "bluray", 
     "_type": "movies", 
     "_id": "4", 
     "_score": 0.7841132, 
     "_source": { 
      "genre": "women's cream" 
     } 
     }, 
     { 
     "_index": "bluray", 
     "_type": "movies", 
     "_id": "5", 
     "_score": 0.56961054, 
     "_source": { 
      "genre": "women's wrinkle cream" 
     } 
     }, 
     { 
     "_index": "bluray", 
     "_type": "movies", 
     "_id": "6", 
     "_score": 0.35892165, 
     "_source": { 
      "genre": "women's advanced wrinkle cream" 
     } 
     }, 
     { 
     "_index": "bluray", 
     "_type": "movies", 
     "_id": "3", 
     "_score": 0.2876821, 
     "_source": { 
      "genre": "men's advanced wrinkle cream" 
     } 
     }, 
     { 
     "_index": "bluray", 
     "_type": "movies", 
     "_id": "1", 
     "_score": 0.25811607, 
     "_source": { 
      "genre": "men's cream" 
     } 
     }, 
     { 
     "_index": "bluray", 
     "_type": "movies", 
     "_id": "2", 
     "_score": 0.11750762, 
     "_source": { 
      "genre": "men's wrinkle cream" 
     } 
     } 
    ] 
    } 
}
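
The query above hard-codes the two terms. As a minimal sketch, assuming the search text can be split into terms by lowercasing and splitting on whitespace (an approximation of the analyzer, not its actual behavior), the same request body could be generated from any search string. `build_query` is a hypothetical helper name; the option values are copied from the query above.

```python
import json


def build_query(field, text, slop=12):
    """Build the bool/should body used above from a free-text search string.

    Assumption: lowercasing and whitespace splitting yields the indexed terms,
    e.g. "Women's cream" -> ["women's", "cream"].
    """
    terms = text.lower().split()
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        "span_near": {
                            # One span_term per search term, kept in the typed order.
                            "clauses": [{"span_term": {field: t}} for t in terms],
                            "collect_payloads": False,
                            "slop": slop,
                            "in_order": True,
                        }
                    },
                    {
                        "match": {
                            field: {
                                "query": text,
                                "analyzer": "standard",
                                "minimum_should_match": "99%",
                            }
                        }
                    },
                ]
            }
        }
    }


print(json.dumps(build_query("genre", "women's cream"), indent=2))
```

The printed JSON can be pasted as the body of `POST /bluray/movies/_search`.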