2014-04-13 49 views

ElasticSearch scoring question

I am trying to figure out the logic ElasticSearch uses when it ranks results by score.

I have 4 indices in total, and I am querying all of them for a single term. The query I am using is as follows:

GET /_all/static/_search
{
    "query": {
        "match": {
            "name": "chinese"
        }
    }
}

The (partial) response I get is as follows:

{ 
    "took": 17, 
    "timed_out": false, 
    "_shards": { 
     "total": 40, 
     "successful": 40, 
     "failed": 0 
    }, 
    "hits": { 
     "total": 6, 
     "max_score": 2.96844, 
     "hits": [ 
     { 
      "_shard": 1, 
      "_node": "Hz9L2DZ-ShSajaNvoyU8Eg", 
      "_index": "restaurant", 
      "_type": "static", 
      "_id": "XecLkyYNQWihuR2atFc5JQ", 
      "_score": 2.96844, 
      "_source": { 
       "name": "Just Chinese" 
      }, 
      "_explanation": { 
       "value": 2.96844, 
       "description": "weight(name:chinese in 1) [PerFieldSimilarity], result of:", 
       "details": [ 
        { 
        "value": 2.96844, 
        "description": "fieldWeight in 1, product of:", 
        "details": [ 
         { 
          "value": 1, 
          "description": "tf(freq=1.0), with freq of:", 
          "details": [ 
           { 
           "value": 1, 
           "description": "termFreq=1.0" 
           } 
          ] 
         }, 
         { 
          "value": 4.749504, 
          "description": "idf(docFreq=3, maxDocs=170)" 
         }, 
         { 
          "value": 0.625, 
          "description": "fieldNorm(doc=1)" 
         } 
        ] 
        } 
       ] 
      } 
     }, 
     { 
      "_shard": 1, 
      "_node": "Hz9L2DZ-ShSajaNvoyU8Eg", 
      "_index": "restaurant", 
      "_type": "static", 
      "_id": "IAUpkC55ReySjvl9Xr5MVw", 
      "_score": 2.96844, 
      "_source": { 
       "name": "The Chinese Hut" 
      }, 
      "_explanation": { 
       "value": 2.96844, 
       "description": "weight(name:chinese in 5) [PerFieldSimilarity], result of:", 
       "details": [ 
        { 
        "value": 2.96844, 
        "description": "fieldWeight in 5, product of:", 
        "details": [ 
         { 
          "value": 1, 
          "description": "tf(freq=1.0), with freq of:", 
          "details": [ 
           { 
           "value": 1, 
           "description": "termFreq=1.0" 
           } 
          ] 
         }, 
         { 
          "value": 4.749504, 
          "description": "idf(docFreq=3, maxDocs=170)" 
         }, 
         { 
          "value": 0.625, 
          "description": "fieldNorm(doc=5)" 
         } 
        ] 
        } 
       ] 
      } 
     }, 
     { 
      "_shard": 2, 
      "_node": "Hz9L2DZ-ShSajaNvoyU8Eg", 
      "_index": "cuisine", 
      "_type": "static", 
      "_id": "6", 
      "_score": 2.7047482, 
      "_source": { 
       "name": "Chinese" 
      }, 
      "_explanation": { 
       "value": 2.7047482, 
       "description": "weight(name:chinese in 1) [PerFieldSimilarity], result of:", 
       "details": [ 
        { 
        "value": 2.7047482, 
        "description": "fieldWeight in 1, product of:", 
        "details": [ 
         { 
          "value": 1, 
          "description": "tf(freq=1.0), with freq of:", 
          "details": [ 
           { 
           "value": 1, 
           "description": "termFreq=1.0" 
           } 
          ] 
         }, 
         { 
          "value": 2.7047482, 
          "description": "idf(docFreq=1, maxDocs=11)" 
         }, 
         { 
          "value": 1, 
          "description": "fieldNorm(doc=1)" 
         } 
        ] 
        } 
       ] 
      } 
     }, 

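My question is this: as I understand it, elasticsearch treats shorter field values as more relevant. So why do results like "Just Chinese" and "The Chinese Hut" from the restaurant index rank ahead of "Chinese" from the cuisine index, which I expected to be the best match? As far as I know, I did not use any special analyzer or anything else when inserting these documents into the indices. Everything is at its default.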

What am I missing, and how can I get the expected results?

Answer


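One of the important parameters in computing the score is the inverse document frequency (IDF). By default, each elasticsearch shard uses its own local IDF as an estimate of the global IDF. This works well when you have many similar records distributed evenly across the shards. However, if you have only a few records, or if you combine results from several shards that hold different kinds of records (cuisine names and restaurant names), the estimated IDF can produce strange results. In your explain output, the restaurant shard reports idf(docFreq=3, maxDocs=170) = 4.749504, while the cuisine shard reports idf(docFreq=1, maxDocs=11) = 2.7047482, so the same term is weighted very differently depending on which shard scored it. The solution to this problem is to use elasticsearch's dfs_query_then_fetch search type.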
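The arithmetic behind this can be checked by hand. The sketch below reproduces the two scores from the explain output, assuming the classic Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), which match the numbers shown in the question. The "pooled" part is only an illustration: it pretends the two shards shown are the whole cluster, whereas a real dfs_query_then_fetch request gathers statistics from all shards involved, so the actual values would differ.

```python
import math


def idf(doc_freq, max_docs):
    # Classic Lucene TF-IDF: idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)
    return math.sqrt(freq) * idf(doc_freq, max_docs) * field_norm


# Per-shard statistics copied from the explain output in the question.
restaurant_local = field_weight(1, doc_freq=3, max_docs=170, field_norm=0.625)
cuisine_local = field_weight(1, doc_freq=1, max_docs=11, field_norm=1.0)

# ASSUMPTION (illustration only): treat the two shards above as the entire
# cluster and pool their term statistics, so both documents share one IDF.
pooled_doc_freq = 3 + 1
pooled_max_docs = 170 + 11
restaurant_pooled = field_weight(1, pooled_doc_freq, pooled_max_docs, 0.625)
cuisine_pooled = field_weight(1, pooled_doc_freq, pooled_max_docs, 1.0)

print("local IDF : restaurant=%.5f cuisine=%.5f" % (restaurant_local, cuisine_local))
print("pooled IDF: restaurant=%.5f cuisine=%.5f" % (restaurant_pooled, cuisine_pooled))
```

With local statistics the restaurant documents win (2.96844 against 2.7047482, as in the response). With a shared IDF the only remaining difference is the fieldNorm, so under this pooling assumption the one-word "Chinese" document comes out ahead, at roughly 4.59 against 2.87.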

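By the way, to see how elasticsearch computes the score, you can use the explain parameter, either in the search request body or in the URL. So when you ask a question about scoring, it helps to include the output you get with explain set to true.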
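Putting the two suggestions together, a request could look roughly like the following sketch. It only builds the HTTP request and prints it, so it runs without a cluster. The host and port (http://localhost:9200) and the helper name build_search_request are assumptions for illustration; the path and query are taken from the question, and search_type and explain are the parameters named in this answer.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url, term, explain=True):
    # Hypothetical helper: builds the request but does not send it.
    # search_type is passed in the URL; explain goes in the request body.
    params = urlencode({"search_type": "dfs_query_then_fetch"})
    url = "%s/_all/static/_search?%s" % (base_url, params)
    body = {
        "explain": explain,
        "query": {"match": {"name": term}},
    }
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# ASSUMPTION: a node listening on localhost:9200; adjust for your cluster.
req = build_search_request("http://localhost:9200", "chinese")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

To send it against a running node, pass req to urllib.request.urlopen and parse the JSON response. Each hit should then carry an _explanation section like the one in the question.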


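dfs_query_then_fetch worked! And now I also understand why it behaves this way. Thanks for the explanation! Also, I have edited the question to include the explain output for the original response. – arijeet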