ElasticSearch scoring question

I am trying to figure out the logic ElasticSearch uses when it ranks results by score.

I have 4 indexes in total, and I am querying all of them for a term. The query I use is as follows:

GET /_all/static/_search
{
    "query": {
        "match": {
            "name": "chinese"
        }
    }
}

The (partial) response I get is as follows:

{ 
    "took": 17, 
    "timed_out": false, 
    "_shards": { 
     "total": 40, 
     "successful": 40, 
     "failed": 0 
    }, 
    "hits": { 
     "total": 6, 
     "max_score": 2.96844, 
     "hits": [ 
     { 
      "_shard": 1, 
      "_node": "Hz9L2DZ-ShSajaNvoyU8Eg", 
      "_index": "restaurant", 
      "_type": "static", 
      "_id": "XecLkyYNQWihuR2atFc5JQ", 
      "_score": 2.96844, 
      "_source": { 
       "name": "Just Chinese" 
      }, 
      "_explanation": { 
       "value": 2.96844, 
       "description": "weight(name:chinese in 1) [PerFieldSimilarity], result of:", 
       "details": [ 
        { 
        "value": 2.96844, 
        "description": "fieldWeight in 1, product of:", 
        "details": [ 
         { 
          "value": 1, 
          "description": "tf(freq=1.0), with freq of:", 
          "details": [ 
           { 
           "value": 1, 
           "description": "termFreq=1.0" 
           } 
          ] 
         }, 
         { 
          "value": 4.749504, 
          "description": "idf(docFreq=3, maxDocs=170)" 
         }, 
         { 
          "value": 0.625, 
          "description": "fieldNorm(doc=1)" 
         } 
        ] 
        } 
       ] 
      } 
     }, 
     { 
      "_shard": 1, 
      "_node": "Hz9L2DZ-ShSajaNvoyU8Eg", 
      "_index": "restaurant", 
      "_type": "static", 
      "_id": "IAUpkC55ReySjvl9Xr5MVw", 
      "_score": 2.96844, 
      "_source": { 
       "name": "The Chinese Hut" 
      }, 
      "_explanation": { 
       "value": 2.96844, 
       "description": "weight(name:chinese in 5) [PerFieldSimilarity], result of:", 
       "details": [ 
        { 
        "value": 2.96844, 
        "description": "fieldWeight in 5, product of:", 
        "details": [ 
         { 
          "value": 1, 
          "description": "tf(freq=1.0), with freq of:", 
          "details": [ 
           { 
           "value": 1, 
           "description": "termFreq=1.0" 
           } 
          ] 
         }, 
         { 
          "value": 4.749504, 
          "description": "idf(docFreq=3, maxDocs=170)" 
         }, 
         { 
          "value": 0.625, 
          "description": "fieldNorm(doc=5)" 
         } 
        ] 
        } 
       ] 
      } 
     }, 
     { 
      "_shard": 2, 
      "_node": "Hz9L2DZ-ShSajaNvoyU8Eg", 
      "_index": "cuisine", 
      "_type": "static", 
      "_id": "6", 
      "_score": 2.7047482, 
      "_source": { 
       "name": "Chinese" 
      }, 
      "_explanation": { 
       "value": 2.7047482, 
       "description": "weight(name:chinese in 1) [PerFieldSimilarity], result of:", 
       "details": [ 
        { 
        "value": 2.7047482, 
        "description": "fieldWeight in 1, product of:", 
        "details": [ 
         { 
          "value": 1, 
          "description": "tf(freq=1.0), with freq of:", 
          "details": [ 
           { 
           "value": 1, 
           "description": "termFreq=1.0" 
           } 
          ] 
         }, 
         { 
          "value": 2.7047482, 
          "description": "idf(docFreq=1, maxDocs=11)" 
         }, 
         { 
          "value": 1, 
          "description": "fieldNorm(doc=1)" 
         } 
        ] 
        } 
       ] 
      } 
     },

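My question is this: as I understand it, Elasticsearch favors shorter field values, so why do results like "Just Chinese" and "The Chinese Hut" from the restaurant index rank ahead of the expected best match, "Chinese" from the cuisine index? As far as I know, I did not use any special analyzers or anything else when inserting these documents into the indexes. Everything is default.

What am I missing, and how can I get the expected results?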


2014-04-13 arijeet

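One of the important parameters in calculating the score is the inverse document frequency (IDF). By default, each Elasticsearch shard tries to estimate the global IDF from its own local IDF. This works well when you have many similar records distributed evenly across the shards. However, if you have only a few records, or if you combine results from multiple shards that hold different kinds of records (cuisine names and restaurant names), the estimated IDF can produce strange results. The solution to this problem is to use Elasticsearch's dfs_query_then_fetch search mode.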
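The numbers in the explain output from the question can be reproduced by hand, which makes the per-shard IDF effect visible. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))) that match the values shown in the question. The function names are hypothetical, and the fieldNorm values are copied from the explain output rather than computed.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Classic Lucene TF-IDF idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(term_freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as listed in the explain output."""
    tf = math.sqrt(term_freq)
    return tf * classic_idf(doc_freq, max_docs) * field_norm


# Shard of the "restaurant" index: docFreq=3, maxDocs=170, fieldNorm=0.625
restaurant_score = field_weight(1.0, 3, 170, 0.625)

# Shard of the "cuisine" index: docFreq=1, maxDocs=11, fieldNorm=1.0
cuisine_score = field_weight(1.0, 1, 11, 1.0)

print(round(restaurant_score, 3), round(cuisine_score, 3))
```

The cuisine document has the better fieldNorm (1 versus 0.625), but its shard holds only 11 documents, so its local IDF (about 2.70) is much lower than the restaurant shard's (about 4.75). That difference in local statistics is what pushes the restaurant hits to the top.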

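By the way, to see how Elasticsearch computes the score, you can use the explain parameter in the search request or in the URL. So when you ask a question about scoring, it helps to provide the output with explain set to true.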
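As a rough illustration, both settings can be passed as URL parameters on the search request from the question. The sketch below only builds the URL and the JSON body and does not contact a server. The helper name build_search_request and the host http://localhost:9200 are assumptions made for the example. The index, type, field, and term are taken from the question.

```python
import json
from urllib.parse import urlencode


def build_search_request(base_url, index, doc_type, field, text):
    """Build the URL and JSON body for a match query that uses
    dfs_query_then_fetch and asks for score explanations."""
    params = urlencode({
        "search_type": "dfs_query_then_fetch",  # gather global term statistics first
        "explain": "true",                      # include _explanation for each hit
    })
    url = f"{base_url}/{index}/{doc_type}/_search?{params}"
    body = json.dumps({"query": {"match": {field: text}}})
    return url, body


# Hypothetical local node; adjust the host to your own cluster.
url, body = build_search_request(
    "http://localhost:9200", "_all", "static", "name", "chinese"
)
print(url)
print(body)
```

The printed URL and body can then be sent with any HTTP client as a GET or POST search request.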


2014-04-14 01:07:44 imotov

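dfs_query_then_fetch worked! Now I also understand why it works this way. Thanks for the explanation! Also, I edited the question so that the response includes the explanation. – arijeet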
