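Why is queryWeight included in the score of some results but not others, within the same query?
I'm executing a single-term query_string query on multiple fields, _all and tags.name, and I'm trying to understand the scoring. Query: {"query":{"query_string":{"query":"animal","fields":["_all","tags.name"]}}}
Here are the documents returned by the query: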
- Document 1 has an exact match on tags.name, but not on _all.
- Document 8 has an exact match on both tags.name and _all.
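Document 8 should win, and it does, but I'm confused by how the scoring works out. It looks like Document 1 is being penalized by having its tags.name score multiplied by the IDF twice, whereas Document 8's tags.name score is multiplied by the IDF only once. In short: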
- Both have a component weight(tags.name:animal in 0) [PerFieldSimilarity].
- In Document 1, we have weight = score = queryWeight x fieldWeight.
- In Document 8, we have weight = fieldWeight!
Since queryWeight contains idf, Document 1 ends up penalized by its idf twice.
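To make the "idf twice" point concrete, here is the arithmetic for the tags.name:animal clause in both documents, using only values printed in the explain output further down. This is a minimal Python sketch: classic_idf and field_weight are hypothetical helpers, and the formula idf = 1 + ln(maxDocs / (docFreq + 1)) is assumed to be Lucene's classic TF-IDF idf (it does reproduce the printed 0.30685282).

```python
import math

# Values copied from the two _explain outputs for the tags.name:animal clause.
IDF = 0.30685282          # idf(docFreq=1, maxDocs=1), identical in both documents
QUERY_NORM_DOC1 = 1.0     # queryNorm reported for Document 1
TF = 1.0                  # tf(freq=1.0) in both documents
FIELD_NORM_DOC1 = 0.625   # fieldNorm(doc=0) for Document 1
FIELD_NORM_DOC8 = 0.5     # fieldNorm(doc=0) for Document 8


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Assumed classic Lucene idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(tf: float, idf: float, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in 'fieldWeight in 0, product of:'."""
    return tf * idf * field_norm


# Document 1: score = queryWeight * fieldWeight, where queryWeight = idf * queryNorm,
# so idf appears in both factors.
query_weight_doc1 = IDF * QUERY_NORM_DOC1
doc1 = query_weight_doc1 * field_weight(TF, IDF, FIELD_NORM_DOC1)

# Document 8: weight = fieldWeight only, so idf appears once.
doc8 = field_weight(TF, IDF, FIELD_NORM_DOC8)

print(f"idf recomputed:       {classic_idf(1, 1):.8f}")
print(f"Document 1 tags.name: {doc1:.6f}")  # explain reports 0.058849156
print(f"Document 8 tags.name: {doc8:.6f}")  # explain reports 0.15342641
```

Document 1's clause works out to idf² × fieldNorm while Document 8's is idf × fieldNorm, so with idf ≈ 0.307 the extra queryWeight factor shrinks Document 1's clause by a factor of about 0.31.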
Can anyone make sense of this?
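Additional information:
- If I remove _all from the query's fields, queryWeight disappears from the explain output entirely.
- Adding "use_dis_max":true as an option has no effect.
- However, also adding "tie_breaker":0.7 (or any value) does affect Document 8, giving it the more complicated formula we see in Document 1.
- Thoughts: it's plausible that a boolean query (which this is) might do this on purpose, to give more weight to queries that match more than one sub-query. However, that makes no sense for a dis_max query, which should simply return the maximum of its sub-queries.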
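The last two points hinge on how dis_max combines its sub-queries. As documented for Lucene's DisjunctionMaxQuery and Elasticsearch's dis_max query, the score is the best sub-query score plus tie_breaker times the sum of the other matching sub-query scores. Below is a minimal Python sketch of that combination, fed with the two sub-query values from Document 8's explain further down; dis_max is a hypothetical helper, not an Elasticsearch API.

```python
# Sub-query scores from Document 8's explain (tie_breaker not set).
ALL_CLAUSE = 0.033902764   # "btq, product of:" (_all:anim)
TAGS_CLAUSE = 0.15342641   # weight(tags.name:animal in 0)


def dis_max(scores, tie_breaker=0.0):
    """Best sub-query score plus tie_breaker times the sum of the others."""
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)


# With the default tie_breaker of 0 this is just the "max of:" node.
print(dis_max([ALL_CLAUSE, TAGS_CLAUSE]))

# Illustration of the formula only: per the observation above, setting tie_breaker
# also changes the sub-scores themselves (queryWeight appears), so this is not
# the value Elasticsearch would report.
print(dis_max([ALL_CLAUSE, TAGS_CLAUSE], tie_breaker=0.7))
```

Nothing in the combination formula itself multiplies a sub-score by idf, so the extra queryWeight factor has to come from how each sub-query's own weight is explained, not from dis_max.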
Below are the relevant explain requests. Look for the embedded comments.
Document 1 (match only on tags.name):
curl -XGET 'http://localhost:9200/questions/question/1/_explain?pretty' -d '{"query":{"query_string":{"query":"animal","fields":["_all","tags.name"]}}}'
Response:
{
"ok" : true,
"_index" : "questions_1390104463",
"_type" : "question",
"_id" : "1",
"matched" : true,
"explanation" : {
"value" : 0.058849156,
"description" : "max of:",
"details" : [ {
"value" : 0.058849156,
"description" : "weight(tags.name:animal in 0) [PerFieldSimilarity], result of:",
// weight = score = queryWeight x fieldWeight
"details" : [ {
// score and queryWeight are NOT a part of the other explain!
"value" : 0.058849156,
"description" : "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details" : [ {
"value" : 0.30685282,
"description" : "queryWeight, product of:",
"details" : [ {
// This idf is NOT a part of the other explain!
"value" : 0.30685282,
"description" : "idf(docFreq=1, maxDocs=1)"
}, {
"value" : 1.0,
"description" : "queryNorm"
} ]
}, {
"value" : 0.19178301,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 1.0,
"description" : "tf(freq=1.0), with freq of:",
"details" : [ {
"value" : 1.0,
"description" : "termFreq=1.0"
} ]
}, {
"value" : 0.30685282,
"description" : "idf(docFreq=1, maxDocs=1)"
}, {
"value" : 0.625,
"description" : "fieldNorm(doc=0)"
} ]
} ]
} ]
} ]
}
}
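Because an explain tree is just nested products and maxima, one way to check the embedded comments is to recompute every node from its children. This is a minimal Python sketch, assuming the response has already been parsed into a dict; here Document 1's tree is pasted in by hand with the // comments removed, and check is a hypothetical helper, not part of Elasticsearch.

```python
import math

# Document 1's explanation tree, copied from the response above (comments removed).
DOC1 = {
    "value": 0.058849156, "description": "max of:",
    "details": [{
        "value": 0.058849156,
        "description": "weight(tags.name:animal in 0) [PerFieldSimilarity], result of:",
        "details": [{
            "value": 0.058849156,
            "description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
            "details": [
                {"value": 0.30685282, "description": "queryWeight, product of:",
                 "details": [
                     {"value": 0.30685282, "description": "idf(docFreq=1, maxDocs=1)"},
                     {"value": 1.0, "description": "queryNorm"}]},
                {"value": 0.19178301, "description": "fieldWeight in 0, product of:",
                 "details": [
                     {"value": 1.0, "description": "tf(freq=1.0), with freq of:",
                      "details": [{"value": 1.0, "description": "termFreq=1.0"}]},
                     {"value": 0.30685282, "description": "idf(docFreq=1, maxDocs=1)"},
                     {"value": 0.625, "description": "fieldNorm(doc=0)"}]}]}]}]}


def check(node, depth=0, rel_tol=1e-6):
    """Print the tree, recomputing each 'product of:' / 'max of:' node from its
    children. Returns the descriptions of nodes whose stated value disagrees."""
    bad = []
    children = node.get("details", [])
    values = [c["value"] for c in children]
    desc = node["description"]
    expected = None
    if desc.endswith("product of:"):
        expected = math.prod(values)
    elif desc.endswith("max of:"):
        expected = max(values)
    marker = ""
    if expected is not None:
        ok = math.isclose(node["value"], expected, rel_tol=rel_tol)
        marker = "ok" if ok else f"MISMATCH (recomputed {expected:.9g})"
        if not ok:
            bad.append(desc)
    print("  " * depth + f"{node['value']:.9g}  {desc.replace(chr(10), ' ')}  {marker}")
    for child in children:
        bad.extend(check(child, depth + 1, rel_tol))
    return bad


mismatches = check(DOC1)
print("mismatches:", mismatches)
```

Every product node in Document 1's tree checks out, which confirms that 0.058849156 really is queryWeight × fieldWeight with idf inside both factors. The same helper can be pointed at any parsed _explain response.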
Document 8 (match on both _all and tags.name):
curl -XGET 'http://localhost:9200/questions/question/8/_explain?pretty' -d '{"query":{"query_string":{"query":"animal","fields":["_all","tags.name"]}}}'
Response:
{
"ok" : true,
"_index" : "questions_1390104463",
"_type" : "question",
"_id" : "8",
"matched" : true,
"explanation" : {
"value" : 0.15342641,
"description" : "max of:",
"details" : [ {
"value" : 0.033902764,
"description" : "btq, product of:",
"details" : [ {
"value" : 0.033902764,
"description" : "weight(_all:anim in 0) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 0.033902764,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 0.70710677,
"description" : "tf(freq=0.5), with freq of:",
"details" : [ {
"value" : 0.5,
"description" : "phraseFreq=0.5"
} ]
}, {
"value" : 0.30685282,
"description" : "idf(docFreq=1, maxDocs=1)"
}, {
"value" : 0.15625,
"description" : "fieldNorm(doc=0)"
} ]
} ]
}, {
"value" : 1.0,
"description" : "allPayload(...)"
} ]
}, {
"value" : 0.15342641,
"description" : "weight(tags.name:animal in 0) [PerFieldSimilarity], result of:",
// weight = fieldWeight
// No score or queryWeight in sight!
"details" : [ {
"value" : 0.15342641,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 1.0,
"description" : "tf(freq=1.0), with freq of:",
"details" : [ {
"value" : 1.0,
"description" : "termFreq=1.0"
} ]
}, {
"value" : 0.30685282,
"description" : "idf(docFreq=1, maxDocs=1)"
}, {
"value" : 0.5,
"description" : "fieldNorm(doc=0)"
} ]
} ]
} ]
}
}
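Document 8's numbers can be rebuilt the same way. This is a minimal Python sketch using the values printed above; it assumes Lucene's classic TF-IDF tf(freq) = sqrt(freq), which reproduces the printed tf(freq=0.5) = 0.70710677. Neither clause carries a queryWeight factor, and the top-level "max of:" simply picks the tags.name clause.

```python
import math

IDF = 0.30685282            # idf(docFreq=1, maxDocs=1)
PHRASE_FREQ = 0.5           # phraseFreq=0.5 for _all:anim
ALL_FIELD_NORM = 0.15625    # fieldNorm(doc=0) for _all
ALL_PAYLOAD = 1.0           # allPayload(...)
TAGS_FIELD_NORM = 0.5       # fieldNorm(doc=0) for tags.name

# Assumption: classic Lucene tf(freq) = sqrt(freq).
tf_all = math.sqrt(PHRASE_FREQ)
all_field_weight = tf_all * IDF * ALL_FIELD_NORM
btq = all_field_weight * ALL_PAYLOAD        # "btq, product of:"

tags_weight = 1.0 * IDF * TAGS_FIELD_NORM    # tf(freq=1.0) = 1.0, no queryWeight factor

total = max(btq, tags_weight)                # top-level "max of:"
print(f"tf(_all)   = {tf_all:.8f}")          # explain reports 0.70710677
print(f"btq (_all) = {btq:.9f}")             # explain reports 0.033902764
print(f"tags.name  = {tags_weight:.8f}")     # explain reports 0.15342641
print(f"max of     = {total:.8f}")
```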
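Hi, did you ever find the answer yourself? Or do you have any sources to learn from? I'm suffering from the same lack of understanding. In our case this adversely affects some hits, and I need to understand why, and how to tune our queries. – Jakub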
No, unfortunately I never found an answer. I'd be curious to see if you hear back. – tmandry