ElasticSearch query on tags

I've been trying to crack the Elasticsearch query language, and so far I haven't been doing very well.

I have the following mapping for my documents.

{ 
    "mappings": { 
     "jsondoc": { 
      "properties": { 
       "header" : { 
        "type" : "nested", 
        "properties" : { 
         "plainText" : { "type" : "string" }, 
         "title" : { "type" : "string" }, 
         "year" : { "type" : "string" }, 
         "pages" : { "type" : "string" } 
        } 
       }, 
       "sentences": { 
        "type": "nested", 
        "properties": { 
         "id": { "type": "integer" }, 
         "text": { "type": "string" }, 
         "tokens": { "type": "nested" }, 
         "rhetoricalClass": { "type": "string" }, 
         "babelSynsetsOcc": { 
          "type": "nested", 
          "properties" : { 
           "id" : { "type" : "integer" }, 
           "text" : { "type" : "string" }, 
           "synsetID" : { "type" : "string" } 
          } 
         } 
        } 
       } 
      } 
     } 
    } 
}

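It is essentially a JSON file that describes a PDF document.

I've been experimenting with aggregation queries, and so far that has worked well. I've got to the point of grouping (aggregating) by rhetoricalClass and getting the total number of occurrences of babelSynsetsOcc.synsetID. I even have the same query with the whole result additionally grouped by header.year.

Now, however, I'm struggling to filter down to the documents that contain a term and then run the same query.

So how can I write a query that groups by rhetoricalClass and only considers documents whose header.plainText field contains ["Computational", "Compositional", "Semantics"]? I mean contain, not equal!

A rough translation into SQL would be something like: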

SELECT count(sentences.babelSynsetsOcc.synsetID) 
FROM jsondoc 
WHERE header.plainText like '%Computational%' OR header.plainText like '%Compositional%' OR header.plainText like '%Semantics%' 
GROUP BY sentences.rhetoricalClass

Source

2016-06-23 Mayhem

WHERE clauses are just standard structured queries, so they translate to queries in Elasticsearch.

GROUP BY and HAVING loosely translate to aggregations in the Elasticsearch DSL. Functions like count, min, max, and sum operate on the GROUP BY, so they are aggregations as well.

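The fact that you are using nested objects may be necessary, but it adds an extra layer to every part of the request that touches them. If those nested objects are not arrays, then do not use nested; use object in that case.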
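To illustrate the object-versus-nested point, here is a minimal Python sketch. It assumes each document has exactly one header (not an array of headers), which the question does not confirm. The names `header_as_object` and `plain_query` are made up for this example; the field types are copied from the question's mapping.

```python
import json

# Hypothetical variant of the question's mapping: "header" is assumed to be
# a single object per document, so it is declared "object" instead of "nested".
header_as_object = {
    "header": {
        "type": "object",
        "properties": {
            "plainText": {"type": "string"},
            "title": {"type": "string"},
            "year": {"type": "string"},
            "pages": {"type": "string"},
        },
    }
}

# With an object mapping the field is queried directly by its dotted path;
# no "nested" wrapper with a "path" is needed around the match query.
plain_query = {"query": {"match": {"header.plainText": "Computational"}}}

print(json.dumps(plain_query, indent=2))
```

If header really is an array of independent objects, the nested mapping (and the nested query wrapper) is still the right choice.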

I would probably translate the query into:

{ 
    "query": { 
    "nested": { 
     "path": "header", 
     "query": { 
     "bool": { 
      "should": [ 
      { 
       "match": { 
       "header.plainText" : "Computational" 
       } 
      }, 
      { 
       "match": { 
       "header.plainText" : "Compositional" 
       } 
      }, 
      { 
       "match": { 
       "header.plainText" : "Semantics" 
       } 
      } 
      ] 
     } 
     } 
    } 
    } 
}

Alternatively, it can be rewritten as follows, which makes the intent a little less obvious:

{ 
    "query": { 
    "nested": { 
     "path": "header", 
     "query": { 
     "match": { 
      "header.plainText": "Computational Compositional Semantics" 
     } 
     } 
    } 
    } 
}
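The two forms match the same documents because a match query analyzes its text into terms and, by default, combines them with OR. A small Python sketch of both builders follows; the helper names (`should_match`, `single_match`, `in_header`) are invented for this example, and the operator is written out explicitly only to make the default visible.

```python
import json

FIELD = "header.plainText"
TERMS = ["Computational", "Compositional", "Semantics"]

def should_match(field, terms):
    """One match clause per term, combined with bool/should (logical OR)."""
    return {"bool": {"should": [{"match": {field: term}} for term in terms]}}

def single_match(field, terms):
    """All terms in one match query, using the long form of match so the
    operator can be spelled out; "or" is also what match uses by default."""
    return {"match": {field: {"query": " ".join(terms), "operator": "or"}}}

def in_header(inner_query):
    """Wrap a query so it runs against the nested "header" objects."""
    return {"query": {"nested": {"path": "header", "query": inner_query}}}

print(json.dumps(in_header(should_match(FIELD, TERMS)), indent=2))
print(json.dumps(in_header(single_match(FIELD, TERMS)), indent=2))
```

Changing the operator to "and" in the second helper would instead require all three terms to appear, which is the closest match-query equivalent of joining the SQL conditions with AND.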

The aggregation could then be done like this:

{
    "aggs": {
        "nested_sentences": {
            "nested": {
                "path": "sentences"
            },
            "aggs": {
                "group_by_rhetorical_class": {
                    "terms": {
                        "field": "sentences.rhetoricalClass",
                        "size": 10
                    },
                    "aggs": {
                        "nested_babel": {
                            "nested": {
                                "path": "sentences.babelSynsetsOcc"
                            },
                            "aggs": {
                                "count_synset_id": {
                                    "value_count": {
                                        "field": "sentences.babelSynsetsOcc.synsetID"
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

Now, if you combine them and discard the hits (because you are just looking for the aggregated result), it looks like this:

{
    "size": 0,
    "query": {
        "nested": {
            "path": "header",
            "query": {
                "match": {
                    "header.plainText": "Computational Compositional Semantics"
                }
            }
        }
    },
    "aggs": {
        "nested_sentences": {
            "nested": {
                "path": "sentences"
            },
            "aggs": {
                "group_by_rhetorical_class": {
                    "terms": {
                        "field": "sentences.rhetoricalClass",
                        "size": 10
                    },
                    "aggs": {
                        "nested_babel": {
                            "nested": {
                                "path": "sentences.babelSynsetsOcc"
                            },
                            "aggs": {
                                "count_synset_id": {
                                    "value_count": {
                                        "field": "sentences.babelSynsetsOcc.synsetID"
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
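For anyone driving this from code, the following Python sketch builds the combined request and flattens the response into one count per rhetoricalClass. It is written with every sub-aggregation under an `aggs` key, each nested level wrapped in a `nested` aggregation, and `value_count` as the counting metric. The function names and the sample response are made up for illustration; the numbers in the sample are not real results, they only show the response shape these aggregation names would produce.

```python
def build_request(text):
    """Combined body: nested header filter plus the grouped synset count."""
    field = "sentences.babelSynsetsOcc.synsetID"
    count_agg = {"count_synset_id": {"value_count": {"field": field}}}
    babel_agg = {
        "nested_babel": {
            "nested": {"path": "sentences.babelSynsetsOcc"},
            "aggs": count_agg,
        }
    }
    group_agg = {
        "group_by_rhetorical_class": {
            "terms": {"field": "sentences.rhetoricalClass", "size": 10},
            "aggs": babel_agg,
        }
    }
    return {
        "size": 0,  # return no hits, only the aggregation results
        "query": {
            "nested": {
                "path": "header",
                "query": {"match": {"header.plainText": text}},
            }
        },
        "aggs": {
            "nested_sentences": {
                "nested": {"path": "sentences"},
                "aggs": group_agg,
            }
        },
    }

def counts_by_class(response):
    """Map each rhetoricalClass bucket key to its synsetID value count."""
    nested = response["aggregations"]["nested_sentences"]
    buckets = nested["group_by_rhetorical_class"]["buckets"]
    return {b["key"]: b["nested_babel"]["count_synset_id"]["value"] for b in buckets}

# Made-up response for illustration only; keys and counts are not real data.
sample_response = {
    "aggregations": {
        "nested_sentences": {
            "doc_count": 30,
            "group_by_rhetorical_class": {
                "buckets": [
                    {"key": "background", "doc_count": 20,
                     "nested_babel": {"doc_count": 42, "count_synset_id": {"value": 42}}},
                    {"key": "approach", "doc_count": 10,
                     "nested_babel": {"doc_count": 17, "count_synset_id": {"value": 17}}},
                ]
            },
        }
    }
}

body = build_request("Computational Compositional Semantics")
print(counts_by_class(sample_response))
```

The dictionary returned by `build_request` can be serialized with `json.dumps` and sent as the body of a search request with whatever HTTP or Elasticsearch client is already in use.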

Source

2016-06-23 18:11:14 pickypg

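Yes, you are absolutely right, I should put more effort into reading the documentation, but reading their official docs is such a painful job. The only thing missing from my query was the nested filter, and I don't know how I could have missed it. Anyway, thank you very much for your contribution – Mayhem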
