2017-04-27

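Elasticsearch bucket aggregation

I would like to know whether there is a way to do something similar to bucket_selector, but testing on a key match rather than on a numeric metric.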

To give more context, here is my use case:

Sample data:

[
  {
    "@version": "1",
    "@timestamp": "2017-04-27T04:28:23.589Z",
    "type": "json",
    "headers": {
      "message": {
        "type": "requestactivation"
      }
    },
    "id": "668"
  },
  {
    "@version": "1",
    "@timestamp": "2017-04-27T04:32:23.589Z",
    "type": "json",
    "headers": {
      "message": {
        "type": "requestactivation"
      }
    },
    "id": "669"
  },
  {
    "@version": "1",
    "@timestamp": "2017-04-27T04:30:00.802Z",
    "type": "json",
    "headers": {
      "message": {
        "type": "activationrequested"
      }
    },
    "id": "668"
  }
]

I want to retrieve all ids whose last event is of type requestactivation.
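To make the intended result concrete, here is a minimal sketch in plain Python (outside Elasticsearch) of the selection being asked for, run against the three sample documents. The flattened `message_type` key and the helper names `parse_ts` and `ids_with_last_event` are assumptions made for illustration, not part of the original data or of any Elasticsearch API.

```python
from datetime import datetime

# Sample events from the question, reduced to the fields that matter here.
# "message_type" is a hypothetical flattened stand-in for headers.message.type.
EVENTS = [
    {"id": "668", "@timestamp": "2017-04-27T04:28:23.589Z", "message_type": "requestactivation"},
    {"id": "669", "@timestamp": "2017-04-27T04:32:23.589Z", "message_type": "requestactivation"},
    {"id": "668", "@timestamp": "2017-04-27T04:30:00.802Z", "message_type": "activationrequested"},
]


def parse_ts(value):
    # The sample timestamps are UTC with millisecond precision and a literal "Z".
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


def ids_with_last_event(events, wanted_type):
    """Return the ids whose most recent event has the wanted type."""
    latest = {}  # id -> (timestamp, message type) of the newest event seen so far
    for event in events:
        ts = parse_ts(event["@timestamp"])
        current = latest.get(event["id"])
        if current is None or ts > current[0]:
            latest[event["id"]] = (ts, event["message_type"])
    return sorted(i for i, (_, t) in latest.items() if t == wanted_type)


print(ids_with_last_event(EVENTS, "requestactivation"))  # ['669']
```

Id 668 is dropped because its activationrequested event (04:30:00) is newer than its requestactivation event (04:28:23); id 669 is kept because its only event is a requestactivation.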

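I already have an aggregation that retrieves the last event type for each id, but I have not figured out how to filter the buckets based on the key.

Here is the query: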

{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "exists": {
            "field": "id"
          }
        },
        {
          "terms": {
            "headers.message.type": [
              "requestactivation",
              "activationrequested"
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "id": {
      "terms": {
        "field": "id",
        "size": 10000
      },
      "aggs": {
        "latest": {
          "max": {
            "field": "@timestamp"
          }
        },
        "hmtype": {
          "terms": {
            "field": "headers.message.type",
            "size": 1
          }
        }
      }
    }
  }
}

Here is a sample of the results:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "id": {
      "doc_count_error_upper_bound": 3,
      "sum_other_doc_count": 46,
      "buckets": [
        {
          "key": "986",
          "doc_count": 4,
          "hmtype": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 2,
            "buckets": [
              {
                "key": "activationrequested",
                "doc_count": 2
              }
            ]
          },
          "latest": {
            "value": 1493238253603,
            "value_as_string": "2017-04-26T20:24:13.603Z"
          }
        },
        {
          "key": "967",
          "doc_count": 2,
          "hmtype": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 1,
            "buckets": [
              {
                "key": "requestactivation",
                "doc_count": 1
              }
            ]
          },
          "latest": {
            "value": 1493191161242,
            "value_as_string": "2017-04-26T07:19:21.242Z"
          }
        },
        {
          "key": "554",
          "doc_count": 7,
          "hmtype": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 5,
            "buckets": [
              {
                "key": "requestactivation",
                "doc_count": 5
              }
            ]
          },
          "latest": {
            "value": 1493200196871,
            "value_as_string": "2017-04-26T09:49:56.871Z"
          }
        }
      ]
    }
  }
}

All fields are mapped as not analyzed (keyword).

The goal is to reduce the results to only those id buckets in which the hmtype bucket key is "requestactivation".
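Until the filtering can be done inside Elasticsearch, one fallback is to post-filter the aggregation response on the client. The following is a minimal sketch of that idea in plain Python; the function name `ids_with_top_type` is hypothetical, and `RESPONSE` is a trimmed copy of the sample response shown above that keeps only the fields the function reads.

```python
def ids_with_top_type(response, wanted_type):
    """Return the keys of the id buckets whose single hmtype bucket has the wanted key."""
    kept = []
    for bucket in response["aggregations"]["id"]["buckets"]:
        type_buckets = bucket["hmtype"]["buckets"]
        # hmtype was requested with "size": 1, so at most one sub-bucket is present.
        if type_buckets and type_buckets[0]["key"] == wanted_type:
            kept.append(bucket["key"])
    return kept


# Trimmed copy of the sample response (assumption: the other fields are not needed here).
RESPONSE = {
    "aggregations": {
        "id": {
            "buckets": [
                {"key": "986", "hmtype": {"buckets": [{"key": "activationrequested", "doc_count": 2}]}},
                {"key": "967", "hmtype": {"buckets": [{"key": "requestactivation", "doc_count": 1}]}},
                {"key": "554", "hmtype": {"buckets": [{"key": "requestactivation", "doc_count": 5}]}},
            ]
        }
    }
}

print(ids_with_top_type(RESPONSE, "requestactivation"))  # ['967', '554']
```

One caveat: a terms sub-aggregation with "size": 1 returns the most frequent type within each id bucket (terms buckets are ordered by document count by default), which is not necessarily the type of the most recent event.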

I cannot use a doc count, because an activation request can occur multiple times for the same id.

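I only recently started digging into aggregations, so apologies if the question seems obvious; the examples I found do not seem to fit this particular logic.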

Answer

How about using include in the terms aggregation so that the terms "filter" only includes the values that are relevant to the request:

{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "exists": {
            "field": "id"
          }
        },
        {
          "terms": {
            "headers.message.type": [
              "requestactivation",
              "activationrequested"
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "id": {
      "terms": {
        "field": "id",
        "size": 10000
      },
      "aggs": {
        "latest": {
          "max": {
            "field": "@timestamp"
          }
        },
        "hmtype": {
          "filter": {
            "terms": {
              "headers.message.type": [
                "requestactivation",
                "activationrequested"
              ]
            }
          },
          "aggs": {
            "count_types": {
              "cardinality": {
                "field": "headers.message.type"
              }
            }
          }
        },
        "filter_buckets": {
          "bucket_selector": {
            "buckets_path": {
              "totalTypes": "hmtype > count_types"
            },
            "script": "params.totalTypes == 2"
          }
        }
      }
    }
  }
}
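
To see which ids the bucket_selector in this query keeps, here is a rough emulation in plain Python. It is only a sketch: an exact set size stands in for the cardinality aggregation (which is approximate in general), the `(id, type)` tuples are a simplified form of the sample documents from the question, and the name `ids_with_n_types` is hypothetical.

```python
from collections import defaultdict

# (id, headers.message.type) pairs taken from the question's sample data.
EVENTS = [
    ("668", "requestactivation"),
    ("669", "requestactivation"),
    ("668", "activationrequested"),
]


def ids_with_n_types(events, n):
    """Keep the ids that have exactly n distinct message types."""
    types_by_id = defaultdict(set)
    for doc_id, message_type in events:
        types_by_id[doc_id].add(message_type)
    # Mirrors the selector script "params.totalTypes == 2" when n is 2.
    return sorted(doc_id for doc_id, types in types_by_id.items() if len(types) == n)


print(ids_with_n_types(EVENTS, 2))  # ['668']
```

With a threshold of 2 the selector keeps the ids for which both event types occur, regardless of which event came last.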

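I may be missing something, but after testing the proposed include I end up with all the ids that have an "activationrequested" event (that is the value from your example; I am actually looking for "requestactivation"), whether or not the id also has events of other types. – Olivier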

My bad, it should be "include": "requestactivation"... but I suspect there are some limitations along the way. –

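But include basically behaves the same way as if I had filtered the activation request **events** in the query (since I do not care about the query hits per se), whereas I want to filter out the **ids** for which an activation request was received. – Olivier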
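
One possible way to filter on ids rather than on events, sketched here as a Python dict that can be serialized and sent as the request body, is to compute the latest timestamp per event type inside each id bucket and let bucket_selector compare the two numbers. This is an untested sketch, not a confirmed solution: the aggregation names `last_request`, `last_ack`, `ts` and `request_is_latest` are hypothetical, and it assumes that "gap_policy": "insert_zeros" substitutes 0 when an id has no event of a given type.

```python
import json

REQUEST = "requestactivation"
ACK = "activationrequested"


def latest_per_type(message_type):
    # Filter sub-aggregation: newest @timestamp among the events of one type.
    return {
        "filter": {"term": {"headers.message.type": message_type}},
        "aggs": {"ts": {"max": {"field": "@timestamp"}}},
    }


def build_query():
    return {
        "size": 0,
        "query": {"bool": {"filter": [
            {"exists": {"field": "id"}},
            {"terms": {"headers.message.type": [REQUEST, ACK]}},
        ]}},
        "aggs": {"id": {
            "terms": {"field": "id", "size": 10000},
            "aggs": {
                "last_request": latest_per_type(REQUEST),
                "last_ack": latest_per_type(ACK),
                # Keep an id only if its newest request is newer than its newest ack.
                # Assumption: insert_zeros turns a missing max into 0.
                "request_is_latest": {"bucket_selector": {
                    "buckets_path": {"req": "last_request>ts", "ack": "last_ack>ts"},
                    "gap_policy": "insert_zeros",
                    "script": "params.req > params.ack",
                }},
            },
        }},
    }


print(json.dumps(build_query(), indent=2))
```

The comparison stays numeric, which is what bucket_selector expects, while the per-type filter aggregations carry the key match that the question asks about.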