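Great question. It took a bit of effort to figure out, but I managed to get it working with the new bucket selector aggregation in ES 2.0. I had to change the timestamps to "integer" type to get it to work (it should work with dates as well, though).
I created a simple index and used a _bulk request to add your data: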
PUT /test_index
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"timestamp": 0,"user":"mike","result":"failed"}
{"index":{"_id":2}}
{"timestamp": 1,"user":"anne","result":"failed"}
{"index":{"_id":3}}
{"timestamp": 2,"user":"bob","result":"success"}
{"index":{"_id":4}}
{"timestamp": 3,"user":"tom","result":"success"}
{"index":{"_id":5}}
{"timestamp": 4,"user":"jane","result":"failed"}
{"index":{"_id":6}}
{"timestamp": 5,"user":"anne","result":"success"}
{"index":{"_id":7}}
{"timestamp": 6,"user":"tom","result":"failed"}
{"index":{"_id":8}}
{"timestamp": 7,"user":"jane","result":"failed"}
{"index":{"_id":9}}
{"timestamp": 8,"user":"mike","result":"success"}
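If you would rather script the load than paste it into Sense, the bulk body can be generated. This is only a minimal sketch: the `events` list and the `build_bulk_body` helper are names I made up for illustration, and it only builds the newline-delimited payload (one action line followed by one source line per document); sending it to `/test_index/doc/_bulk` is left to whatever HTTP client you use.

```python
import json

# Sample events copied from the bulk request above: (timestamp, user, result).
events = [
    (0, "mike", "failed"), (1, "anne", "failed"), (2, "bob", "success"),
    (3, "tom", "success"), (4, "jane", "failed"), (5, "anne", "success"),
    (6, "tom", "failed"), (7, "jane", "failed"), (8, "mike", "success"),
]

def build_bulk_body(rows):
    """Hypothetical helper: return an NDJSON bulk body for the given rows."""
    lines = []
    for doc_id, (timestamp, user, result) in enumerate(rows, start=1):
        # Action line: index the document under an explicit _id.
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps({"timestamp": timestamp, "user": user, "result": result}))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

body = build_bulk_body(events)
print(body)
```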
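Then I got what you're asking for (I think) with the query below. Under the top-level "user_terms" aggregation, I set up three sub-aggregations:
- "failed_filter" selects the documents with "result": "failed", then a sub-aggregation finds the max timestamp in that group;
- "success_filter" selects the documents with "result": "success", then a sub-aggregation finds the max timestamp in that group;
- finally, "failed_lt_success_filter" keeps only those buckets for which the (max) timestamp attached to a failed value is less than the (max) timestamp attached to a success value.
Whew.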
POST /test_index/_search
{
   "size": 0,
   "aggregations": {
      "user_terms": {
         "terms": {
            "field": "user"
         },
         "aggs": {
            "failed_filter": {
               "filter": { "term": { "result": "failed" } },
               "aggs": {
                  "max_timestamp": { "max": { "field": "timestamp" } }
               }
            },
            "success_filter": {
               "filter": { "term": { "result": "success" } },
               "aggs": {
                  "max_timestamp": { "max": { "field": "timestamp" } }
               }
            },
            "failed_lt_success_filter": {
               "bucket_selector": {
                  "buckets_path": {
                     "failed_timestamp": "failed_filter.max_timestamp",
                     "success_timestamp": "success_filter.max_timestamp"
                  },
                  "script": "failed_timestamp < success_timestamp"
               }
            }
         }
      }
   }
}
It returns:
{
   "took": 11,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 9,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "user_terms": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "anne",
               "doc_count": 2,
               "success_filter": {
                  "doc_count": 1,
                  "max_timestamp": {
                     "value": 5
                  }
               },
               "failed_filter": {
                  "doc_count": 1,
                  "max_timestamp": {
                     "value": 1
                  }
               }
            },
            {
               "key": "mike",
               "doc_count": 2,
               "success_filter": {
                  "doc_count": 1,
                  "max_timestamp": {
                     "value": 8
                  }
               },
               "failed_filter": {
                  "doc_count": 1,
                  "max_timestamp": {
                     "value": 0
                  }
               }
            }
         ]
      }
   }
}
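As a sanity check on that response, the same selection can be sketched in plain Python over the sample data. This is not how Elasticsearch evaluates the aggregation, just a small re-implementation of the logic; `events` and `users_failed_before_success` are illustrative names. I'm assuming that users with only failures or only successes should be dropped, which matches the response above (bob and jane do not appear).

```python
# Sample events copied from the bulk request: (timestamp, user, result).
events = [
    (0, "mike", "failed"), (1, "anne", "failed"), (2, "bob", "success"),
    (3, "tom", "success"), (4, "jane", "failed"), (5, "anne", "success"),
    (6, "tom", "failed"), (7, "jane", "failed"), (8, "mike", "success"),
]

def users_failed_before_success(rows):
    """Return users whose latest failure is older than their latest success."""
    max_failed = {}   # user -> max timestamp among "failed" events
    max_success = {}  # user -> max timestamp among "success" events
    for timestamp, user, result in rows:
        target = max_failed if result == "failed" else max_success
        target[user] = max(timestamp, target.get(user, timestamp))
    # Keep only users that have both values and satisfy failed < success
    # (assumption: users missing either value are dropped).
    return sorted(
        user for user in max_failed
        if user in max_success and max_failed[user] < max_success[user]
    )

selected = users_failed_before_success(events)
print(selected)  # ['anne', 'mike']
```

tom is excluded because his last failure (6) comes after his last success (3).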
Here is some code I used to play with the problem:
http://sense.qbox.io/gist/06083e06191445a44610f32baf1dd45c7370401e
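Would you consider a different domain model, where each user has a single document with an array of timestamped results, like '{"user":"mike","results":[{"timestamp":"t0","result":"failed"}, {"timestamp":"t8","result":"success"}]}'? Or do you absolutely want a separate document for each event? – Val
I'm not at all tied to the domain model. The current structure is easier to work with in our current data processing, but I'm happy to see alternatives. How would your suggested structure be used? – Andrew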