Elasticsearch: average over date histogram buckets

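I have a bunch of documents indexed in Elasticsearch, and I need to get the following data:

For each month, the average number of documents per business day of that month (or, if that is not possible, using 20 days as a default).

I have already aggregated my data into monthly buckets using a date_histogram aggregation. I tried nesting a stats aggregation, but that aggregation works on values extracted from a document field, not on values from the parent bucket.

Here is my query so far: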

{
    "query": {
        "match_all": {}
    },
    "aggs": {
        "docs_per_month": {
            "date_histogram": {
                "field": "created_date",
                "interval": "month",
                "min_doc_count": 0
            },
            "aggs": {
                "???": "???"
            }
        }
    }
}

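Edit

To make my question clearer, what I need is:

Get the total number of documents created (already done, thanks to the date_histogram aggregation)
Get the number of business days in the month
Divide the first by the second.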

Source

2015-06-11 Thibault J

I definitely need to update my profile...

What you basically need is something like this (it does not work, because no such feature is available):

{
    "query": {
        "match_all": {}
    },
    "aggs": {
        "docs_per_month": {
            "date_histogram": {
                "field": "date",
                "interval": "month",
                "min_doc_count": 0
            },
            "aggs": {
                "average": {
                    "avg": {
                        "script": "doc_count/20"
                    }
                }
            }
        }
    }
}

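It does not work because there is no way to access doc_count from the "parent" aggregation.

However, this will become possible in the 2.x branch of Elasticsearch, and it is under active development at the moment: https://github.com/elastic/elasticsearch/issues/8110 This new feature will add a second layer of operations on top of the results (buckets) of an aggregation, and it covers not only your use case but many others as well.

Unless you want to try some ideas out there or perform the calculation yourself in your application, you will need to wait for this feature.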
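For the "do it in your application" route, here is a minimal Python sketch. It assumes the monthly buckets have already been fetched with the plain date_histogram query and that bucket keys are epoch milliseconds in UTC. The helper names and the sample buckets are made up for illustration, and public holidays are not taken into account.

```python
import calendar
from datetime import datetime, timezone


def business_days_in_month(year, month):
    """Count the Monday-to-Friday days of a month (public holidays are ignored)."""
    _, n_days = calendar.monthrange(year, month)
    return sum(
        1
        for day in range(1, n_days + 1)
        if calendar.weekday(year, month, day) < 5  # 0 = Monday ... 4 = Friday
    )


def avg_docs_per_business_day(buckets):
    """Map each monthly bucket to doc_count divided by that month's business days."""
    result = {}
    for bucket in buckets:
        # Assumption: the bucket key is the start of the month in epoch milliseconds (UTC).
        start = datetime.fromtimestamp(bucket["key"] / 1000, tz=timezone.utc)
        days = business_days_in_month(start.year, start.month)
        result[start.strftime("%Y-%m")] = bucket["doc_count"] / days
    return result


# Hypothetical buckets, shaped like response["aggregations"]["docs_per_month"]["buckets"].
sample_buckets = [
    {"key": 1430438400000, "doc_count": 420},  # 2015-05-01T00:00:00Z
    {"key": 1433116800000, "doc_count": 660},  # 2015-06-01T00:00:00Z
]
print(avg_docs_per_business_day(sample_buckets))
```

If the real number of business days cannot be determined (for example because holidays matter), the divisor could simply be replaced by the default of 20 mentioned in the question.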

Source

2015-06-15 08:50:28

To exclude documents whose timestamp falls on a Saturday or Sunday, you can filter them out in your query with a script:

{
    "query": {
        "filtered": {
            "filter": {
                "script": {
                    "script": "doc['@timestamp'].date.dayOfWeek != 7 && doc['@timestamp'].date.dayOfWeek != 6"
                }
            }
        }
    },
    "aggs": {
        "docs_per_month": {
            "date_histogram": {
                "field": "created_date",
                "interval": "month",
                "min_doc_count": 0
            },
            "aggs": {
                "docs_per_day": {
                    "date_histogram": {
                        "field": "created_date",
                        "interval": "day",
                        "min_doc_count": 0
                    }
                },
                "aggs": {
                    "docs_count": {
                        "avg": {
                            "field": ""
                        }
                    }
                }
            }
        }
    }
}

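You may not need the first aggregation by month, since the aggregation with the day interval already gives you that information.

BTW, you need to make sure dynamic scripting is enabled by adding the following to your elasticsearch.yml configuration: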

script.disable_dynamic: false

Alternatively, add a Groovy script under /config/scripts and reference that script in the filter of a filtered query.

Source

2015-06-11 15:39:40

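Thanks for your answer. However, I don't want to count only the documents created on business days. I need to count all the documents of the month (which I have already done) and then divide that by the number of business days. What I don't know is how to compute that number (the business days of the month).

I'll edit my question, as I realize it may be misleading.

There is a fairly convoluted solution, and not a very performant one, using the following scripted_metric aggregation.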

{
    "size": 0,
    "query": {
        "match_all": {}
    },
    "aggs": {
        "docs_per_month": {
            "date_histogram": {
                "field": "created_date",
                "interval": "month",
                "min_doc_count": 0
            },
            "aggs": {
                "avg_doc_per_biz_day": {
                    "scripted_metric": {
                        "init_script": "_agg.bizdays = []; _agg.allbizdays = [:]; start = new DateTime(1970, 1, 1, 0, 0); now = new DateTime(); while (start < now) { def end = start.plusMonths(1); _agg.allbizdays[start.year + '_' + start.monthOfYear] = (start.toDate()..<end.toDate()).sum {(it.day != 6 && it.day != 0) ? 1 : 0 }; start = end; }",
                        "map_script": "_agg.bizdays << _agg.allbizdays[doc.created_date.date.year+'_'+doc.created_date.date.monthOfYear]",
                        "combine_script": "_agg.allbizdays = null; doc_count = 0; for (d in _agg.bizdays){ doc_count++ }; return doc_count/_agg.bizdays[0]",
                        "reduce_script": "res = 0; for (a in _aggs) { res += a }; return res"
                    }
                }
            }
        }
    }
}

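Let's go through each script in detail.

What I'm doing in init_script is building a map of the number of business days for every month since 1970 and storing it in the _agg.allbizdays map.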

_agg.bizdays = []; 
_agg.allbizdays = [:]; 
start = new DateTime(1970, 1, 1, 0, 0); 
now = new DateTime(); 
while (start < now) { 
    def end = start.plusMonths(1);  
    _agg.allbizdays[start.year + '_' + start.monthOfYear] = (start.toDate()..<end.toDate()).sum {(it.day != 6 && it.day != 0) ? 1 : 0 }; 
    start = end; 
}

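In map_script, I simply retrieve the number of business days for the month of each document:

_agg.bizdays << _agg.allbizdays[doc.created_date.date.year + '_' + doc.created_date.date.monthOfYear];

In combine_script, I compute the average document count per business day for each shard: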

_agg.allbizdays = null; 
doc_count = 0; 
for (d in _agg.bizdays){ doc_count++ }; 
return doc_count/_agg.bizdays[0];

Finally, in reduce_script, I sum up the average document counts coming from each node:

res = 0; 
for (a in _aggs) { res += a }; 
return res
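To illustrate why adding up the per-shard values yields the monthly figure, here is a small Python simulation of the map, combine and reduce phases. It is only a sketch under assumptions: the shard contents and the function names are invented, the business-day count for June 2015 (22) is hard-coded, and the empty-shard guard is an addition that the scripts above do not have.

```python
def map_phase(doc_months, bizdays_by_month):
    """map_script: append the business-day count of each document's month."""
    return [bizdays_by_month[month] for month in doc_months]


def combine_phase(bizdays):
    """combine_script: documents seen on this shard / business days of the month."""
    if not bizdays:  # guard added for this sketch; not part of the original script
        return 0.0
    # Inside one monthly bucket every entry is the same number, so bizdays[0] is enough.
    return len(bizdays) / bizdays[0]


def reduce_phase(shard_results):
    """reduce_script: add up the per-shard partial averages."""
    return sum(shard_results)


# Hypothetical data: one monthly bucket (June 2015, 22 weekdays) spread over three shards.
bizdays_by_month = {"2015_6": 22}
shards = [["2015_6"] * 300, ["2015_6"] * 250, ["2015_6"] * 110]

partials = [combine_phase(map_phase(docs, bizdays_by_month)) for docs in shards]
total = reduce_phase(partials)
print(round(total, 6))
```

Because every shard divides by the same number of business days, the sum of the partial results equals the total document count of the month (660 here) divided by 22.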

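Again, I think it is pretty convoluted and, as Andrei said, it is probably better to wait for 2.0 to make this work the way it should. In the meantime, though, you have this solution if you need it.

Source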

2015-06-15 09:39:56 Val
