2016-06-24

An approach to multi-field facet aggregation

I have an index with documents like the following:

[ 
    { 
     "name": "Marco", 
     "city_id": 45, 
     "city": "Rome" 
    }, 
    { 
     "name": "John", 
     "city_id": 46, 
     "city": "London" 
    }, 
    { 
     "name": "Ann", 
     "city_id": 47, 
     "city": "New York" 
    }, 
    ... 
] 

and this aggregation:

"aggs": { 
    "city": { 
     "terms": { 
      "field": "city" 
     } 
    } 
} 

which gives me a response like this:

{ 
    "aggregations": {  
     "city": { 
      "doc_count_error_upper_bound": 0, 
      "sum_other_doc_count": 694, 
      "buckets": [ 
       { 
        "key": "Rome", 
        "doc_count": 15126 
       }, 
       { 
        "key": "London", 
        "doc_count": 11395 
       }, 
       { 
        "key": "New York", 
        "doc_count": 14836 
       }, 
       ... 
      ] 
     }, 
     ... 
    } 
} 

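My problem is that I also need city_id in my aggregation results. I have read that I cannot use a multi-field terms aggregation, but I do not need to aggregate by two fields. I only need to return another field whose value is the same for every document sharing a given terms value (basically city/city_id pairs). What is the best way to achieve this without losing performance?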

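I could create a field called city_with_id with values such as "Rome;45", "London;46", etc., and aggregate by that field. That would work for me, because I can simply split the results in my backend and get the IDs I need, but is it the best approach?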
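For illustration, the backend split described above could look roughly like the Python sketch below. It assumes the buckets come back with keys in the "Rome;45" form. The function name `split_composite_buckets` and the sample buckets are hypothetical and not part of the original post.

```python
def split_composite_buckets(buckets, sep=";"):
    """Turn buckets keyed as "<city><sep><city_id>" into (city, city_id, doc_count) tuples."""
    rows = []
    for bucket in buckets:
        # rpartition splits on the LAST separator, so a city name that
        # itself contains the separator is still parsed correctly.
        city, found, city_id = bucket["key"].rpartition(sep)
        if not found:
            raise ValueError("bucket key has no separator: %r" % bucket["key"])
        rows.append((city, int(city_id), bucket["doc_count"]))
    return rows


# Hypothetical buckets, shaped like a terms aggregation on city_with_id.
sample_buckets = [
    {"key": "Rome;45", "doc_count": 15126},
    {"key": "New York;47", "doc_count": 14836},
    {"key": "London;46", "doc_count": 11395},
]

for city, city_id, doc_count in split_composite_buckets(sample_buckets):
    print(city, city_id, doc_count)
```

Splitting on the last separator is a design choice made here so that only the ID part has to be free of ";". A different separator can be passed through `sep`.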

Answer

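One approach is to use top_hits with source filtering so that only city_id is returned, as in the example below. I do not think this will degrade performance. You can try it on your index to measure the impact before going with the city_with_id field approach described in the OP.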

Example:

POST <index>/_search
    { 
     "size" : 0, 
     "aggs": { 
      "city": { 
       "terms": { 
        "field": "city" 
       }, 
       "aggs" : { 
        "id" : { 
         "top_hits" : { 
          "_source": { 
           "include": [ 
            "city_id" 
           ] 
          }, 
          "size" : 1 
         } 
        } 
       } 
      } 
     } 
    } 

Result:

{ 
       "key": "London", 
       "doc_count": 2, 
       "id": { 
        "hits": { 
         "total": 2,
         "max_score": 1,
         "hits": [
          {
           "_index": "country",
           "_type": "city",
           "_id": "2",
           "_score": 1,
           "_source": {
            "city_id": 46
           }
          }
         ]
        } 
       } 
      }, 
      { 
       "key": "New York", 
       "doc_count": 1, 
       "id": { 
        "hits": { 
         "total": 1,
         "max_score": 1,
         "hits": [
          {
           "_index": "country",
           "_type": "city",
           "_id": "3",
           "_score": 1,
           "_source": {
            "city_id": 47
           }
          }
         ]
        } 
       } 
      }, 
      { 
       "key": "Rome", 
       "doc_count": 1, 
       "id": { 
        "hits": { 
         "total": 1,
         "max_score": 1,
         "hits": [
          {
           "_index": "country",
           "_type": "city",
           "_id": "1",
           "_score": 1,
           "_source": {
            "city_id": 45
           }
          }
         ]
        } 
       } 
      } 
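
On the backend, the city_id can then be read from the first hit of each bucket. The Python sketch below assumes the bucket shape shown in the result above (a sub-aggregation named "id" holding one hit with a filtered `_source`). The helper name `city_ids_from_buckets` and the trimmed sample data are hypothetical.

```python
def city_ids_from_buckets(buckets, sub_agg="id", field="city_id"):
    """Map each terms bucket key to the field value of its first top_hits hit."""
    result = {}
    for bucket in buckets:
        hits = bucket[sub_agg]["hits"]["hits"]
        if hits:  # skip buckets whose top_hits returned nothing
            result[bucket["key"]] = hits[0]["_source"][field]
    return result


# Trimmed-down copy of the buckets shown in the result above.
sample_buckets = [
    {"key": "London", "doc_count": 2,
     "id": {"hits": {"total": 2, "hits": [{"_id": "2", "_source": {"city_id": 46}}]}}},
    {"key": "New York", "doc_count": 1,
     "id": {"hits": {"total": 1, "hits": [{"_id": "3", "_source": {"city_id": 47}}]}}},
    {"key": "Rome", "doc_count": 1,
     "id": {"hits": {"total": 1, "hits": [{"_id": "1", "_source": {"city_id": 45}}]}}},
]

for city, city_id in city_ids_from_buckets(sample_buckets).items():
    print(city, city_id)
```

This relies on every document in a bucket carrying the same city_id, which is the premise of the question. If that does not hold, the single hit returned per bucket is not representative.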

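It works! Actually, I lose quite a lot of time with this approach, because my example was only illustrative - in the real scenario I have many fields that need the nested aggregation, and the result was not acceptable. Anyway, it works, and I will accept your answer. Thank you very much! – stefanobaldo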