Elasticsearch fielddata - should I use it?

Given documents in an index that contain a brand property, we need to create a case-insensitive terms aggregation.

Index definition

Note the use of fielddata:

PUT demo_products
{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_custom_analyzer": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": [
                        "lowercase"
                    ]
                }
            }
        }
    },
    "mappings": {
        "product": {
            "properties": {
                "brand": {
                    "type": "text",
                    "analyzer": "my_custom_analyzer",
                    "fielddata": true
                }
            }
        }
    }
}
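
To illustrate why the analyzer above pairs the keyword tokenizer with the lowercase filter, here is a minimal Python sketch. It does not call Elasticsearch; `keyword_lowercase` and `split_lowercase` are hypothetical helpers, and the whitespace split is only a rough stand-in for a word-splitting analyzer, not the real standard tokenizer.

```python
def keyword_lowercase(value):
    """Mimic a keyword tokenizer plus lowercase filter: one token per value."""
    return [value.lower()]


def split_lowercase(value):
    """Simplified stand-in for a word-splitting analyzer (an assumption,
    not the real standard tokenizer): lowercase, then split on whitespace."""
    return value.lower().split()


for brand in ["New York Jets", "new york jets", "Washington Redskins"]:
    print(brand, "->", keyword_lowercase(brand), "vs", split_lowercase(brand))
```

Because the whole value stays a single token, a terms aggregation on the field buckets by full brand name rather than by individual words.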

Data

POST demo_products/product 
{ 
    "brand": "New York Jets" 
} 

POST demo_products/product 
{ 
    "brand": "new york jets" 
} 

POST demo_products/product 
{ 
    "brand": "Washington Redskins" 
}

Query

GET demo_products/product/_search
{
    "size": 0,
    "aggs": {
        "brand_facet": {
            "terms": {
                "field": "brand"
            }
        }
    }
}

Result

"aggregations": {
    "brand_facet": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
            {
                "key": "new york jets",
                "doc_count": 2
            },
            {
                "key": "washington redskins",
                "doc_count": 1
            }
        ]
    }
}

If we use keyword instead of text, we end up with 2 buckets for New York Jets because of the difference in casing.
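
The bucket counts can be reproduced with a small Python sketch that only imitates how a terms aggregation groups values; it does not talk to Elasticsearch. The `terms_buckets` function and its `normalize` hook are hypothetical names introduced for illustration.

```python
from collections import Counter

# The three brand values indexed in the question.
brands = ["New York Jets", "new york jets", "Washington Redskins"]


def terms_buckets(values, normalize=None):
    """Count values per bucket key, roughly like a terms aggregation.

    `normalize` is a hypothetical hook standing in for index-time
    processing such as lowercasing; None means exact values are used.
    """
    keys = [normalize(v) if normalize else v for v in values]
    return dict(Counter(keys))


exact = terms_buckets(brands)                             # plain keyword behaviour
lowercased = terms_buckets(brands, normalize=str.lower)   # lowercased values
print(exact)
print(lowercased)
```

With exact values the two spellings of the Jets land in separate buckets; once the values are lowercased they collapse into one.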

We are concerned about the performance implications of using fielddata. However, if fielddata is disabled we get the dreaded "Fielddata is disabled on text fields by default."

Any other tips for resolving this - or should we not be so concerned about fielddata?

Source

2017-01-26 Rasmus

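How big are the machines (CPU, RAM) hosting the ES instances? How many documents are we talking about? How many indices? –

300,000 documents spread across 28 indices, hosted on Elastic Cloud (3 servers, currently 4 GB) – Rasmus

Hmm, why so many indices for so few documents? –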

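As of ES 5.2 (out today), you can use normalizers with keyword fields in order to (for example) lowercase the value.

Normalizers play a role similar to analyzers for text fields. What you can do with them is more restricted, but they should help with the issue you are facing.

You would create the index like this: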

PUT demo_products
{
    "settings": {
        "analysis": {
            "normalizer": {
                "my_normalizer": {
                    "type": "custom",
                    "filter": [ "lowercase" ]
                }
            }
        }
    },
    "mappings": {
        "product": {
            "properties": {
                "brand": {
                    "type": "keyword",
                    "normalizer": "my_normalizer"
                }
            }
        }
    }
}
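
If the index is created from code rather than from a console, the same body could be sent roughly as follows. This is a sketch using only the Python standard library; the `http://localhost:9200` address and the `build_create_index_request` helper are assumptions for illustration, and the request is built but not sent.

```python
import json
import urllib.request

# Assumption: a local, unsecured node; adjust for your own cluster.
ES_URL = "http://localhost:9200"

# Same settings and mappings as the index definition above.
index_body = {
    "settings": {
        "analysis": {
            "normalizer": {
                "my_normalizer": {"type": "custom", "filter": ["lowercase"]}
            }
        }
    },
    "mappings": {
        "product": {
            "properties": {
                "brand": {"type": "keyword", "normalizer": "my_normalizer"}
            }
        }
    },
}


def build_create_index_request(index_name, body, base_url=ES_URL):
    """Build (but do not send) the PUT request that creates the index."""
    return urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_index_request("demo_products", index_body)
# To actually send it: urllib.request.urlopen(request)
```

Serializing a Python dict with `json.dumps` also guarantees the body is valid JSON, which avoids hand-editing mistakes such as stray trailing commas.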

And your query would return this:

"aggregations": {
    "brand_facet": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
            {
                "key": "new york jets",
                "doc_count": 2
            },
            {
                "key": "washington redskins",
                "doc_count": 1
            }
        ]
    }
}

Best of both worlds!

Source

2017-02-01 05:12:20 Val
