2017-09-23

Here is my Elasticsearch setup and query for getting an aggregation:

===Create the index===

PUT /sample 

===Insert data===

PUT /sample/docs/1 
{"data": "And the world said, 'Disarm, disclose, or face serious consequences'—and therefore, we worked with the world, we worked to make sure that Saddam Hussein heard the message of the world."} 
PUT /sample/docs/2 
{"data": "Never give in — never, never, never, never, in nothing great or small, large or petty, never give in except to convictions of honour and good sense. Never yield to force; never yield to the apparently overwhelming might of the enemy"} 

===Query===

POST sample/docs/_search
{
  "query": {
    "match": {
      "data": "never"
    }
  },
  "highlight": {
    "fields": {
      "data": {}
    }
  }
}

===Search results===

... 
     "highlight": { 
      "data": [ 
      "<em>Never</em> give in — <em>never</em>, <em>never</em>, <em>never</em>, <em>never</em>, in nothing great or small, large or petty, <em>never</em> give", 
      " in except to convictions of honour and good sense. <em>Never</em> yield to force; <em>never</em> yield to the apparently overwhelming might of the enemy" 
      ] 
     } 
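One client-side workaround, sketched below in Python, is to count the highlighted hits inside the highlight fragments returned above. This is only an approximation: it assumes the default <em>/</em> tags, lowercases each term to mimic the standard analyzer's lowercase filter, and undercounts whenever number_of_fragments or fragment_size cause some hits to be left out of the returned fragments. The function name count_highlights is hypothetical.

```python
import re
from collections import Counter


def count_highlights(fragments, pre_tag="<em>", post_tag="</em>"):
    """Count highlighted terms across one hit's highlight fragments.

    Terms are lowercased so that "Never" and "never" are counted together.
    Hits the highlighter did not include in any fragment are missed.
    """
    pattern = re.compile(re.escape(pre_tag) + r"(.*?)" + re.escape(post_tag))
    counts = Counter()
    for fragment in fragments:
        counts.update(term.lower() for term in pattern.findall(fragment))
    return dict(counts)
```

Passing the two fragments for document 2 from the response above gives {"never": 8} (6 hits in the first fragment, 2 in the second).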

===Desired result===

I need the frequency of the search term in each document, as in this example:

Doc Id: 2
Term Frequency: {
  "never": 8
}

I have tried bucket aggregations, terms aggregations and other aggregations, but none of them gave me this result.
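A likely reason the aggregations did not help: a terms aggregation bucket's doc_count counts the documents that contain a term, not how many times the term occurs inside each document. The Python sketch below contrasts the two on plain strings. It approximates the standard analyzer by lowercasing and splitting on word characters, which is an assumption rather than Elasticsearch's exact tokenization, and the function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Rough stand-in for the standard analyzer: lowercase, then split on word characters.
    return re.findall(r"\w+", text.lower())


def per_doc_term_freq(docs, term):
    """Return {doc_id: {term: count}} for every document that contains `term`."""
    term = term.lower()
    result = {}
    for doc_id, text in docs.items():
        count = Counter(tokenize(text))[term]
        if count > 0:
            result[doc_id] = {term: count}
    return result


def doc_count(docs, term):
    """Number of documents containing `term`, which is what a terms-aggregation bucket reports."""
    return len(per_doc_term_freq(docs, term))
```

For the two documents indexed above (keyed "1" and "2"), per_doc_term_freq(docs, "never") gives {"2": {"never": 8}}, matching the desired output, while doc_count(docs, "never") is only 1.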

Thanks for your help!

Answer


You should use the Term Vectors API, which supports retrieving specific terms based on their frequency.

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html
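Term vectors can also be requested for a document that is already indexed by putting its ID in the path; the response then reports a term_freq for each term in that document's field. Below is a standard-library Python sketch that builds (but does not send) such a request for document 2. The base URL http://localhost:9200 is an assumption, the function name is hypothetical, and the /{index}/{type}/{id}/_termvectors path follows the typed layout used in the question; check the reference linked above for the path your Elasticsearch version expects.

```python
import json
import urllib.request


def build_termvectors_request(base_url, index, doc_type, doc_id, field):
    """Build (but do not send) a term vectors request for one stored document."""
    body = {
        "fields": [field],
        "term_statistics": True,  # also return doc_freq and ttf for each term
        "field_statistics": True,
        "positions": False,
        "offsets": False,
    }
    url = f"{base_url}/{index}/{doc_type}/{doc_id}/_termvectors"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )


req = build_termvectors_request("http://localhost:9200", "sample", "docs", "2", "data")
# Against a running cluster (not done here): json.load(urllib.request.urlopen(req))
```

Unlike the artificial "doc" approach, no min_term_freq filter is set, so every term of the field comes back with its count.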

In this case, your query would be:

GET /sample/docs/_termvectors
{
  "doc": {
    "data": "never"
  },
  "term_statistics": true,
  "field_statistics": true,
  "positions": false,
  "offsets": false,
  "filter": {
    "min_term_freq": 8
  }
}
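Note that the "doc" object in the query above is an artificial document whose only text is "never", so its statistics describe that one-word text rather than document 2, and min_term_freq only drops terms occurring fewer than 8 times instead of returning the count. To get the count itself, one option is to request term vectors for the stored document and read each term's term_freq from the response, as in this Python sketch. It assumes the documented response layout (term_vectors, then the field name, then terms, then each term with its term_freq); the function name is hypothetical.

```python
def extract_term_freqs(response, field, terms=None):
    """Return {term: term_freq} for `field` from a parsed _termvectors response.

    If `terms` is given, only those terms are kept; missing keys yield {}.
    """
    field_vectors = response.get("term_vectors", {}).get(field, {})
    return {
        term: stats["term_freq"]
        for term, stats in field_vectors.get("terms", {}).items()
        if terms is None or term in terms
    }
```

For document 2 one would expect extract_term_freqs(response, "data", ["never"]) to return {"never": 8}, since the standard analyzer lowercases "Never" to "never".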

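I get the following error when I run the query you suggested: '{"error":{"root_cause":[{"type":"illegal_state_exception","reason":"Something is wrong with the field statistics of the term vector request: Values are \nsum_doc_freq 0 \ndoc_count 0 \nsum_ttf 0"}],"type":"illegal_state_exception","reason":"Something is wrong with the field statistics of the term vector request: Values are \nsum_doc_freq 0 \ndoc_count 0 \nsum_ttf 0"},"status":500}' – Callisto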


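Also, my requirement is different: the query you suggested returns results whose term frequency is 8, but what I want is the term frequency count itself. – Callisto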