2015-02-09 19 views
3

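Elasticsearch: matching multiple values without an analyzer

Please excuse my limited knowledge of Elasticsearch. I have an Elasticsearch index that contains the following documents: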

{ 
    "date": "2013-12-30T00:00:00.000Z", 
    "value": 2, 
    "dimensions": { 
     "region": "Coimbra District" 

    } 
} 
{ 
    "date": "2013-12-30T00:00:00.000Z", 
    "value": 1, 
    "dimensions": { 
     "region": "Federal District"   
    } 
} 
{ 
    "date": "2013-12-30T00:00:00.000Z", 
    "value": 1, 
    "dimensions": { 
     "region": "Masovian Voivodeship" 
    } 
} 

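These 3 JSON documents are indexed on the ES server. I have not specified any analyzer type (and I don't know how to provide one :)). I am using Spring Data Elasticsearch and run the following query to find the documents whose region is 'Masovian Voivodeship' or 'Federal District':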

{
    "query_string" : {
        "query" : "Masovian Voivodeship OR Federal District",
        "fields" : [ "dimensions.region" ]
    }
}

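I expected it to return 2 hits. However, it returns all 3 documents (probably because "District" also appears in "Coimbra District"). How can I modify the query so that it performs an EXACT match and returns only 2 documents? I am using the following method: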

QueryBuilders.queryString(<OR string>).field("dimensions.region") 

I have already tried QueryBuilders.termsQuery, QueryBuilders.inQuery and QueryBuilders.matchQuery (with an array), but with no luck.

Can anyone please help? Thanks in advance.

+0

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html Try setting default_operator to AND. Or make your query "Masovian AND Voivodeship OR Federal AND District" – 2015-02-09 17:54:14

+0

Hi, I tried the query '{ "query_string": { "query": "Masovian AND Voivodeship OR Federal AND District", "fields": [ "dimensions.region" ] } }' but it did not return any hits. – 2015-02-09 18:39:49

Answers

3

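There are a few things you can do here.

First, I set up an index without any explicit mapping or analysis, which means the standard analyzer will be used. This matters because it determines how we can query against text fields.

So I started with: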
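To make the effect of the analyzer concrete, here is a minimal Python sketch of what happens to the region values at index time. It is only an approximation: the function name approx_standard_tokens is made up for illustration, and the real standard analyzer uses Unicode text segmentation rather than a simple regex. For these three values the outcome should be the same, namely lowercased single-word tokens.

```python
import re

def approx_standard_tokens(text):
    """Rough stand-in for the standard analyzer (assumption):
    split on non-word characters and lowercase each piece."""
    return [t.lower() for t in re.findall(r"\w+", text)]

regions = ["Coimbra District", "Federal District", "Masovian Voivodeship"]
for region in regions:
    # Each region is stored as separate tokens, not as one whole string.
    print(region, "->", approx_standard_tokens(region))
```

Because "district" ends up as its own token in two of the documents, any query that accepts that single token will match both of them.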

DELETE /test_index 

PUT /test_index 
{ 
    "settings": { 
     "number_of_shards": 1, 
     "number_of_replicas": 0 
    } 
} 

PUT /test_index/doc/1 
{ 
    "date": "2013-12-30T00:00:00.000Z", 
    "value": 2, 
    "dimensions": { 
     "region": "Coimbra District" 

    } 
} 

PUT /test_index/doc/2 
{ 
    "date": "2013-12-30T00:00:00.000Z", 
    "value": 1, 
    "dimensions": { 
     "region": "Federal District"   
    } 
} 

PUT /test_index/doc/3 
{ 
    "date": "2013-12-30T00:00:00.000Z", 
    "value": 1, 
    "dimensions": { 
     "region": "Masovian Voivodeship" 
    } 
} 

Then I tried your query and got no hits. I don't understand why you had "dimensions.ga:region" in your fields parameter, but when I changed it to "dimensions.region" I got some results:

POST /test_index/doc/_search
{
    "query": {
        "query_string": {
            "query": "Masovian Voivodeship OR Federal District",
            "fields": [
                "dimensions.region"
            ]
        }
    }
}
... 
{ 
    "took": 2, 
    "timed_out": false, 
    "_shards": { 
     "total": 1, 
     "successful": 1, 
     "failed": 0 
    }, 
    "hits": { 
     "total": 3, 
     "max_score": 0.46911472, 
     "hits": [ 
     { 
      "_index": "test_index", 
      "_type": "doc", 
      "_id": "3", 
      "_score": 0.46911472, 
      "_source": { 
       "date": "2013-12-30T00:00:00.000Z", 
       "value": 1, 
       "dimensions": { 
        "region": "Masovian Voivodeship" 
       } 
      } 
     }, 
     { 
      "_index": "test_index", 
      "_type": "doc", 
      "_id": "2", 
      "_score": 0.3533006, 
      "_source": { 
       "date": "2013-12-30T00:00:00.000Z", 
       "value": 1, 
       "dimensions": { 
        "region": "Federal District" 
       } 
      } 
     }, 
     { 
      "_index": "test_index", 
      "_type": "doc", 
      "_id": "1", 
      "_score": 0.05937162, 
      "_source": { 
       "date": "2013-12-30T00:00:00.000Z", 
       "value": 2, 
       "dimensions": { 
        "region": "Coimbra District" 
       } 
      } 
     } 
     ] 
    } 
} 

However, this returns a document you don't want. One way to fix this is as follows:

POST /test_index/doc/_search
{
    "query": {
        "query_string": {
            "query": "(Masovian AND Voivodeship) OR (Federal AND District)",
            "fields": [
                "dimensions.region"
            ]
        }
    }
}
... 
{ 
    "took": 3, 
    "timed_out": false, 
    "_shards": { 
     "total": 1, 
     "successful": 1, 
     "failed": 0 
    }, 
    "hits": { 
     "total": 2, 
     "max_score": 0.46911472, 
     "hits": [ 
     { 
      "_index": "test_index", 
      "_type": "doc", 
      "_id": "3", 
      "_score": 0.46911472, 
      "_source": { 
       "date": "2013-12-30T00:00:00.000Z", 
       "value": 1, 
       "dimensions": { 
        "region": "Masovian Voivodeship" 
       } 
      } 
     }, 
     { 
      "_index": "test_index", 
      "_type": "doc", 
      "_id": "2", 
      "_score": 0.3533006, 
      "_source": { 
       "date": "2013-12-30T00:00:00.000Z", 
       "value": 1, 
       "dimensions": { 
        "region": "Federal District" 
       } 
      } 
     } 
     ] 
    } 
} 
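The difference between the two query_string variants can be reproduced with a small simulation. This is a sketch under assumptions, not how Elasticsearch evaluates queries internally: tokens() is a simplified stand-in for the standard analyzer, and match_any / match_any_group are hypothetical helpers. The first models the original query, where with the default OR operator every term is optional. The second models the grouped query, where both terms of at least one pair are required.

```python
import re

def tokens(text):
    # Simplified approximation of the standard analyzer (assumption).
    return set(t.lower() for t in re.findall(r"\w+", text))

docs = {
    "1": "Coimbra District",
    "2": "Federal District",
    "3": "Masovian Voivodeship",
}

def match_any(terms):
    """IDs of docs containing at least one term (all terms joined by OR)."""
    wanted = tokens(" ".join(terms))
    return sorted(i for i, region in docs.items() if tokens(region) & wanted)

def match_any_group(groups):
    """IDs of docs containing every term of at least one group,
    i.e. (a AND b) OR (c AND d)."""
    return sorted(
        i for i, region in docs.items()
        if any(tokens(group) <= tokens(region) for group in groups)
    )

first = match_any(["Masovian", "Voivodeship", "Federal", "District"])
second = match_any_group(["Masovian Voivodeship", "Federal District"])
print(first)   # ['1', '2', '3']
print(second)  # ['2', '3']
```

Document 1 only shows up in the first case, because the single token "district" is enough to satisfy an OR over all four terms.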

Another way to do it (I like this one better), which gives the same result, is to use a combination of match queries and a bool should clause:

POST /test_index/doc/_search
{
    "query": {
        "bool": {
            "should": [
                {
                    "match": {
                        "dimensions.region": {
                            "query": "Masovian Voivodeship",
                            "operator": "and"
                        }
                    }
                },
                {
                    "match": {
                        "dimensions.region": {
                            "query": "Federal District",
                            "operator": "and"
                        }
                    }
                }
            ]
        }
    }
}

Here is the code I used:

http://sense.qbox.io/gist/bb5062a635c4f9519a411fdd3c8540eae8bdfd51
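If the list of regions is not fixed, the bool/should request can be generated rather than written by hand. The following is a minimal sketch that only builds the JSON body with the Python standard library; build_region_query is a hypothetical helper name, and sending the body to the server is left out.

```python
import json

def build_region_query(regions, field="dimensions.region"):
    """Build a bool/should body with one match clause per region name.
    Each clause uses operator "and", so all of its tokens must be present."""
    should = [
        {"match": {field: {"query": region, "operator": "and"}}}
        for region in regions
    ]
    return {"query": {"bool": {"should": should}}}

body = build_region_query(["Masovian Voivodeship", "Federal District"])
print(json.dumps(body, indent=2))
```

Note that operator "and" requires every token of a region name to be present, but it does not rule out additional tokens in the field, so this is not a strict whole-value match.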

+1

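Hello @Sloan, first of all, thank you very much for the detailed answer. I tried your third solution (since I also think it is the better approach) and it worked like a charm! The only thing I was missing was the 'operator'. I had not specified 'operator', so the default operator was used when the query was generated. The default is OR, so it was searching for the tokens with OR, which is why I got 3 results (I got 3 results even on my first attempt, running the same query). I removed the 'ga' part from the query because it was a typo. Once again, cheers for the solution :) – 2015-02-09 23:49:02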

+0

Having this example in Sense is awesome! – gonzalon 2015-02-17 22:55:19