2015-12-08

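Elasticsearch: completion suggester with a multi-word search

I'm using the completion suggester API in ES. My implementation works (code below), but I would like to search for multiple words in a single query. In the example below, if I query for "word", it finds "wordpress" and outputs "Found". What I'm trying to accomplish is to query with something like "word blog magazine", have it checked against all the tags, and still get "Found" as the output. Any help would be appreciated!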

Mapping:

curl -XPUT "http://localhost:9200/test_index/" -d'
{
  "mappings": {
    "product": {
      "properties": {
        "description": {
          "type": "string"
        },
        "tags": {
          "type": "string"
        },
        "title": {
          "type": "string"
        },
        "tag_suggest": {
          "type": "completion",
          "index_analyzer": "simple",
          "search_analyzer": "simple",
          "payloads": false
        }
      }
    }
  }
}'

Add a document:

curl -XPUT "http://localhost:9200/test_index/product/1" -d'
{
  "title": "Product1",
  "description": "Product1 Description",
  "tags": [
    "blog",
    "magazine",
    "responsive",
    "two columns",
    "wordpress"
  ],
  "tag_suggest": {
    "input": [
      "blog",
      "magazine",
      "responsive",
      "two columns",
      "wordpress"
    ],
    "output": "Found"
  }
}'

_suggest query:

curl -XPOST "http://localhost:9200/test_index/_suggest" -d'
{
  "product_suggest": {
    "text": "word",
    "completion": {
      "field": "tag_suggest"
    }
  }
}'

The results are as we would expect:

{
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "product_suggest": [
    {
      "text": "word",
      "offset": 0,
      "length": 4,
      "options": [
        {
          "text": "Found",
          "score": 1
        }
      ]
    }
  ]
}
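
To illustrate why the single-word query works while a multi-word one does not, here is a minimal Python sketch of prefix matching. It is a simplified model, not how Elasticsearch is implemented: it assumes the suggester treats the whole analyzed query text as one prefix and compares it against each analyzed input. The helper names `simple_analyze` and `suggest` are hypothetical, and `simple_analyze` only approximates the `simple` analyzer for ASCII text.

```python
import re

def simple_analyze(text):
    # Rough approximation of the `simple` analyzer:
    # lowercase, then split on anything that is not an ASCII letter.
    return [tok for tok in re.split(r"[^a-zA-Z]+", text.lower()) if tok]

def suggest(query, inputs, output):
    # Model assumption: the entire analyzed query must be a prefix
    # of a single analyzed input for the output to be suggested.
    prefix = " ".join(simple_analyze(query))
    for candidate in inputs:
        if " ".join(simple_analyze(candidate)).startswith(prefix):
            return [output]
    return []

TAGS = ["blog", "magazine", "responsive", "two columns", "wordpress"]

print(suggest("word", TAGS, "Found"))                # prefix of "wordpress"
print(suggest("two col", TAGS, "Found"))             # prefix of "two columns"
print(suggest("word blog magazine", TAGS, "Found"))  # prefix of no single input
```

Under this model, "word blog magazine" is never a prefix of any one tag, which matches the behaviour described in the question.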
Would you be open to using an ngram solution instead of the completion suggester?

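I actually had an edge ngram implementation with fuzziness before, but my scores were all messed up, and I was advised to use the suggest API to query large amounts of data faster. What is your opinion on the two? A key requirement for me is searching multiple terms separated by spaces. – emarel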

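That last part is easy with an ngram solution. I'm not sure about the scoring, though. I'm also not sure whether the completion suggester can handle multiple terms; I'd have to look into it. I assume you want an OR search rather than an AND, right?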

Answers

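If you are willing to switch to edge ngrams (or full ngrams, if you need them), I think that will solve your problem.

I wrote up a fairly detailed explanation of how to do this in this blog post: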

https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch

But I'll give you a quick and dirty version here. The trick is to use ngrams together with the _all field and a match query with the AND operator.

So with this mapping:

PUT /test_index
{
  "settings": {
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 20
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "_all": {
        "type": "string",
        "analyzer": "ngram_analyzer",
        "search_analyzer": "standard"
      },
      "properties": {
        "word": {
          "type": "string",
          "include_in_all": true
        },
        "definition": {
          "type": "string",
          "include_in_all": true
        }
      }
    }
  }
}
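
The mapping above analyzes `_all` with `ngram_analyzer` at index time but with `standard` at search time. A rough Python sketch of what that difference means for the tokens involved follows. The functions `standard_tokens`, `edge_ngrams`, and `ngram_analyze` are hypothetical stand-ins that only approximate the real analyzers; the `min_gram` and `max_gram` defaults are taken from the mapping.

```python
import re

def standard_tokens(text):
    # Approximation of the standard tokenizer plus the lowercase filter.
    return re.findall(r"\w+", text.lower())

def edge_ngrams(token, min_gram=2, max_gram=20):
    # Prefixes of the token, from min_gram up to max_gram characters.
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def ngram_analyze(text):
    # Approximation of ngram_analyzer: tokenize, lowercase, edge n-grams.
    return [gram for tok in standard_tokens(text) for gram in edge_ngrams(tok)]

# Terms stored in the index for two example texts.
indexed_theocracy = set(ngram_analyze("theocracy"))
indexed_other = set(ngram_analyze("the rule or power of wealth"))

# Query terms under each choice of search-time analysis.
query_standard = standard_tokens("theo")  # the query stays whole
query_ngram = ngram_analyze("theo")       # the query is split into prefixes too

print(query_standard, query_ngram)
print("theo" in indexed_theocracy, "theo" in indexed_other)
print([g for g in query_ngram if g in indexed_other])
```

With the query kept whole, only text containing a word that starts with "theo" has the term. If the query were also split into edge n-grams and any gram were allowed to match, short grams such as "th" would also hit unrelated words like "the", which is presumably not what you want for a typeahead.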

And a few documents:

PUT /test_index/_bulk 
{"index":{"_index":"test_index","_type":"doc","_id":1}} 
{"word":"democracy", "definition":"government by the people; a form of government in which the supreme power is vested in the people and exercised directly by them or by their elected agents under a free electoral system."} 
{"index":{"_index":"test_index","_type":"doc","_id":2}} 
{"word":"republic", "definition":"a state in which the supreme power rests in the body of citizens entitled to vote and is exercised by representatives chosen directly or indirectly by them."} 
{"index":{"_index":"test_index","_type":"doc","_id":3}} 
{"word":"oligarchy", "definition":"a form of government in which all power is vested in a few persons or in a dominant class or clique; government by the few."} 
{"index":{"_index":"test_index","_type":"doc","_id":4}} 
{"word":"plutocracy", "definition":"the rule or power of wealth or of the wealthy."} 
{"index":{"_index":"test_index","_type":"doc","_id":5}} 
{"word":"theocracy", "definition":"a form of government in which God or a deity is recognized as the supreme civil ruler, the God's or deity's laws being interpreted by the ecclesiastical authorities."} 
{"index":{"_index":"test_index","_type":"doc","_id":6}} 
{"word":"monarchy", "definition":"a state or nation in which the supreme power is actually or nominally lodged in a monarch."} 
{"index":{"_index":"test_index","_type":"doc","_id":7}} 
{"word":"capitalism", "definition":"an economic system in which investment in and ownership of the means of production, distribution, and exchange of wealth is made and maintained chiefly by private individuals or corporations, especially as contrasted to cooperatively or state-owned means of wealth."} 
{"index":{"_index":"test_index","_type":"doc","_id":8}} 
{"word":"socialism", "definition":"a theory or system of social organization that advocates the vesting of the ownership and control of the means of production and distribution, of capital, land, etc., in the community as a whole."} 
{"index":{"_index":"test_index","_type":"doc","_id":9}} 
{"word":"communism", "definition":"a theory or system of social organization based on the holding of all property in common, actual ownership being ascribed to the community as a whole or to the state."} 
{"index":{"_index":"test_index","_type":"doc","_id":10}} 
{"word":"feudalism", "definition":"the feudal system, or its principles and practices."} 
{"index":{"_index":"test_index","_type":"doc","_id":11}} 
{"word":"monopoly", "definition":"exclusive control of a commodity or service in a particular market, or a control that makes possible the manipulation of prices."} 
{"index":{"_index":"test_index","_type":"doc","_id":12}} 
{"word":"oligopoly", "definition":"the market condition that exists when there are few sellers, as a result of which they can greatly influence price and other market factors."} 
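
If you would rather generate a bulk body like the one above from code, a minimal Python sketch is below. The helper name `bulk_body` is hypothetical; it only builds the newline-delimited JSON text (one action line followed by one source line per document, with a trailing newline) and does not send anything to Elasticsearch.

```python
import json

def bulk_body(index, doc_type, docs):
    # docs is an iterable of (doc_id, source_dict) pairs.
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))   # action/metadata line
        lines.append(json.dumps(source))   # document source line
    # The bulk format expects every line, including the last, to end in a newline.
    return "\n".join(lines) + "\n"

body = bulk_body("test_index", "doc", [
    (1, {"word": "democracy", "definition": "government by the people"}),
    (2, {"word": "republic", "definition": "a state in which the supreme power rests in the body of citizens"}),
])
print(body)
```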

I can apply partial matching across both fields (it will work with as many fields as you like) like this:

POST /test_index/_search
{
  "query": {
    "match": {
      "_all": {
        "query": "theo go",
        "operator": "and"
      }
    }
  }
}

Which in this case returns:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.7601639,
    "hits": [
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "5",
        "_score": 0.7601639,
        "_source": {
          "word": "theocracy",
          "definition": "a form of government in which God or a deity is recognized as the supreme civil ruler, the God's or deity's laws being interpreted by the ecclesiastical authorities."
        }
      }
    ]
  }
}
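
To see why only document 5 comes back, here is a small Python simulation of the idea. It is a sketch under simplifying assumptions, not Elasticsearch itself: `tokens`, `edge_ngrams`, `index_all`, and `match` are hypothetical helpers, tokenization is approximated with a regex, scoring is ignored, and only three of the documents are included.

```python
import re

def tokens(text):
    # Approximation of the standard tokenizer plus lowercasing.
    return re.findall(r"\w+", text.lower())

def edge_ngrams(token, min_gram=2, max_gram=20):
    return {token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)}

def index_all(doc):
    # Mimic `_all`: collect edge n-grams of every token in every field.
    grams = set()
    for value in doc.values():
        for tok in tokens(value):
            grams |= edge_ngrams(tok)
    return grams

def match(query, docs, operator="and"):
    # Query terms are kept whole (standard search analyzer);
    # "and" requires every term, "or" requires at least one.
    combine = all if operator == "and" else any
    terms = tokens(query)
    hits = []
    for doc_id, doc in docs.items():
        grams = index_all(doc)
        if combine(term in grams for term in terms):
            hits.append(doc_id)
    return hits

DOCS = {
    4: {"word": "plutocracy", "definition": "the rule or power of wealth or of the wealthy."},
    5: {"word": "theocracy", "definition": "a form of government in which God or a deity is recognized as the supreme civil ruler, the God's or deity's laws being interpreted by the ecclesiastical authorities."},
    10: {"word": "feudalism", "definition": "the feudal system, or its principles and practices."},
}

print(match("theo go", DOCS))         # "theo" from theocracy, "go" from government/God
print(match("theo feu", DOCS, "and"))
print(match("theo feu", DOCS, "or"))
```

The last two calls show the difference the operator makes: with "and" no single document has both prefixes, while with "or" both the theocracy and feudalism documents qualify.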

Here is the code I used (there is more in the blog post):

http://sense.qbox.io/gist/e4093c25a8257499f54ced5a09f35b1eb48e4e3c

Hope that helps.

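Thanks, I had actually already checked out your blog and I think it's great! In your opinion, why would you lean toward the ngram route rather than the suggest API for this case? And have you looked at how the scoring gets weird when you combine ngrams with fuzziness? – emarel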

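I like ngrams because you don't need redundant data, which can become significant in a large data set. Scoring is definitely an issue. My feeling is that there is a way around it, but I don't know how to do it.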

Thanks. Why did you use "analyzer": "ngram_analyzer", "search_analyzer": "standard" rather than just "analyzer": "ngram_analyzer"? – emarel
