
I am new to Elastic Search. I have mapped a field as "string" in my Elasticsearch index. I need to retrieve a document whenever that field's value contains a given search text, i.e. get all documents from the Elasticsearch server whose source contains the given search text.

JSON1 : "{\"id\":\"1\",\"message\":\"Welcome to elastic search\"}" 
JSON2 : "{\"id\":\"2\",\"message\":\"elasticsearch\"}" 

If I search with "elastic", I need to get both records, but I only get the first one.

Right now I am fetching documents via full-text search. Please guide me on how to achieve a LIKE/ILIKE-style search (as in psql/PostgreSQL) in Elastic Search. For example, a basic match query like the one below (the index name is just a placeholder for illustration) returns only the first document:
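
curl -XGET 'localhost:9200/myindex/_search' -d '{
  "query": {
    "match": { "message": "elastic" }
  }
}'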

Thanks in advance.

Answer


This is a tokenizer issue. Have a look at the nGram tokenizer: http://www.elasticsearch.org/guide/reference/index-modules/analysis/ngram-tokenizer/

You can use the _analyze endpoint to test how Elasticsearch tokenizes text by default:

curl -XGET 'localhost:9200/_analyze?tokenizer=standard' -d 'this is a test elasticsearch'

{ 
"tokens": [{ 
     "token": "this", 
     "start_offset": 0, 
     "end_offset": 4, 
     "type": "<ALPHANUM>", 
     "position": 1 
    }, { 
     "token": "is", 
     "start_offset": 5, 
     "end_offset": 7, 
     "type": "<ALPHANUM>", 
     "position": 2 
    }, { 
     "token": "a", 
     "start_offset": 8, 
     "end_offset": 9, 
     "type": "<ALPHANUM>", 
     "position": 3 
    }, { 
     "token": "test", 
     "start_offset": 10, 
     "end_offset": 14, 
     "type": "<ALPHANUM>", 
     "position": 4 
    }, { 
     "token": "elasticsearch", 
     "start_offset": 15, 
     "end_offset": 28, 
     "type": "<ALPHANUM>", 
     "position": 5 
    } 
  ]
}

And here is an example with the nGram tokenizer and its default settings:

curl -XGET 'localhost:9200/_analyze?tokenizer=nGram' -d 'this is a test elasticsearch'

{ 
    "tokens": [{ 
      "token": "t", 
      "start_offset": 0, 
      "end_offset": 1, 
      "type": "word", 
      "position": 1 
     }, { 
      "token": "h", 
      "start_offset": 1, 
      "end_offset": 2, 
      "type": "word", 
      "position": 2 
     }, { 
      "token": "i", 
      "start_offset": 2, 
      "end_offset": 3, 
      "type": "word", 
      "position": 3 
     }, { 
      "token": "s", 
      "start_offset": 3, 
      "end_offset": 4, 
      "type": "word", 
      "position": 4 
     }, { 
      "token": " ", 
      "start_offset": 4, 
      "end_offset": 5, 
      "type": "word", 
      "position": 5 
     }, { 
      "token": "i", 
      "start_offset": 5, 
      "end_offset": 6, 
      "type": "word", 
      "position": 6 
     }, { 
      "token": "s", 
      "start_offset": 6, 
      "end_offset": 7, 
      "type": "word", 
      "position": 7 
     }, { 
      "token": " ", 
      "start_offset": 7, 
      "end_offset": 8, 
      "type": "word", 
      "position": 8 
     }, { 
      "token": "a", 
      "start_offset": 8, 
      "end_offset": 9, 
      "type": "word", 
      "position": 9 
     }, { 
      "token": " ", 
      "start_offset": 9, 
      "end_offset": 10, 
      "type": "word", 
      "position": 10 
     }, { 
      "token": "t", 
      "start_offset": 10, 
      "end_offset": 11, 
      "type": "word", 
      "position": 11 
     }, { 
      "token": "e", 
      "start_offset": 11, 
      "end_offset": 12, 
      "type": "word", 
      "position": 12 
     }, { 
      "token": "s", 
      "start_offset": 12, 
      "end_offset": 13, 
      "type": "word", 
      "position": 13 
     }, { 
      "token": "t", 
      "start_offset": 13, 
      "end_offset": 14, 
      "type": "word", 
      "position": 14 
     }, { 
      "token": " ", 
      "start_offset": 14, 
      "end_offset": 15, 
      "type": "word", 
      "position": 15 
     }, { 
      "token": "e", 
      "start_offset": 15, 
      "end_offset": 16, 
      "type": "word", 
      "position": 16 
     }, { 
      "token": "l", 
      "start_offset": 16, 
      "end_offset": 17, 
      "type": "word", 
      "position": 17 
     }, { 
      "token": "a", 
      "start_offset": 17, 
      "end_offset": 18, 
      "type": "word", 
      "position": 18 
     }, { 
      "token": "s", 
      "start_offset": 18, 
      "end_offset": 19, 
      "type": "word", 
      "position": 19 
     }, { 
      "token": "t", 
      "start_offset": 19, 
      "end_offset": 20, 
      "type": "word", 
      "position": 20 
     }, { 
      "token": "i", 
      "start_offset": 20, 
      "end_offset": 21, 
      "type": "word", 
      "position": 21 
     }, { 
      "token": "c", 
      "start_offset": 21, 
      "end_offset": 22, 
      "type": "word", 
      "position": 22 
     }, { 
      "token": "s", 
      "start_offset": 22, 
      "end_offset": 23, 
      "type": "word", 
      "position": 23 
     }, { 
      "token": "e", 
      "start_offset": 23, 
      "end_offset": 24, 
      "type": "word", 
      "position": 24 
     }, { 
      "token": "a", 
      "start_offset": 24, 
      "end_offset": 25, 
      "type": "word", 
      "position": 25 
     }, { 
      "token": "r", 
      "start_offset": 25, 
      "end_offset": 26, 
      "type": "word", 
      "position": 26 
     }, { 
      "token": "c", 
      "start_offset": 26, 
      "end_offset": 27, 
      "type": "word", 
      "position": 27 
     }, { 
      "token": "h", 
      "start_offset": 27, 
      "end_offset": 28, 
      "type": "word", 
      "position": 28 
     }, { 
      "token": "th", 
      "start_offset": 0, 
      "end_offset": 2, 
      "type": "word", 
      "position": 29 
     }, { 
      "token": "hi", 
      "start_offset": 1, 
      "end_offset": 3, 
      "type": "word", 
      "position": 30 
     }, { 
      "token": "is", 
      "start_offset": 2, 
      "end_offset": 4, 
      "type": "word", 
      "position": 31 
     }, { 
      "token": "s ", 
      "start_offset": 3, 
      "end_offset": 5, 
      "type": "word", 
      "position": 32 
     }, { 
      "token": " i", 
      "start_offset": 4, 
      "end_offset": 6, 
      "type": "word", 
      "position": 33 
     }, { 
      "token": "is", 
      "start_offset": 5, 
      "end_offset": 7, 
      "type": "word", 
      "position": 34 
     }, { 
      "token": "s ", 
      "start_offset": 6, 
      "end_offset": 8, 
      "type": "word", 
      "position": 35 
     }, { 
      "token": " a", 
      "start_offset": 7, 
      "end_offset": 9, 
      "type": "word", 
      "position": 36 
     }, { 
      "token": "a ", 
      "start_offset": 8, 
      "end_offset": 10, 
      "type": "word", 
      "position": 37 
     }, { 
      "token": " t", 
      "start_offset": 9, 
      "end_offset": 11, 
      "type": "word", 
      "position": 38 
     }, { 
      "token": "te", 
      "start_offset": 10, 
      "end_offset": 12, 
      "type": "word", 
      "position": 39 
     }, { 
      "token": "es", 
      "start_offset": 11, 
      "end_offset": 13, 
      "type": "word", 
      "position": 40 
     }, { 
      "token": "st", 
      "start_offset": 12, 
      "end_offset": 14, 
      "type": "word", 
      "position": 41 
     }, { 
      "token": "t ", 
      "start_offset": 13, 
      "end_offset": 15, 
      "type": "word", 
      "position": 42 
     }, { 
      "token": " e", 
      "start_offset": 14, 
      "end_offset": 16, 
      "type": "word", 
      "position": 43 
     }, { 
      "token": "el", 
      "start_offset": 15, 
      "end_offset": 17, 
      "type": "word", 
      "position": 44 
     }, { 
      "token": "la", 
      "start_offset": 16, 
      "end_offset": 18, 
      "type": "word", 
      "position": 45 
     }, { 
      "token": "as", 
      "start_offset": 17, 
      "end_offset": 19, 
      "type": "word", 
      "position": 46 
     }, { 
      "token": "st", 
      "start_offset": 18, 
      "end_offset": 20, 
      "type": "word", 
      "position": 47 
     }, { 
      "token": "ti", 
      "start_offset": 19, 
      "end_offset": 21, 
      "type": "word", 
      "position": 48 
     }, { 
      "token": "ic", 
      "start_offset": 20, 
      "end_offset": 22, 
      "type": "word", 
      "position": 49 
     }, { 
      "token": "cs", 
      "start_offset": 21, 
      "end_offset": 23, 
      "type": "word", 
      "position": 50 
     }, { 
      "token": "se", 
      "start_offset": 22, 
      "end_offset": 24, 
      "type": "word", 
      "position": 51 
     }, { 
      "token": "ea", 
      "start_offset": 23, 
      "end_offset": 25, 
      "type": "word", 
      "position": 52 
     }, { 
      "token": "ar", 
      "start_offset": 24, 
      "end_offset": 26, 
      "type": "word", 
      "position": 53 
     }, { 
      "token": "rc", 
      "start_offset": 25, 
      "end_offset": 27, 
      "type": "word", 
      "position": 54 
     }, { 
      "token": "ch", 
      "start_offset": 26, 
      "end_offset": 28, 
      "type": "word", 
      "position": 55 
     } 
    ] 
} 

And here is a link with an example of how to set up the proper analyzer/tokenizer on your index: How to setup a tokenizer in elasticsearch
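
As a rough sketch (the index name, type name, field names and nGram sizes below are placeholders you would adapt to your own mapping), the setup boils down to creating the index with an nGram-based analyzer and applying it to the message field:

# create the index with a custom nGram analyzer and map "message" to it
curl -XPUT 'localhost:9200/myindex' -d '{
  "settings": {
    "analysis": {
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 10
        }
      },
      "analyzer": {
        "my_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "my_ngram_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "mytype": {
      "properties": {
        "id":      { "type": "string", "index": "not_analyzed" },
        "message": { "type": "string", "analyzer": "my_ngram_analyzer" }
      }
    }
  }
}'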

Once your documents are re-indexed with that analyzer, your query should return the expected documents.
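
For example, re-running the match query from the question against the re-indexed data (again, "myindex" is only a placeholder) should now return both JSON1 and JSON2, since every 2-10 character nGram of "elastic" also occurs in "elasticsearch":

curl -XGET 'localhost:9200/myindex/_search' -d '{
  "query": {
    "match": { "message": "elastic" }
  }
}'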
