Elasticsearch: cannot search using the Kuromoji reading form

I'm using Elasticsearch 0.90.1 with the Kuromoji plugin 1.4.0.

$ curl localhost:9200 
{ 
    "ok" : true, 
    "status" : 200, 
    "name" : "Agent Zero", 
    "version" : { 
    "number" : "0.90.1", 
    "snapshot_build" : false, 
    "lucene_version" : "4.3" 
    }, 
    "tagline" : "You Know, for Search" 
}

I created a new index that uses Kuromoji as my default analyzer:

$ curl -X PUT localhost:9200/test -d '{
    "index": {
        "analysis": {
            "filter": {
                "kuromoji_rf": {
                    "type": "kuromoji_readingform",
                    "use_romaji": "false"
                }
            },
            "tokenizer": {
                "kuromoji": {
                    "type": "kuromoji_tokenizer"
                }
            },
            "analyzer": {
                "default": {
                    "type": "custom",
                    "tokenizer": "kuromoji",
                    "filter": [
                        "kuromoji_rf"
                    ]
                }
            }
        }
    }
}'

Result:

{ 
    "ok": true, 
    "acknowledged": true 
}

The reading form token filter seems to be working fine (kanji are normalized to katakana):

$ curl localhost:9200/test/_analyze -d '東京'

Result:

{ 
    "tokens": [ 
    { 
     "token": "トウキョウ", 
     "start_offset": 0, 
     "end_offset": 2, 
     "type": "word", 
     "position": 1 
    } 
    ] 
}

Index a document:

$ curl -X PUT localhost:9200/test/docs/1 -d '{ 
    "body": "これは関西国際空港です" 
}'

Result:

{ 
    "ok": true, 
    "_index": "test", 
    "_type": "docs", 
    "_id": "1", 
    "_version": 1 
}

The indexed document matches a wildcard query:

$ curl 'localhost:9200/test/docs/_search?q=body:*'

Result:

{ 
    "took": 109, 
    "timed_out": false, 
    "_shards": { 
    "total": 5, 
    "successful": 5, 
    "failed": 0 
    }, 
    "hits": { 
    "total": 1, 
    "max_score": 1.0, 
    "hits": [ 
     { 
     "_index": "test", 
     "_type": "docs", 
     "_id": "1", 
     "_score": 1.0, 
     "_source": { 
      "body": "これは関西国際空港です" 
     } 
     } 
    ] 
    } 
}

However, when I search using Japanese text, nothing matches:

$ curl 'localhost:9200/test/docs/_search?q=body:空港'

Result:

{ 
    "took": 21, 
    "timed_out": false, 
    "_shards": { 
    "total": 5, 
    "successful": 5, 
    "failed": 0 
    }, 
    "hits": { 
    "total": 0, 
    "max_score": null, 
    "hits": [] 
    } 
} 

$ curl 'localhost:9200/test/docs/_search?q=body:クウコウ'

Result:

{ 
    "took": 95, 
    "timed_out": false, 
    "_shards": { 
    "total": 5, 
    "successful": 5, 
    "failed": 0 
    }, 
    "hits": { 
    "total": 0, 
    "max_score": null, 
    "hits": [] 
    } 
} 

$ curl 'localhost:9200/test/docs/_search?q=body:空'

Result:

{ 
    "took": 22, 
    "timed_out": false, 
    "_shards": { 
    "total": 5, 
    "successful": 5, 
    "failed": 0 
    }, 
    "hits": { 
    "total": 0, 
    "max_score": null, 
    "hits": [] 
    } 
}

I wondered whether the analyzer was not being applied to the search query, but specifying the analyzer explicitly didn't help:

$ curl 'localhost:9200/test/docs/_search?analyzer=default&q=body:空港'

Result:

{ 
    "took": 17, 
    "timed_out": false, 
    "_shards": { 
    "total": 5, 
    "successful": 5, 
    "failed": 0 
    }, 
    "hits": { 
    "total": 0, 
    "max_score": null, 
    "hits": [] 
    } 
}

By the way, everything works fine if I disable the token filter.

What am I doing wrong?

Source

2013-06-26 Chris B

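Maybe the string in your URL (e.g. localhost:9200/test/docs/_search?q=body:クウコウ) is not URL-encoded.
I tried the command below with the percent-encoded term, and it returned the document.
"クウコウ" -> "%E3%82%AF%E3%82%A6%E3%82%B3%E3%82%A6"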

curl 'http://localhost:9200/test/docs/_search?q=body:%E3%82%AF%E3%82%A6%E3%82%B3%E3%82%A6' 
{ 
    "took": 3, 
    "timed_out": false, 
    "_shards": { 
    "total": 5, 
    "successful": 5, 
    "failed": 0 
    }, 
    "hits": { 
    "total": 1, 
    "max_score": 0.11506981, 
    "hits": [ 
     { 
     "_index": "test", 
     "_type": "docs", 
     "_id": "1", 
     "_score": 0.11506981, 
     "_source": { 
      "body": "これは関西国際空港です" 
     } 
     } 
    ] 
    } 
}
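
If you would rather not encode the term by hand, the conversion can be scripted. The following is only a minimal sketch: `urlencode` is a hypothetical helper name (not part of curl or Elasticsearch), and it assumes the script and terminal use UTF-8. It percent-encodes every byte of its argument, so pass it only the search term, not the whole URL.

```shell
#!/bin/sh
# Hypothetical helper: percent-encode every byte of $1 as %XX (uppercase hex).
# Encoding plain ASCII letters as well is valid, just more verbose than necessary.
urlencode() {
  printf '%s' "$1" | od -An -v -tx1 | tr -d ' \n' | sed 's/../%&/g' | tr 'a-f' 'A-F'
}

term=$(urlencode 'クウコウ')
echo "$term"   # prints %E3%82%AF%E3%82%A6%E3%82%B3%E3%82%A6

# To run the search against the index from the question:
#   curl "http://localhost:9200/test/docs/_search?q=body:${term}"
```

Alternatively, curl can do the encoding itself: with `-G`, values given via `--data-urlencode` are appended to the URL as the query string, e.g. `curl -G 'http://localhost:9200/test/docs/_search' --data-urlencode 'q=body:クウコウ'`.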

Source

2013-06-26 05:18:18

I'm an idiot! Thanks.
