2013-06-26

Elasticsearch: Can't search using the Kuromoji reading form filter

I'm using Elasticsearch 0.90.1 and the Kuromoji plugin 1.4.0.

$ curl localhost:9200 
{ 
    "ok" : true, 
    "status" : 200, 
    "name" : "Agent Zero", 
    "version" : { 
    "number" : "0.90.1", 
    "snapshot_build" : false, 
    "lucene_version" : "4.3" 
    }, 
    "tagline" : "You Know, for Search" 
} 

I create a new index that uses Kuromoji in my default analyzer:

$ curl -X PUT localhost:9200/test -d '{ 
    "index": { 
    "analysis": { 
     "filter": { 
     "kuromoji_rf": { 
      "type": "kuromoji_readingform", 
      "use_romaji": "false" 
     } 
     }, 
     "tokenizer": { 
     "kuromoji": { 
      "type": "kuromoji_tokenizer" 
     } 
     }, 
     "analyzer": { 
     "default": { 
      "type": "custom", 
      "tokenizer": "kuromoji", 
      "filter": [ 
      "kuromoji_rf" 
      ] 
     } 
     } 
    } 
    } 
}' 

Result:

{ 
    "ok": true, 
    "acknowledged": true 
} 

The reading form token filter seems to be working fine (kanji are normalized to katakana):

$ curl localhost:9200/test/_analyze -d '東京' 

Result:

{ 
    "tokens": [ 
    { 
     "token": "トウキョウ", 
     "start_offset": 0, 
     "end_offset": 2, 
     "type": "word", 
     "position": 1 
    } 
    ] 
} 

Index a document:

$ curl -X PUT localhost:9200/test/docs/1 -d '{ 
    "body": "これは関西国際空港です" 
}' 

Result:

{ 
    "ok": true, 
    "_index": "test", 
    "_type": "docs", 
    "_id": "1", 
    "_version": 1 
}

The indexed document matches a wildcard query:

$ curl 'localhost:9200/test/docs/_search?q=body:*' 

Result:

{ 
    "took": 109, 
    "timed_out": false, 
    "_shards": { 
    "total": 5, 
    "successful": 5, 
    "failed": 0 
    }, 
    "hits": { 
    "total": 1, 
    "max_score": 1.0, 
    "hits": [ 
     { 
     "_index": "test", 
     "_type": "docs", 
     "_id": "1", 
     "_score": 1.0, 
     "_source": { 
      "body": "これは関西国際空港です" 
     } 
     } 
    ] 
    } 
} 

However, when I search using Japanese, nothing matches:

$ curl 'localhost:9200/test/docs/_search?q=body:空港' 

Result:

{ 
    "took": 21, 
    "timed_out": false, 
    "_shards": { 
    "total": 5, 
    "successful": 5, 
    "failed": 0 
    }, 
    "hits": { 
    "total": 0, 
    "max_score": null, 
    "hits": [] 
    } 
} 

$ curl 'localhost:9200/test/docs/_search?q=body:クウコウ' 

Result:

{ 
    "took": 95, 
    "timed_out": false, 
    "_shards": { 
    "total": 5, 
    "successful": 5, 
    "failed": 0 
    }, 
    "hits": { 
    "total": 0, 
    "max_score": null, 
    "hits": [] 
    } 
} 

$ curl 'localhost:9200/test/docs/_search?q=body:空' 

Result:

{ 
    "took": 22, 
    "timed_out": false, 
    "_shards": { 
    "total": 5, 
    "successful": 5, 
    "failed": 0 
    }, 
    "hits": { 
    "total": 0, 
    "max_score": null, 
    "hits": [] 
    } 
} 

I thought the analyzer might not be applied to the search query, but specifying the analyzer explicitly didn't help:

$ curl 'localhost:9200/test/docs/_search?analyzer=default&q=body:空港' 

Result:

{ 
    "took": 17, 
    "timed_out": false, 
    "_shards": { 
    "total": 5, 
    "successful": 5, 
    "failed": 0 
    }, 
    "hits": { 
    "total": 0, 
    "max_score": null, 
    "hits": [] 
    } 
} 

By the way, everything works fine if I disable the token filter.

What am I doing wrong?

Answer

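Perhaps the query in your URL (e.g. localhost:9200/test/docs/_search?q=body:クウコウ) is not URL-encoded.
I percent-encoded the search term and tried the command below, which returned a hit:
"クウコウ" -> "%E3%82%AF%E3%82%A6%E3%82%B3%E3%82%A6"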

curl 'http://localhost:9200/test/docs/_search?q=body:%E3%82%AF%E3%82%A6%E3%82%B3%E3%82%A6' 
{ 
    "took": 3, 
    "timed_out": false, 
    "_shards": { 
    "total": 5, 
    "successful": 5, 
    "failed": 0 
    }, 
    "hits": { 
    "total": 1, 
    "max_score": 0.11506981, 
    "hits": [ 
     { 
     "_index": "test", 
     "_type": "docs", 
     "_id": "1", 
     "_score": 0.11506981, 
     "_source": { 
      "body": "これは関西国際空港です" 
     } 
     } 
    ] 
    } 
} 
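
The encoding above can be reproduced from the shell. Below is a minimal sketch, assuming bash 4+ and od are available; urlencode is a hypothetical helper name, not a curl or Elasticsearch command. It keeps the RFC 3986 unreserved characters (A-Z, a-z, 0-9, -, ., _, ~) and turns every other byte of the UTF-8 input into %XX:

```bash
# urlencode: hypothetical helper (not part of curl or Elasticsearch).
# Percent-encodes its argument byte by byte; assumes UTF-8 input and bash 4+.
urlencode() {
  local hex out=""
  # od prints each input byte as two lowercase hex digits, whitespace-separated.
  for hex in $(printf '%s' "$1" | od -An -v -tx1); do
    case "$hex" in
      # Unreserved bytes: digits 30-39, A-Z 41-5a, a-z 61-7a, '-' 2d, '.' 2e, '_' 5f, '~' 7e
      3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
        out+=$(printf "\\x$hex") ;;   # keep the character as-is
      *)
        out+="%${hex^^}" ;;           # everything else becomes %XX (uppercase hex)
    esac
  done
  printf '%s\n' "$out"
}
```

Usage against the index from the question (requires the running node):

$ curl "localhost:9200/test/docs/_search?q=body:$(urlencode 'クウコウ')"

curl can also do the encoding itself: with -G, each --data-urlencode value is percent-encoded and appended to the URL as the query string:

$ curl -G --data-urlencode 'q=body:クウコウ' localhost:9200/test/docs/_search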

I'm an idiot! Thanks.