2013-04-04

Elasticsearch: terms facet query returns unexpected results

I want to run a facet query on some logs that I have stored in ES. The logs look like this:

{"severity": "informational","message_hash_value": "00016B15", "user-agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1", "host": "192.168.8.225", "version": "1.0", "user": "[email protected]", "created_timestamp": "2013-03-01T15:34:00", "message": "User viewed contents", "inserted_timestamp": "2013-03-01T15:34:00"} 

The query I am trying to run is:

curl -XGET 'http://127.0.0.1:9200/logs-*/logs/_search' -d '{
    "from": 0,
    "size": 0,
    "facets": {
        "user": {
            "terms": { "field": "user", "size": 999999 }
        }
    }
}'

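Note that the "user" field in the logs is an email address. The problem is that the terms facet query returns a list of terms for the user field like the following.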

u'facets': {u'user': {u'_type': u'terms', u'total': 2004, u'terms': [{u'count': 1002,u'term': u'test.co'}, {u'count': 320, u'term': u'user_1'}, {u'count': 295,u'term': u'user_2'} 

Note that the list contains the term

{u'count': 1002,u'term': u'test.co'} 

which is the domain name of the users' email addresses. Why does Elasticsearch treat the domain name as a separate term?

Running a query to check the mapping

curl -XGET 'http://127.0.0.1:9200/logs-*/_mapping?pretty=true' 

yields the following for "user":

"user" : { 
     "type" : "string" 
    }, 

Answer

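This happens because Elasticsearch's default global analyzer splits tokens on "@" (in addition to things like whitespace and punctuation), so each email address is indexed as several separate terms. You can solve this by telling Elasticsearch not to run an analyzer on this field, but you will have to reindex all of your data.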
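To make the effect concrete, here is a rough Python sketch. It is not the real Lucene tokenizer: `approx_analyze` is a hypothetical stand-in that only lowercases and splits on "@" and whitespace, and the addresses are made-up examples. It illustrates why a terms facet over an analyzed field reports the domain as its own term, while an unanalyzed field keeps each address whole.

```python
import re
from collections import Counter


def approx_analyze(value):
    # Hypothetical approximation of the default analyzer:
    # lowercase, then split on "@" and whitespace.
    return [tok for tok in re.split(r"[@\s]+", value.lower()) if tok]


def keep_whole(value):
    # Mimics "index": "not_analyzed": the whole value is a single term.
    return [value]


def terms_facet(values, analyze):
    # Count, per term, how many documents contain it (once per document).
    counts = Counter()
    for value in values:
        counts.update(set(analyze(value)))
    return counts.most_common()


# Made-up addresses, for illustration only.
emails = ["user_1@test.co", "user_2@test.co", "user_1@test.co"]

analyzed = terms_facet(emails, approx_analyze)
not_analyzed = terms_facet(emails, keep_whole)

print(analyzed)
print(not_analyzed)
```

With the analyzed variant the shared domain becomes the most frequent term, which matches the shape of the facet output in the question.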

Create a new index

curl -XPUT 'http://localhost:9200/logs-new' 

In the mapping for this new index, specify that the "user" field should not be analyzed

curl -XPUT 'http://localhost:9200/logs-new/logs/_mapping' -d '{ 
    "logs" : { 
     "properties" : { 
      "user" : { 
       "type" : "string", 
       "index" : "not_analyzed" 
      } 
     } 
    } 
}' 
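Before reindexing, it may be worth confirming that the mapping took effect. The sketch below assumes you have already pulled the "properties" object out of the `_mapping` response (the nesting around it varies between Elasticsearch versions, so that step is left out). The helper name `analyzed_string_fields` is hypothetical.

```python
def analyzed_string_fields(properties):
    """Return the names of string fields that would still be analyzed."""
    return sorted(
        name
        for name, spec in properties.items()
        # A string field is analyzed unless "index" says otherwise.
        if spec.get("type") == "string"
        and spec.get("index", "analyzed") == "analyzed"
    )


# Mapping fragment from the original index (as shown in the question).
old_properties = {"user": {"type": "string"}}
# Mapping fragment from the new index.
new_properties = {"user": {"type": "string", "index": "not_analyzed"}}

print(analyzed_string_fields(old_properties))
print(analyzed_string_fields(new_properties))
```

An empty list for the new index suggests the "user" field will be stored as a single term.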

Index a document

curl -XPOST 'http://localhost:9200/logs-new/logs' -d '{ 
    "created_timestamp": "2013-03-01T15:34:00", 
    "host": "192.168.8.225", 
    "inserted_timestamp": "2013-03-01T15:34:00", 
    "message": "User viewed contents", 
    "message_hash_value": "00016B15", 
    "severity": "informational", 
    "user": "[email protected]", 
    "user-agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1", 
    "version": "1.0" 
}' 
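Since all existing data has to be reindexed, posting documents one at a time may be slow. As a minimal sketch, the following builds a newline-delimited request body for the `_bulk` endpoint. The function name `bulk_index_body` and the sample document are hypothetical, and fetching the documents from the old indices is left out.

```python
import json


def bulk_index_body(docs, index, doc_type):
    """Build a _bulk request body: one action line plus one source line per doc."""
    lines = []
    for doc in docs:
        # Action line tells Elasticsearch where to index the next source line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Made-up document, for illustration only.
docs = [{"user": "someone@example.com", "message": "User viewed contents"}]
body = bulk_index_body(docs, "logs-new", "logs")
print(body)
```

The resulting string could then be sent with something like `curl -XPOST 'http://localhost:9200/_bulk' --data-binary @body.txt`, where `body.txt` is an assumed file holding the output.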

The Elasticsearch facet will now show the entire email address

curl -XGET 'http://localhost:9200/logs-new/logs/_search?pretty' -d '{ 
    "from":0, 
    "size":0, 
    "facets" : { 
     "user" : { 
      "terms" : { 
       "field" : "user", 
       "size" : 999999 
      } 
     } 
    } 
}' 

Result:

{
    "took" : 1,
    "timed_out" : false,
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
    },
    "hits" : {
        "total" : 1,
        "max_score" : 1.0,
        "hits" : [ ]
    },
    "facets" : {
        "user" : {
            "_type" : "terms",
            "missing" : 0,
            "total" : 1,
            "other" : 0,
            "terms" : [ {
                "term" : "[email protected]",
                "count" : 1
            } ]
        }
    }
}
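If you consume this response from Python, a small helper can flatten the facet into a term-to-count mapping and flag any terms that still look split. This is only a sketch: `facet_counts` is a hypothetical name, and the response below is a trimmed, made-up example with the same shape as the result above.

```python
def facet_counts(response, facet_name):
    """Map each facet term to its count; empty dict if the facet is absent."""
    facet = response.get("facets", {}).get(facet_name, {})
    return {entry["term"]: entry["count"] for entry in facet.get("terms", [])}


# Trimmed, made-up response with the same structure as the result above.
response = {
    "facets": {
        "user": {
            "_type": "terms",
            "missing": 0,
            "total": 1,
            "other": 0,
            "terms": [{"term": "someone@example.com", "count": 1}],
        }
    }
}

counts = facet_counts(response, "user")
# Terms without "@" would suggest the field is still being analyzed.
suspicious = [term for term in counts if "@" not in term]

print(counts)
print(suspicious)
```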

References:
Core types: http://www.elasticsearch.org/guide/reference/mapping/core-types/
Reindexing with a new mapping: https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/tCaXgjfUFVU


Excellent answer. +1 for the references and the reindexing. Cheers – auny 2013-04-04 19:59:18
