Elasticsearch query performance

I am using Elasticsearch to index two types of objects.

Data details

Contract object: ~60 properties (object size: 120 bytes)
Risk Item object: ~125 properties (object size: 250 bytes)

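The contract is the parent of the risk item (_parent).

I am storing 240 million such objects in a single index (210 million risk items, 30 million contracts).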

Index size: 322 GB

Cluster details

11 m2.4xlarge EC2 boxes [68 GB memory, 1.6 TB storage, 8 cores] (1 box is a load balancer node with node.data = false), 50 shards, 1 replica

elasticsearch.yml

node.data: true 
http.enabled: false 
index.number_of_shards: 50 
index.number_of_replicas: 1 
index.translog.flush_threshold_ops: 10000 
index.merge.policy.use_compound_files: false 
indices.memory.index_buffer_size: 30% 
index.refresh_interval: 30s 
index.store.type: mmapfs 
path.data: /data-xvdf,/data-xvdg

I started the Elasticsearch nodes with the following command - /home/ec2-user/elasticsearch-0.90.2/bin/elasticsearch -f -Xms30g -Xmx30g

My problem is that the following query against the risk item type takes about 10-15 seconds to return data, for just 20 records.

I am running it under a load of 50 concurrent users, with bulk indexing of about 5000 risk items running in parallel.

Query (with a parent-child join)

http://:9200/contractindex/riskitem/_search

{ 
    "query": { 
     "has_parent": { 
      "parent_type": "contract", 
      "query": { 
       "range": { 
        "ContractDate": { 
         "gte": "2010-01-01" 
        } 
       } 
      } 
     } 
    }, 
    "filter": { 
     "and": [{ 
      "query": { 
       "bool": { 
        "must": [{ 
         "query_string": { 
          "fields": ["RiskItemProperty1"], 
          "query": "abc" 
         } 
        }, 
        { 
         "query_string": { 
          "fields": ["RiskItemProperty2"], 
          "query": "xyz" 
         } 
        }] 
       } 
      } 
     }] 
    } 
}

Queries on a single type

**Query1** (This query takes around 8 seconds.)

<!-- language: lang-json --> 

    { 
     "query": { 
      "constant_score": { 
       "filter": { 
        "and": [{ 
         "term": { 
          "CommonCharacteristic_BuildingScheme": "BuildingScheme1" 
         } 
        }, 
        { 
         "term": { 
          "Address_Admin2Name": "Admin2Name1" 
         } 
        }] 
       } 
      } 
     } 
    } 



**Query2** (This query takes around 6.5 seconds for the top 10 records, but it has a sort on top of it.)

<!-- language: lang-json --> 

    { 
     "query": { 
      "constant_score": { 
       "filter": { 
        "and": [{ 
         "term": { 
          "Insurer": "Insurer1" 
         } 
        }, 
        { 
         "term": { 
          "Status": "Status1" 
         } 
        }] 
       } 
      } 
     } 
    }

Can someone help me improve the performance of these queries?


2013-08-16 Vishal

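I am interested in the answer too. Have you tried another kind of relationship between the documents? I am thinking of nested objects. I might be wrong, but I would say a parent-child relationship is a kind of "query-time join". Nested objects live in the same Lucene block, so the search query might be faster. – jackdbernier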

I also have a question... why '-Xms30g -Xmx30g' and not more? – jackdbernier

The objects are very large, and nested objects would need a lot of space. – Vishal

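Have you tried custom routing? Without custom routing, your query has to look at all 50 shards to satisfy your request. With custom routing, the query knows which shards to search, which makes it more performant. More here.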

As described in the bulk api docs, you can assign custom routing to each bulk item by including the routing value in the _routing field.
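A minimal sketch of what that could look like, assuming the documents are routed by a field they all carry. Routing by `Insurer`, the `id` and `ContractId` field names, and the helper `bulk_body` are all assumptions made for illustration; nothing in the question says which key would partition this data well. For parent-child documents, a contract and its risk items would need the same routing value so that they land on the same shard.

```python
import json

def bulk_body(index, doc_type, docs, routing_field, parent_field=None):
    """Build a bulk-API request body (newline-delimited JSON).

    Each document gets an action line carrying `_routing`, followed by
    the document source itself. Field names are hypothetical.
    """
    lines = []
    for doc in docs:
        meta = {
            "_index": index,
            "_type": doc_type,
            "_id": doc["id"],
            "_routing": doc[routing_field],  # custom routing value per item
        }
        if parent_field is not None:
            meta["_parent"] = doc[parent_field]  # child documents only
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(doc))
    # The bulk API expects a trailing newline after the last line.
    return "\n".join(lines) + "\n"

# Hypothetical risk items, both belonging to contract "c1".
risk_items = [
    {"id": "r1", "ContractId": "c1", "Insurer": "Insurer1", "Status": "Status1"},
    {"id": "r2", "ContractId": "c1", "Insurer": "Insurer1", "Status": "Status2"},
]

body = bulk_body("contractindex", "riskitem", risk_items, "Insurer", "ContractId")
print(body)

# At search time the same value is passed as the `routing` URL parameter,
# so only the shard(s) holding that value are queried.
search_path = "/contractindex/riskitem/_search?routing=Insurer1"
```

The trade-off is that every search then has to know its routing value up front; queries that cannot supply one still fan out to all shards.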


2013-08-16 13:58:12

Apart from custom routing, what other options are there? – Vishal

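As jackdbernier mentioned in his comment, increasing the heap size will help performance. This [thread](http://elasticsearch-users.115913.n3.nabble.com/Slow-Query-Performance-td4024165.html) is almost a year old now, but its information is probably still good. For example, the Elasticsearch team suggests there that heap_size be set to 60% of total memory. So, in your case, try increasing your heap to 40g. –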

Just tried a 41 GB heap size, and the results are still the same. – Vishal

We changed our queries to make use of bitsets.

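We ran 50 concurrent users (read only) for an hour. All of our queries became 4 to 5 times faster, except the parent-child query (the query from the question), which came down from 7 seconds to 3 seconds.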

I also have a has_child query. Does anyone have any other feedback on how we can improve this one further, or the other queries?

{ 
    "query": { 
     "filtered": { 
      "query": { 
       "bool": { 
        "must": [{ 
         "match": { 
          "LineOfBusiness": "LOBValue1" 
         } 
        }] 
       } 
      }, 
      "filter": { 
       "has_child": { 
        "type": "riskitem", 
        "filter": { 
         "bool": { 
          "must": [{ 
           "term": { 
            "Address_Admin1Name": "Admin1Name1" 
           } 
          }] 
         } 
        } 
       } 
      } 
     } 
    } 
}


2013-08-20 17:27:18 Vishal

Can anyone please comment/help? – Vishal

Basically, replace your AND/OR filters with BOOL so that they make use of bitsets. I don't know why, but just do it and see whether it is faster. –
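A rough sketch of that rewrite, applied to Query1 from the question. The helper `to_bool_filter` is hypothetical and only handles the plain list form of `and`/`or` used in this thread. The usual explanation, as far as I understand it, is that term filters are cached as bitsets and a `bool` filter can combine those bitsets directly, while `and`/`or` check documents one at a time.

```python
def to_bool_filter(node):
    """Recursively rewrite {"and": [...]} as {"bool": {"must": [...]}}
    and {"or": [...]} as {"bool": {"should": [...]}}.

    Only the list form of and/or is handled; everything else is copied as-is.
    """
    if isinstance(node, list):
        return [to_bool_filter(item) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        converted = to_bool_filter(value)
        if key == "and" and isinstance(converted, list):
            out["bool"] = {"must": converted}
        elif key == "or" and isinstance(converted, list):
            out["bool"] = {"should": converted}
        else:
            out[key] = converted
    return out

# Query1 from the question, unchanged.
query1 = {
    "query": {
        "constant_score": {
            "filter": {
                "and": [
                    {"term": {"CommonCharacteristic_BuildingScheme": "BuildingScheme1"}},
                    {"term": {"Address_Admin2Name": "Admin2Name1"}},
                ]
            }
        }
    }
}

rewritten = to_bool_filter(query1)
print(rewritten)
```

The rewritten query has the same two `term` clauses under `bool.must`, so it should match the same documents; whether it is actually faster on a given cluster is something to measure rather than assume.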
