2014-05-24 23 views

I am running Elasticsearch 1.0.0 on CentOS 5 with the options -Xms808m -Xmx808m -Xss256k. There are 17 indices with 30,200,583 documents in total; each index holds between 1,000,000 and 2,000,000 documents. I created the following query against this large data set (every index has a date field):

{ 
    "query": { 
    "bool": { 
     "must": [ 
     { 
      "range": { 
      "date": { 
       "to": "2014-06-01 14:14:00", 
       "from": "2014-04-01 00:00:00" 
      } 
      } 
     } 
     ], 
     "should": [], 
     "must_not": [], 
     "minimum_number_should_match": 1 
    } 
    }, 
    "from": 0, 
    "size": "50" 
} 

It returns a response:

{ 
    took: 5903 
    timed_out: false 
    _shards: { 
     total: 17 
     successful: 17 
     failed: 0 
    }, 
    hits: { 
    total: 30200583 
... 
... 
...} 

But when I request the last 50 rows through the elasticsearch-head tool, like:

{ 
    ... 
    ... 
    ... 
    "from": 30200533, 
    "size": "50" 
} 

it does not respond and throws an exception:

java.lang.OutOfMemoryError: Java heap space 
     at org.apache.lucene.store.DataOutput.copyBytes(DataOutput.java:247) 
     at org.apache.lucene.store.Directory.copy(Directory.java:186) 
     at org.elasticsearch.index.store.Store$StoreDirectory.copy(Store.java:348) 
     at org.apache.lucene.store.TrackingDirectoryWrapper.copy(TrackingDirectoryWrapper.java:50) 
     at org.apache.lucene.index.IndexWriter.createCompoundFile(IndexWriter.java:4596) 
     at org.apache.lucene.index.DocumentsWriterPerThread.sealFlushedSegment(DocumentsWriterPerThread.java:535) 
     at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:502) 
     at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:506) 
     at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:616) 
     at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:370) 
     at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:285) 
     at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:260) 
     at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:250) 
     at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170) 
     at org.apache.lucene.search.XSearcherManager.refreshIfNeeded(XSearcherManager.java:123) 
     at org.apache.lucene.search.XSearcherManager.refreshIfNeeded(XSearcherManager.java:59) 
     at org.apache.lucene.search.XReferenceManager.doMaybeRefresh(XReferenceManager.java:180) 
     at org.apache.lucene.search.XReferenceManager.maybeRefresh(XReferenceManager.java:229) 
     at org.elasticsearch.index.engine.internal.InternalEngine.refresh(InternalEngine.java:730) 
     at org.elasticsearch.index.shard.service.InternalIndexShard.refresh(InternalIndexShard.java:477) 
     at org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher$1.run(InternalIndexShard.java:924) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
     at java.lang.Thread.run(Thread.java:619) 

What is the problem? Does it not have enough Java heap space, or is my query causing the heap space error?

Answer


The answer to both questions is "yes". You do not have enough heap space, which is why you see the error, and the query caused the error because you do not have enough heap space.

The reason is that, because of sorting, deep pagination is very expensive. To retrieve the 20th element, you need to keep elements 1-20 in memory and sort them. To retrieve the 1,000,000th element, you need to keep elements 1-999,999 in memory and sort them.

That tends to require a lot of memory.
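A toy sketch of why depth dominates the cost (this models the idea, not Elasticsearch's actual implementation): to serve a page at offset `from`, the top `from + size` hits must be collected in a bounded heap before the first `from` entries are discarded.

```python
import heapq

def topk_page(scores, frm, size):
    """Return one page of results sorted by score, descending.

    The heap must hold frm + size entries, so memory cost grows with
    the page *depth*, not the page size.
    """
    heap = []  # min-heap of the best frm + size scores seen so far
    for s in scores:
        if len(heap) < frm + size:
            heapq.heappush(heap, s)
        elif s > heap[0]:
            heapq.heapreplace(heap, s)
    ordered = sorted(heap, reverse=True)
    # Discard the first `frm` entries and keep one page.
    return ordered[frm:frm + size]

scores = list(range(100000))
page_first = topk_page(scores, 0, 50)      # heap of 50 entries
page_last = topk_page(scores, 99950, 50)   # heap of 100,000 entries
print(len(page_first), len(page_last))     # prints: 50 50
```

Both calls return a 50-element page, but serving the last page forced the entire result set through memory, which is exactly what blows the heap at `"from": 30200533`.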

There are a few options:

  • Get more memory. Problem solved.
  • Use scan/scroll instead of a normal search. Scan/scroll does not perform scoring, so there is no sort order to maintain, which makes it very memory-efficient.
  • Use different sort criteria (e.g. reverse the sort order) or a smaller window (e.g. a narrower date range), so that you can page all the way to the end.
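A sketch of the scan/scroll flow against the Elasticsearch 1.x API, reusing the date-range query from the question (the host/port and the 1m scroll keep-alive are assumptions; adjust them to your cluster):

```shell
# Start a scan: no scoring and no sorting; note that `size` here is
# the batch size per shard, not a global page size.
curl -XGET 'http://localhost:9200/_search?search_type=scan&scroll=1m' -d '{
    "query": {
        "range": {
            "date": {
                "from": "2014-04-01 00:00:00",
                "to":   "2014-06-01 14:14:00"
            }
        }
    },
    "size": 50
}'

# The response contains a _scroll_id. Feed it back repeatedly to walk
# through all hits, until a request returns an empty batch:
curl -XGET 'http://localhost:9200/_search/scroll?scroll=1m' -d '<_scroll_id from the previous response>'
```

Each scroll request returns the next batch and a fresh `_scroll_id`, so you can traverse all 30 million hits without ever holding more than one batch per shard in memory.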