2013-02-25 39 views

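Inserting data into BerkeleyDB-JE is getting slower and slower

I am trying to insert ~56,249,000 items into a BerkeleyDB-JE database. I ran DbCacheSize to get some statistics about my database: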

java -jar je-5.0.34.jar DbCacheSize -records 56248699 -key 8 -data 20 

=== Environment Cache Overhead === 

3,155,957 minimum bytes 

To account for JE daemon operation and record locks, 
a significantly larger amount is needed in practice. 

=== Database Cache Size === 

  Minimum Bytes    Maximum Bytes  Description
---------------  ---------------  -----------
  1,287,110,736    1,614,375,504  Internal nodes only
  4,330,861,264    4,658,126,032  Internal nodes and leaf nodes

=== Internal Node Usage by Btree Level === 

  Minimum Bytes    Maximum Bytes       Nodes  Level
---------------  ---------------  ----------  -----
  1,269,072,064    1,592,660,160     632,008      1
     17,837,712       21,473,424       7,101      2
        198,448          238,896          79      3
          2,512            3,024           1      4

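I asked this question two years ago (Optimizing a BerkeleyDB JE Database), but I still don't know how I should configure my environment from these statistics.

While the data is being loaded, I will be the only user with access to the database: should I use transactions?

My environment is currently opened as follows: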

EnvironmentConfig cfg=(...) 
cfg.setTransactional(true); 
cfg.setAllowCreate(true); 
cfg.setReadOnly(false); 
cfg.setCachePercent(80); 
cfg.setConfigParam(EnvironmentConfig.LOG_FILE_MAX,"250000000"); 

The database:

cfg.setAllowCreate(true); 
cfg.setTransactional(true); 
cfg.setReadOnly(false); 

and I read/insert the items in the following way:

Transaction txn= env.beginTransaction(null, null); 
//open db with transaction 'txn' 
Database db=env.open(...txn) 

Transaction txn2=this.getEnvironment().beginTransaction(null, null); 
long record_id=0L; 
while((item=readNextItem(input))!=null) 
    { 
    (...) 
    ++record_id; 

    db.put(...); //insert record_id/item into db 
    /** every 100000 items commit and create a new transaction. 
     I found it was the only way to avoid an outOfMemory exception */ 
    if(record_id%100000==0) 
     { 
     txn2.commit(); 
     System.gc(); 
     txn2=this.getEnvironment().beginTransaction(null, null); 
     } 
    } 

txn2.commit(); 
txn.commit(); 

But things are getting slower and slower. I run the program from Eclipse, without setting any JVM options.

100000/56248699 (0.2 %). 13694.9 records/seconds. Time remaining:68.3 m Disk Usage: 23.4 Mb. Expect Disk Usage: 12.8 Gb Free Memory : 318.5 Mb. 
200000/56248699 (0.4 %). 16680.6 records/seconds. Time remaining:56.0 m Disk Usage: 49.5 Mb. Expect Disk Usage: 13.6 Gb Free Memory : 338.3 Mb. 
(...) 
6600000/56248699 (11.7 %). 9658.2 records/seconds. Time remaining:85.7 m Disk Usage: 2.9 Gb. Expect Disk Usage: 24.6 Gb Free Memory : 165.0 Mb. 
6700000/56248699 (11.9 %). 9474.5 records/seconds. Time remaining:87.2 m Disk Usage: 2.9 Gb. Expect Disk Usage: 24.7 Gb Free Memory : 164.8 Mb. 
6800000/56248699 (12.1 %). 9322.6 records/seconds. Time remaining:88.4 m Disk Usage: 3.0 Gb. Expect Disk Usage: 24.8 Gb Free Memory : 164.8 Mb. 
(Ctrl-C... abort...) 

How can I make things faster?

Update:

MemTotal:  4021708 kB 
MemFree:   253580 kB 
Buffers:   89360 kB 
Cached:   1389272 kB 
SwapCached:   56 kB 
Active:   2228712 kB 
Inactive:  1449096 kB 
Active(anon): 1793592 kB 
Inactive(anon): 596852 kB 
Active(file):  435120 kB 
Inactive(file): 852244 kB 
Unevictable:   0 kB 
Mlocked:    0 kB 
HighTotal:  3174028 kB 
HighFree:   57412 kB 
LowTotal:   847680 kB 
LowFree:   196168 kB 
SwapTotal:  4085756 kB 
SwapFree:  4068224 kB 
Dirty:    16320 kB 
Writeback:    0 kB 
AnonPages:  2199056 kB 
Mapped:   111280 kB 
Shmem:   191272 kB 
Slab:    58664 kB 
SReclaimable:  41448 kB 
SUnreclaim:  17216 kB 
KernelStack:  3792 kB 
PageTables:  11328 kB 
NFS_Unstable:   0 kB 
Bounce:    0 kB 
WritebackTmp:   0 kB 
CommitLimit:  6096608 kB 
Committed_AS: 5069728 kB 
VmallocTotal:  122880 kB 
VmallocUsed:  18476 kB 
VmallocChunk:  81572 kB 
HardwareCorrupted:  0 kB 
AnonHugePages:   0 kB 
HugePages_Total:  0 
HugePages_Free:  0 
HugePages_Rsvd:  0 
HugePages_Surp:  0 
Hugepagesize:  2048 kB 
DirectMap4k:  10232 kB 
DirectMap2M:  903168 kB 

Update 2:

Max. Heap Size (Estimated): 872.94M 
Ergonomics Machine Class: server 
Using VM: Java HotSpot(TM) Server VM 

Update 3:

Using Jerven's suggestions, I get the following performance:

(...) 
    6800000/56248699 (12.1 %). 13144.8 records/seconds. Time remaining:62.7 m Disk Usage: 1.8 Gb. Expect Disk Usage: 14.6 Gb Free Memory : 95.5 Mb. 
    (...) 

vs. my previous result:

6800000/56248699 (12.1 %). 9322.6 records/seconds. Time remaining:88.4 m Disk Usage: 3.0 Gb. Expect Disk Usage: 24.8 Gb Free Memory : 164.8 Mb. 

Can you add your JVM and machine details? – Jerven 2013-02-25 13:16:31

Java(TM) SE Runtime Environment (build 1.7.0_07-b10) – Pierre 2013-02-25 13:44:10

Linux name 3.2.0-38-generic-pae #60-Ubuntu SMP Wed Feb 13 13:47:26 UTC 2013 i686 i686 i386 GNU/Linux – Pierre 2013-02-25 13:44:29

Answer

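First, I would remove your explicit call to System.gc(). If you notice that the call helps performance, consider using a different GC algorithm instead. For example, G1GC will perform better when the BDB/JE cache usage stays consistently near 70% of the available heap. Second, at some point the B+ index updates will show n log n behaviour, and the insert rate will drop as the tree grows.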
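To get a feel for how the index deepens as records are added, here is a rough sketch that estimates the number of internal-node levels for a given fanout. It is a simplified model, not JE's actual layout: it assumes every node is completely full, the class and method names are invented for illustration, and the fanout of 128 is an assumed value for the maximum entries per node.

```java
// Rough model only: assumes every B-tree node is completely full.
// BtreeHeight and levels() are illustrative names, not part of JE.
class BtreeHeight {

    /** Number of internal-node levels needed to index `records` entries. */
    static int levels(long records, int fanout) {
        if (records < 1 || fanout < 2) {
            throw new IllegalArgumentException("need records >= 1 and fanout >= 2");
        }
        int levels = 0;
        long nodes = records;
        do {
            nodes = (nodes + fanout - 1) / fanout; // ceiling division
            levels++;
        } while (nodes > 1);
        return levels;
    }

    public static void main(String[] args) {
        long records = 56_248_699L; // record count from the question
        System.out.println("fanout 128: " + levels(records, 128) + " levels");
        System.out.println("fanout 512: " + levels(records, 512) + " levels");
    }
}
```

Under these assumptions the model gives 4 levels at a fanout of 128, which matches the four levels reported by DbCacheSize, and 3 levels at a fanout of 512.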

Not using transactions will be faster, especially if you can simply restart the import from scratch when it fails.

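Just remember to do an environment.sync() and a checkpoint at the end. While running this import you may want to disable the BDB/JE checkpointer and the BDB/JE cleaner (log GC) threads.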

config.setConfigParam(EnvironmentConfig.ENV_RUN_CLEANER, "false"); 
config.setConfigParam(EnvironmentConfig.ENV_RUN_CHECKPOINTER, "false"); 
config.setConfigParam(EnvironmentConfig.ENV_RUN_IN_COMPRESSOR, "false"); 

After loading, you should call a method like this:

public void checkpointAndSync() 
    throws ObjectStoreException 
{ 
    // Flush the environment to stable storage, then force a checkpoint.
    env.sync(); 
    CheckpointConfig force = new CheckpointConfig(); 
    force.setForce(true); 
    try 
    { 
        env.checkpoint(force); 
    } catch (DatabaseException e) 
    { 
        log.error("Cannot checkpoint db " + path.getAbsolutePath(), e); 
        throw new ObjectStoreException(e); 
    } 
} 

You might also consider turning on key prefixing.

As for the rest, your cache for internal nodes should be at least 1.6 GB, which means starting with a heap larger than 2 GB.
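The arithmetic behind that heap figure can be sketched as follows. This is a back-of-the-envelope calculation, assuming the cache is sized as a percentage of the JVM's maximum heap (as with the setCachePercent(80) call in the question) and ignoring the environment overhead and lock memory that DbCacheSize warns about. The class and method names are invented for illustration.

```java
// Back-of-the-envelope heap sizing. HeapSizing and its methods are
// illustrative names; the byte counts come from the DbCacheSize output.
class HeapSizing {

    /** Smallest heap (bytes) for which cachePercent of it covers cacheBytes. */
    static long requiredHeapBytes(long cacheBytes, int cachePercent) {
        if (cachePercent < 1 || cachePercent > 100) {
            throw new IllegalArgumentException("cachePercent must be in 1..100");
        }
        return (cacheBytes * 100 + cachePercent - 1) / cachePercent; // ceiling
    }

    /** Cache size (bytes) obtained from a heap at the given percentage. */
    static long cacheBytesFor(long heapBytes, int cachePercent) {
        return heapBytes * cachePercent / 100;
    }

    public static void main(String[] args) {
        long internalNodesMax = 1_614_375_504L; // "Internal nodes only", maximum
        int cachePercent = 80;                  // value used in the question

        long needed = requiredHeapBytes(internalNodesMax, cachePercent);
        System.out.println("Heap needed for internal nodes: " + needed + " bytes");

        // Compare with what the current JVM would actually provide.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("This JVM's cache at " + cachePercent + "%: "
                + cacheBytesFor(maxHeap, cachePercent) + " bytes");
    }
}
```

With the numbers from the question, 80% of the heap has to cover 1,614,375,504 bytes, so the heap needs to be about 2,017,969,380 bytes. The default heap of roughly 873 MB reported in Update 2 yields a cache well below the 1,287,110,736-byte minimum for internal nodes.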

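You could also consider merging records. For example, if your keys increase naturally, you could store 16 values under one key. But if you think that is an interesting approach, you might want to start by increasing the B tree fanout setting.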
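A minimal sketch of the record-merging idea, assuming sequential numeric record ids and fixed 20-byte values (the -data 20 size from the question). All names here are hypothetical, and the code only shows the key and slot arithmetic; reading the group from the database, updating it and writing it back is left out.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch of storing 16 fixed-size values under one key.
// RecordPacking, GROUP and VALUE_SIZE are illustrative names.
class RecordPacking {
    static final int GROUP = 16;      // values stored per key
    static final int VALUE_SIZE = 20; // bytes per value, as in the question

    /** Key under which the record's group of 16 values is stored. */
    static long groupKey(long recordId) {
        return recordId / GROUP;
    }

    /** Position of the record inside its group, 0..15. */
    static int slot(long recordId) {
        return (int) (recordId % GROUP);
    }

    /** Copies one value into its slot of the 320-byte group buffer. */
    static void putValue(byte[] group, long recordId, byte[] value) {
        if (group.length != GROUP * VALUE_SIZE || value.length != VALUE_SIZE) {
            throw new IllegalArgumentException("unexpected buffer size");
        }
        System.arraycopy(value, 0, group, slot(recordId) * VALUE_SIZE, VALUE_SIZE);
    }

    /** Extracts one value from the group buffer. */
    static byte[] getValue(byte[] group, long recordId) {
        int offset = slot(recordId) * VALUE_SIZE;
        return Arrays.copyOfRange(group, offset, offset + VALUE_SIZE);
    }

    /** 8-byte big-endian key; byte order matches numeric order for ids >= 0. */
    static byte[] keyBytes(long groupKey) {
        return ByteBuffer.allocate(Long.BYTES).putLong(groupKey).array();
    }

    public static void main(String[] args) {
        byte[] group = new byte[GROUP * VALUE_SIZE];
        byte[] value = new byte[VALUE_SIZE];
        Arrays.fill(value, (byte) 7);
        putValue(group, 35L, value);
        System.out.println("record 35 -> key " + groupKey(35L) + ", slot " + slot(35L));
    }
}
```

The trade-off is that each insert becomes a read-modify-write of a 320-byte value unless the loader buffers 16 records and writes each group once, which is straightforward when the ids arrive in increasing order.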

That's an answer I would accept and vote up. Well done. – duffymo 2013-02-25 13:26:58