
Spark running against Neo4j: worker for session ... crashed, Java heap space OutOfMemoryError

This is a very simple program that was ported from Java to Scala and "parallelized" (it was never really meant to run in parallel; it is an experiment to a) learn Spark and Neo4j, and b) see whether I can get some speed gains by running it on a Spark cluster, spreading more of the work over more nodes). The reason is that the big bottleneck is a spatial call in the Neo4j Cypher script (a withinDistance call). The test data set is quite small: 52,000 nodes, and the database is about 140 MB.

Also, when Neo4j starts up it gives me:

Starting Neo4j. 
WARNING: Max 4096 open files allowed, minimum of 40000 recommended. See the Neo4j manual. 
/usr/share/neo4j/bin/neo4j: line 411: /var/run/neo4j/neo4j.pid: No such file or directory 

That warning is strange, because I believe it refers to the open-files limit, which I asked the system administrators to set much higher. (ulimit -Hn seems to confirm this; it says 90,000, although ulimit -a shows open files at 4096, the soft limit, and I suspect that is what Neo4j sees and complains about.)
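
(For reference, one common way to raise the soft limit on Linux is a limits.conf entry for the account Neo4j runs under; this is only a sketch, the neo4j user name is an assumption, and a systemd-managed service may override it with its own LimitNOFILE setting:)

# /etc/security/limits.conf (illustrative values)
neo4j   soft    nofile  40000
neo4j   hard    nofile  40000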

Also, when I run this locally on my Mac under OS X, the software runs for about 14 hours or so (maybe 9) and then I see in the console that the database stops talking to Spark. It has not gone down, and none of the jobs seem to have timed out; I can still use the Cypher shell against the database. But it somehow loses contact with the Spark jobs, so they keep retrying until spark-submit finally gives up and stops.

C02RH2U9G8WM:scala-2.11 little.mac$ ulimit -Hn 
unlimited 

(Also, from a later edit, I have since raised my limits even further in the Neo4j conf; the max heap size is now 4 GB.)

Some code from the job (the ported code in Scala with Spark DataFrames added; I know it is not a great fit, but I wanted to get something working before pushing further). I am building a hybrid program, like the Java code I ported, but using Spark DataFrames (connected to Neo4j).

In essence (pseudocode):

while (going through all these lat and lons) 
{ 
    def DoCalculation() 
    { 

     val noBbox="call spatial.bbox('geom', {lat:" + minLat +",lon:"+minLon +"}, {lat:"+maxLat+",lon:" + maxLon +"}) yield node return node.altitude as altitude, node.gtype as gtype, node.toDateFormatLong as toDateFormatLong, node.latitude as latitude, node.longitude as longitude, node.fromDateFormatLong as fromDateFormatLong, node.fromDate as fromDate, node.toDate as toDate ORDER BY node.toDateFormatLong DESC";   
     try { 
        //not overly sure what the partitions and batch are really doing for me. 
        val initialDf2 = neo.cypher(noBbox).partitions(5).batch(10000).loadDataFrame 

        val theRow = initialDf2.collect() //was someStr 

        for(i <- 0 until theRow.length){ 
          //do more calculations 

         var radius2= 100 
         //this call is where the biggest bottleneck is; the spatial withinDistance is where I thought
         //I could put this code on Spark and make the calls through DataFrames to do the same long work,
         //but batched out to many nodes to get more speed gains

         val pointQuery="call spatial.withinDistance('geom', {lat:" + lat + ",lon:"+ lon +"}, " + radius2 + ") yield node, distance WITH node, distance match (node:POINT) WHERE node.toDateFormatLong < " + toDateFormatLong + " return node.fromDateFormatLong as fromDateFormatLong, node.toDateFormatLong as toDateFormatLong";  
         try { 

          val pointResults = neo.cypher(pointQuery).loadDataFrame; //did i need to batch here? 
          var prRow = pointResults.collect();  
          //do stuff with prRow loadDataFrame   
         } catch { 
          case e: Exception => e.printStackTrace 
         } 
         //do way more stuff with the data just in some scala/java datastructures 
        } 
       } catch { 
        case e: Exception => println("EMPTY COLLECTION") 
      } 
    } 
} 

Running this, which uses the Spark connector to connect to Neo4j, I get these errors in /var/log/neo4j/neo4j.log:

java.lang.OutOfMemoryError: Java heap space 
2017-12-27 03:17:13.969+0000 ERROR Worker for session '13662816-0a86-4c95-8b7f-cea9d92440c8' crashed. Java heap space 
java.lang.OutOfMemoryError: Java heap space 
     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1855) 
     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2068) 
     at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418) 
     at org.neo4j.bolt.v1.runtime.concurrent.RunnableBoltWorker.run(RunnableBoltWorker.java:88) 
     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
     at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
     at java.lang.Thread.run(Thread.java:748) 
     at org.neo4j.helpers.NamedThreadFactory$2.run(NamedThreadFactory.java:109) 
2017-12-27 03:17:23.244+0000 ERROR Worker for session '75983e7c-097a-4770-bcab-d63f78300dc5' crashed. Java heap space 
java.lang.OutOfMemoryError: Java heap space 

I know that in the conf file of the Neo4j instance the spark-submit job talks to, I can change the heap sizes (they are currently commented out, but set to 512m). What I am asking about is what the conf file itself says:

# Java Heap Size: by default the Java heap size is dynamically 
# calculated based on available system resources. 
# Uncomment these lines to set specific initial and maximum 
# heap size. 

So doesn't that mean I should leave the heap settings commented out here, since the dynamically calculated values would surely be more than anything I could set? (These machines have 8 cores and 8 GB of RAM each.) Or would setting them explicitly really help? Maybe to 2000 (if the unit is megabytes), to get two gigs? I ask because I feel the error in the log file says "out of memory", but it is really happening for some different reason.

EDIT: my JVM values from the debug log.

BEFORE:

2017-12-26 16:24:06.768+0000 INFO [o.n.k.i.DiagnosticsManager] NETWORK 
2017-12-26 16:24:06.768+0000 INFO [o.n.k.i.DiagnosticsManager] System memory information: 
2017-12-26 16:24:06.771+0000 INFO [o.n.k.i.DiagnosticsManager] Total Physical memory: 7.79 GB 
2017-12-26 16:24:06.772+0000 INFO [o.n.k.i.DiagnosticsManager] Free Physical memory: 5.49 GB 
2017-12-26 16:24:06.772+0000 INFO [o.n.k.i.DiagnosticsManager] Committed virtual memory: 5.62 GB 
2017-12-26 16:24:06.773+0000 INFO [o.n.k.i.DiagnosticsManager] Total swap space: 16.50 GB 
2017-12-26 16:24:06.773+0000 INFO [o.n.k.i.DiagnosticsManager] Free swap space: 16.49 GB 
2017-12-26 16:24:06.773+0000 INFO [o.n.k.i.DiagnosticsManager] JVM memory information: 
2017-12-26 16:24:06.773+0000 INFO [o.n.k.i.DiagnosticsManager] Free memory: 85.66 MB 
2017-12-26 16:24:06.773+0000 INFO [o.n.k.i.DiagnosticsManager] Total memory: 126.00 MB 
2017-12-26 16:24:06.774+0000 INFO [o.n.k.i.DiagnosticsManager] Max memory: 1.95 GB 
2017-12-26 16:24:06.776+0000 INFO [o.n.k.i.DiagnosticsManager] Garbage Collector: G1 Young Generation: [G1 Eden Space, G1 Survivor Space] 
2017-12-26 16:24:06.776+0000 INFO [o.n.k.i.DiagnosticsManager] Garbage Collector: G1 Old Generation: [G1 Eden Space, G1 Survivor Space, G1 Old Gen] 
2017-12-26 16:24:06.777+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Code Cache (Non-heap memory): committed=4.94 MB, used=4.93 MB, max=240.00 MB, threshold=0.00 B 
2017-12-26 16:24:06.777+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Metaspace (Non-heap memory): committed=14.38 MB, used=13.41 MB, max=-1.00 B, threshold=0.00 B 
2017-12-26 16:24:06.777+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Compressed Class Space (Non-heap memory): committed=1.88 MB, used=1.64 MB, max=1.00 GB, threshold=0.00 B 
2017-12-26 16:24:06.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Eden Space (Heap memory): committed=39.00 MB, used=35.00 MB, max=-1.00 B, threshold=? 
2017-12-26 16:24:06.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Survivor Space (Heap memory): committed=3.00 MB, used=3.00 MB, max=-1.00 B, threshold=? 
2017-12-26 16:24:06.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Old Gen (Heap memory): committed=84.00 MB, used=1.34 MB, max=1.95 GB, threshold=0.00 B 
2017-12-26 16:24:06.778+0000 INFO [o.n.k.i.DiagnosticsManager] Operating system information: 
2017-12-26 16:24:06.779+0000 INFO [o.n.k.i.DiagnosticsManager] Operating System: Linux; version: 3.10.0-693.5.2.el7.x86_64; arch: amd64; cpus: 8 
2017-12-26 16:24:06.779+0000 INFO [o.n.k.i.DiagnosticsManager] Max number of file descriptors: 90000 
2017-12-26 16:24:06.780+0000 INFO [o.n.k.i.DiagnosticsManager] Number of open file descriptors: 103 
2017-12-26 16:24:06.782+0000 INFO [o.n.k.i.DiagnosticsManager] Process id: [email protected] 
2017-12-26 16:24:06.782+0000 INFO [o.n.k.i.DiagnosticsManager] Byte order: LITTLE_ENDIAN 
2017-12-26 16:24:06.793+0000 INFO [o.n.k.i.DiagnosticsManager] Local timezone: Etc/GMT 
2017-12-26 16:24:06.793+0000 INFO [o.n.k.i.DiagnosticsManager] JVM information: 
2017-12-26 16:24:06.794+0000 INFO [o.n.k.i.DiagnosticsManager] VM Name: OpenJDK 64-Bit Server VM 
2017-12-26 16:24:06.794+0000 INFO [o.n.k.i.DiagnosticsManager] VM Vendor: Oracle Corporation 
2017-12-26 16:24:06.794+0000 INFO [o.n.k.i.DiagnosticsManager] VM Version: 25.151-b12 
2017-12-26 16:24:06.794+0000 INFO [o.n.k.i.DiagnosticsManager] JIT compiler: HotSpot 64-Bit Tiered Compilers 
2017-12-26 16:24:06.795+0000 INFO [o.n.k.i.DiagnosticsManager] VM Arguments: [-XX:+UseG1GC, -XX:-OmitStackTraceInFastThrow, -XX:+AlwaysPreTouch, -XX:+UnlockExperimentalVMOptions, -XX:+TrustFinalNonStaticFields, -XX:+DisableExplicitGC, -Djdk.tls.ephemeralDHKeySize=2048, -Dunsupported.dbms.udc.source=rpm, -Dfile.encoding=UTF-8] 
2017-12-26 16:24:06.795+0000 INFO [o.n.k.i.DiagnosticsManager] Java classpath: 

AFTER:

2017-12-27 16:17:30.740+0000 INFO [o.n.k.i.DiagnosticsManager] System memory information: 
2017-12-27 16:17:30.749+0000 INFO [o.n.k.i.DiagnosticsManager] Total Physical memory: 7.79 GB 
2017-12-27 16:17:30.750+0000 INFO [o.n.k.i.DiagnosticsManager] Free Physical memory: 4.23 GB 
2017-12-27 16:17:30.750+0000 INFO [o.n.k.i.DiagnosticsManager] Committed virtual memory: 5.62 GB 
2017-12-27 16:17:30.751+0000 INFO [o.n.k.i.DiagnosticsManager] Total swap space: 16.50 GB 
2017-12-27 16:17:30.751+0000 INFO [o.n.k.i.DiagnosticsManager] Free swap space: 16.19 GB 
2017-12-27 16:17:30.751+0000 INFO [o.n.k.i.DiagnosticsManager] JVM memory information: 
2017-12-27 16:17:30.751+0000 INFO [o.n.k.i.DiagnosticsManager] Free memory: 1.89 GB 
2017-12-27 16:17:30.751+0000 INFO [o.n.k.i.DiagnosticsManager] Total memory: 1.95 GB 
2017-12-27 16:17:30.752+0000 INFO [o.n.k.i.DiagnosticsManager] Max memory: 1.95 GB 
2017-12-27 16:17:30.777+0000 INFO [o.n.k.i.DiagnosticsManager] Garbage Collector: G1 Young Generation: [G1 Eden Space, G1 Survivor Space] 
2017-12-27 16:17:30.777+0000 INFO [o.n.k.i.DiagnosticsManager] Garbage Collector: G1 Old Generation: [G1 Eden Space, G1 Survivor Space, G1 Old Gen] 
2017-12-27 16:17:30.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Code Cache (Non-heap memory): committed=4.94 MB, used=4.89 MB, max=240.00 MB, threshold=0.00 B 
2017-12-27 16:17:30.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Metaspace (Non-heap memory): committed=14.38 MB, used=13.42 MB, max=-1.00 B, threshold=0.00 B 
2017-12-27 16:17:30.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Compressed Class Space (Non-heap memory): committed=1.88 MB, used=1.64 MB, max=1.00 GB, threshold=0.00 B 
2017-12-27 16:17:30.779+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Eden Space (Heap memory): committed=105.00 MB, used=59.00 MB, max=-1.00 B, threshold=? 
2017-12-27 16:17:30.779+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Survivor Space (Heap memory): committed=0.00 B, used=0.00 B, max=-1.00 B, threshold=? 
2017-12-27 16:17:30.779+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Old Gen (Heap memory): committed=1.85 GB, used=0.00 B, max=1.95 GB, threshold=0.00 B 
2017-12-27 16:17:30.779+0000 INFO [o.n.k.i.DiagnosticsManager] Operating system information: 
2017-12-27 16:17:30.780+0000 INFO [o.n.k.i.DiagnosticsManager] Operating System: Linux; version: 3.10.0-693.5.2.el7.x86_64; arch: amd64; cpus: 8 
2017-12-27 16:17:30.780+0000 INFO [o.n.k.i.DiagnosticsManager] Max number of file descriptors: 90000 
2017-12-27 16:17:30.781+0000 INFO [o.n.k.i.DiagnosticsManager] Number of open file descriptors: 103 
2017-12-27 16:17:30.785+0000 INFO [o.n.k.i.DiagnosticsManager] Process id: [email protected] 
2017-12-27 16:17:30.785+0000 INFO [o.n.k.i.DiagnosticsManager] Byte order: LITTLE_ENDIAN 
2017-12-27 16:17:30.814+0000 INFO [o.n.k.i.DiagnosticsManager] Local timezone: Etc/GMT 
2017-12-27 16:17:30.815+0000 INFO [o.n.k.i.DiagnosticsManager] JVM information: 
2017-12-27 16:17:30.815+0000 INFO [o.n.k.i.DiagnosticsManager] VM Name: OpenJDK 64-Bit Server VM 
2017-12-27 16:17:30.815+0000 INFO [o.n.k.i.DiagnosticsManager] VM Vendor: Oracle Corporation 
2017-12-27 16:17:30.815+0000 INFO [o.n.k.i.DiagnosticsManager] VM Version: 25.151-b12 
2017-12-27 16:17:30.815+0000 INFO [o.n.k.i.DiagnosticsManager] JIT compiler: HotSpot 64-Bit Tiered Compilers 
2017-12-27 16:17:30.816+0000 INFO [o.n.k.i.DiagnosticsManager] VM Arguments: [-Xms2000m, -Xmx2000m, -XX:+UseG1GC, -XX:-OmitStackTraceInFastThrow, -XX:+AlwaysPreTouch, -XX:+UnlockExperimentalVMOptions, -XX:+TrustFinalNonStaticFields, -XX:+DisableExplicitGC, -Djdk.tls.ephemeralDHKeySize=2048, -Dunsupported.dbms.udc.source=rpm, -Dfile.encoding=UTF-8] 
2017-12-27 16:17:30.816+0000 INFO [o.n.k.i.DiagnosticsManager] Java classpath: 

Just an FYI: I still seem to get the Java heap errors. These machines (not for production, just development) only have 8 GB each.


How much data does each query return? –


When using batching you need to use 'WITH ... SKIP {_skip} LIMIT {_limit}' –


Use parameters instead of string concatenation –
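
Putting those two comments together, a minimal sketch of the bbox query with Cypher parameters and the connector's {_skip}/{_limit} batching convention could look something like this (not verified against this exact connector version; the cypher(query, params) overload and the loadBbox helper are assumptions):

import org.apache.spark.SparkContext
import org.neo4j.spark.Neo4j

//a sketch only: pass lat/lon as Cypher parameters instead of concatenating strings,
//and expose SKIP {_skip} LIMIT {_limit} so .partitions()/.batch() can split the work
def loadBbox(sc: SparkContext, minLat: Double, minLon: Double,
             maxLat: Double, maxLon: Double) = {
  val query =
    """call spatial.bbox('geom', {lat: {minLat}, lon: {minLon}}, {lat: {maxLat}, lon: {maxLon}})
      |yield node
      |return node.latitude as latitude, node.longitude as longitude,
      |       node.toDateFormatLong as toDateFormatLong
      |ORDER BY node.toDateFormatLong DESC
      |SKIP {_skip} LIMIT {_limit}""".stripMargin

  //assumes this connector version accepts a parameter map on cypher(query, params)
  Neo4j(sc)
    .cypher(query, Map("minLat" -> minLat, "minLon" -> minLon,
                       "maxLat" -> maxLat, "maxLon" -> maxLon))
    .partitions(5)
    .batch(10000)
    .loadDataFrame
}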

Answer


We generally recommend setting these yourself. You can check your debug.log file from startup to see which values it reports it has chosen to use as defaults. You are looking for an excerpt like this:

JVM memory information: 
Free memory: 204.79 MB 
Total memory: 256.00 MB 
Max memory: 4.00 GB 

I believe Total memory is the initial heap size, and Max memory is the max heap size.

When setting these yourself, we usually recommend keeping the initial and max values the same. Here is a knowledge base article on estimating initial memory configuration that may help.
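
For example, in neo4j.conf the relevant settings look like this (Neo4j 3.x property names; the 2g values are only an illustration for an 8 GB machine that also has to host Spark):

dbms.memory.heap.initial_size=2g
dbms.memory.heap.max_size=2g
# the page cache is sized separately from the heap
dbms.memory.pagecache.size=2g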

If the defaults look adequate, then it is probably better to look for other areas to optimize, or to check whether the problem is a known one on the apache-spark side.


Thanks, I will check the debug log. I set it to 2000m and re-ran it in Spark. If it still fails I will assume it is about how Spark is handling things and that I have been barking up the wrong tree. Thanks again, the excerpt to look for is super helpful! – Codejoy


Updated my post. Just wondering about what I am doing. It is still complaining about heap errors :/ I am not sure how to measure what resources I am really using or not... I thought the database was only 130 MB. – Codejoy


There are 55,000 nodes in the database. That does not seem like a lot. – Codejoy
