
datastax spark java heap space error

I am trying to run a small computation with Scala on DataStax 4.6. I have 6 nodes, each with 16 GB of RAM and 8 cores. When I execute the Scala program it fails with the error below.

ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-17] shutting down ActorSystem [sparkDriver] java.lang.OutOfMemoryError: Java heap space

I have given each machine 2 cores, 4 GB of executor memory, and 4 GB of driver memory. Any suggestions?
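For reference, a minimal sketch of how settings like these are typically expressed for a Spark 1.x / DSE job; the application name and the total core count are illustrative placeholders, not values taken from the question:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: the reported settings written out as a SparkConf.
    val conf = new SparkConf()
      .setAppName("small-computation")      // placeholder name
      .set("spark.executor.memory", "4g")   // executor heap on each worker
      .set("spark.cores.max", "12")         // e.g. 2 cores per node x 6 nodes (standalone mode)
    // Driver memory has to be fixed before the driver JVM starts, so it is normally
    // passed on the command line (e.g. spark-submit --driver-memory 4g) rather than here.

    val sc = new SparkContext(conf)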

Answer


Quoting directly from Russ's article on Common Spark Troubleshooting (you should read it!); a configuration sketch for these settings follows the quoted steps:

Spark Executor OOM: How to set Memory Parameters on Spark

Once an app is running, the next most likely error you will see is an OOM on a Spark executor. Spark is an extremely powerful tool for doing in-memory computation, but its power comes with some sharp edges. The most common cause of an executor OOM'ing is that the application is trying to cache or load too much information into memory. Depending on your use case, there are several solutions to this:

1) Increase the parallelism of your job. Try increasing the number of partitions in your job. By splitting the work into smaller sets of data, less information has to be resident in memory at a given time. For a Spark Cassandra Connector job this means decreasing the split size variable. The variable, spark.cassandra.input.split.size, can be set either on the command line as above or in the SparkConf object. For other RDD types, look into their APIs to see exactly how they determine partition size.

2) Increase the storage fraction variable, spark.storage.memoryFraction. This can be set as above on either the command line or in the SparkConf object. This variable sets exactly how much of the JVM will be dedicated to the caching and storage of RDDs. You can set it to a value between 0 and 1, describing what portion of executor JVM memory will be dedicated to caching RDDs. If you have a job that requires very little shuffle memory but uses a lot of cached RDDs, increase this variable (example: caching an RDD and then performing aggregates on it).

3) If all else fails, you may just need additional RAM on each worker. For DSE users, adjust your spark-env.sh file (or dse.yaml in DSE 4.6) to increase the SPARK_MEM reserved for Spark jobs. You will need to restart your workers for these new memory limits to take effect (dse sparkworker restart). Then increase the amount of RAM the application requests by setting the spark.executor.memory variable either on the command line or in the SparkConf object.
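For concreteness, here is a minimal sketch (not a definitive recipe) of where the three knobs above would sit in a Spark 1.x / DSE 4.6-era SparkConf; the application name and the specific values are illustrative placeholders only:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: the three suggestions from the quoted article as SparkConf settings.
    val conf = new SparkConf()
      .setAppName("cassandra-job")                        // placeholder name
      // 1) More parallelism: a smaller Cassandra split size yields more, smaller partitions.
      .set("spark.cassandra.input.split.size", "10000")
      // 2) Fraction of the executor heap reserved for cached/stored RDDs (between 0 and 1).
      .set("spark.storage.memoryFraction", "0.6")
      // 3) Executor heap requested by the application; only helps once SPARK_MEM in
      //    spark-env.sh (or dse.yaml on DSE 4.6) has been raised and the workers restarted.
      .set("spark.executor.memory", "4g")

    val sc = new SparkContext(conf)

The same properties can also be passed on the command line, for example via spark-submit's --conf key=value flag.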


Hi PHACT, thanks for the reply. I am new to Cassandra; could you guide me on how to set options 1 and 2? I have the spark.executor.memory option in my spark-env.sh file, but not the other two. I would appreciate it if you could guide me through my problem. – shiva