I am using HDP 2.5, running spark-submit in yarn-cluster mode. Spark on YARN: container exited with a non-zero exit code 143.
I was trying to generate data using a DataFrame cross join, i.e.

val generatedData = df1.join(df2).join(df3).join(df4)
generatedData.saveAsTable(...)
DF1 is persisted with storage level MEMORY_AND_DISK.
DF2, DF3 and DF4 are persisted with storage level MEMORY_ONLY.
DF1 has far more records, i.e. 5 million, while DF2 to DF4 have at most 100 records each. Set up this way, my explain plan shows this should perform well: it uses a BroadcastNestedLoopJoin.
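For reference, a minimal local sketch of the setup described above. The DataFrame contents, sizes, and the `local[*]` master are placeholders standing in for the real data; note that in Spark 2.x an implicit cross join also needs `spark.sql.crossJoin.enabled`:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CrossJoinSketch {
  def main(args: Array[String]): Unit = {
    // Toy local reproduction; on the real cluster the frames come from elsewhere.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cross-join-sketch")
      .config("spark.sql.crossJoin.enabled", "true") // required for implicit cross joins in 2.x
      .getOrCreate()
    import spark.implicits._

    val df1 = (1 to 1000).toDF("a") // stands in for the 5M-row frame
    val df2 = (1 to 10).toDF("b")   // stands in for the <=100-row frames
    val df3 = (1 to 10).toDF("c")
    val df4 = (1 to 10).toDF("d")

    df1.persist(StorageLevel.MEMORY_AND_DISK)
    Seq(df2, df3, df4).foreach(_.persist(StorageLevel.MEMORY_ONLY))

    // join() with no condition is a cartesian product; because the right-hand
    // sides are small, the planner broadcasts them, so the physical plan
    // shows BroadcastNestedLoopJoin.
    val generatedData = df1.join(df2).join(df3).join(df4)
    generatedData.explain() // inspect the physical plan before running the job
    println(generatedData.count()) // 1000 * 10 * 10 * 10 = 1,000,000 rows
  }
}
```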
For some reason it always fails. I don't know how to debug it or where the memory is blowing up.
Error log output:
16/12/06 19:44:08 WARN YarnAllocator: Container marked as failed: container_e33_1480922439133_0845_02_000002 on host: hdp4. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
16/12/06 19:44:08 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_e33_1480922439133_0845_02_000002 on host: hdp4. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
16/12/06 19:44:08 ERROR YarnClusterScheduler: Lost executor 1 on hdp4: Container marked as failed: container_e33_1480922439133_0845_02_000002 on host: hdp4. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
16/12/06 19:44:08 WARN TaskSetManager: Lost task 1.0 in stage 12.0 (TID 19, hdp4): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container marked as failed: container_e33_1480922439133_0845_02_000002 on host: hdp4. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
I don't see any warning or error logs before this error. What is the problem? Where should I look for the memory consumption? I can't see anything in the Storage tab of the Spark UI. The logs were taken from the YARN Resource Manager UI on HDP 2.5.
EDIT: Looking at the container logs, it seems like it's a java.lang.OutOfMemoryError: GC overhead limit exceeded.
I know how to increase the memory of the containers, but I don't have any more memory left. How can I do a cartesian/product join with 4 DataFrames without getting this error?
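One mitigation worth sketching (my own suggestion, not something established in this thread): the GC pressure comes from each task materializing a huge slice of the product, so repartitioning the large frame before the joins spreads the output over many smaller tasks and lets it stream to the table instead of sitting in memory. `numParts` and the table name here are hypothetical values to tune:

```scala
// Sketch only: df1..df4 are the frames described above; numParts is a guess
// to tune so that each task's slice of the cartesian product stays small.
val numParts = 2000
val result = df1.repartition(numParts).join(df2).join(df3).join(df4)

// Write directly instead of caching the product, so rows stream out
// per-partition rather than accumulating on the executors.
result.write.saveAsTable("generated_data") // hypothetical table name
```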
If the DataFrames are as large as you suggest (5e6, 100, 100, 100), then the cartesian product will have about 5e12 records, i.e. 5 trillion records. You don't mention the number of columns, but if you have a single integer column this would require terabytes of storage. If you have multiple columns, the joined data could require hundreds or thousands of terabytes. Is that really what you want? – abeboparebop
One column. It's a data generator tool, which is what causes the memory to blow up. –
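The comment's arithmetic can be checked directly (plain Scala, no Spark needed; the 4-byte figure assumes a single Int column with no per-row overhead):

```scala
// Back-of-the-envelope size of the full cartesian product.
val rows = 5e6 * 100 * 100 * 100 // = 5e12 records
val bytes = rows * 4             // one 4-byte Int column, no overhead
println(f"$rows%.0f rows, ~${bytes / 1e12}%.0f TB raw") // ≈ 5e12 rows, ~20 TB
```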