
My script is written in Python and ran fine on DSE 4.8 without Docker. I have now upgraded to DSE 5.0.4 and run it in a Docker environment, and I get the RPC error below. I was on DSE's Spark 1.4.1 before; now I am on 1.6.2. Why does this Spark 1.6.2 RPC error occur?

The host OS is CentOS 7.2 and the OS inside Docker is the same. We use spark-submit to submit the job; we tried giving the executors 2G, 4G, 6G and 8G, and all of them produce the same error message.
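For reference, the same executor sizing can also be pinned from the PySpark driver itself. Below is a minimal sketch, assuming the standard Spark 1.6 configuration keys; the values shown are just one of the sizes we tried:

from pyspark import SparkConf, SparkContext

# Values are illustrative -- we tried 2G, 4G, 6G and 8G with identical results.
conf = (SparkConf()
        .setAppName("user_profile_step1")
        .set("spark.executor.memory", "4g")          # JVM heap per executor
        .set("spark.python.worker.memory", "1g"))    # per-Python-worker memory before spilling to disk
sc = SparkContext(conf=conf)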

The same Python script ran without problems in my previous environment, but after the upgrade it no longer works.

Scala jobs run fine in the current environment; only the Python part has problems. Restarting the hosts did not resolve it, and recreating the Docker containers did not help either.

Edit:

Maybe my map/reduce functions are just too complex; the problem could be there, but I am not sure. One way to test that is sketched below.
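A minimal way to check that hypothesis, assuming the job is a long chain of transformations (the RDD and function names below are placeholders, not from my actual script), is to break the chain up and force each stage separately so the failing step can be isolated:

from pyspark import StorageLevel

# Hypothetical chain: persist and force each stage on its own.
step1 = raw_rdd.map(parse_record)              # placeholder names
step1.persist(StorageLevel.MEMORY_AND_DISK)    # spill to disk instead of holding everything on-heap
step1.count()                                  # force evaluation; a failure here narrows the search

step2 = step1.reduceByKey(merge_profiles)      # placeholder reducer
step2.persist(StorageLevel.MEMORY_AND_DISK)
step2.count()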

Environment specs: the cluster consists of 6 hosts, each with a 16-core CPU, 32 GB of RAM, and a 500 GB SSD.

I don't know how to solve this. Also, what does this error message mean? Many thanks! Let me know if you need more information.

Error log:

Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. 
WARN 2017-02-26 10:14:08,314 org.apache.spark.scheduler.TaskSetManager: Lost task 47.1 in stage 88.0 (TID 9705, 139.196.190.79): TaskKilled (killed intentionally) 
Traceback (most recent call last): 
    File "/data/user_profile/User_profile_step1_classify_articles_common_sc_collect.py", line 1116, in <module> 
    compute_each_dimension_and_format_user(article_by_top_all_tmp) 
    File "/data/user_profile/User_profile_step1_classify_articles_common_sc_collect.py", line 752, in compute_each_dimension_and_format_user 
    sqlContext.createDataFrame(article_up_save_rdd, df_schema).write.format('org.apache.spark.sql.cassandra').options(keyspace='archive', table='articles_up_update').save(mode='append') 
    File "/opt/dse-5.0.4/resources/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 395, in save 
WARN 2017-02-26 10:14:08,336 org.apache.spark.scheduler.TaskSetManager: Lost task 63.1 in stage 88.0 (TID 9704, 139.196.190.79): TaskKilled (killed intentionally) 
    File "/opt/dse-5.0.4/resources/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__ 
    File "/opt/dse-5.0.4/resources/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco 
    File "/opt/dse-5.0.4/resources/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value 
py4j.protocol.Py4JJavaError: An error occurred while calling o795.save. 
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 619 in stage 88.0 failed 4 times, most recent failure: Lost task 619.3 in stage 88.0 (TID 9746, 139.196.107.73): ExecutorLostFailure (executor 59 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. 
Driver stacktrace: 
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431) 
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419) 
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418) 
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) 
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) 
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418) 
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799) 
at org.apache.spark.scheduler.DAGScheduler$$anonfun$han 
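The failure surfaces in the Cassandra write at line 752 of the script. One mitigation I could try for ExecutorLostFailure during a wide save is to repartition into smaller tasks first, so each executor holds less data at once. A hedged sketch of that change (the identifiers come from the traceback above; the partition count of 200 is only an assumption to experiment with):

# Smaller partitions -> smaller tasks -> lower peak memory per executor.
df = sqlContext.createDataFrame(article_up_save_rdd, df_schema).repartition(200)
(df.write
   .format('org.apache.spark.sql.cassandra')
   .options(keyspace='archive', table='articles_up_update')
   .save(mode='append'))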

Docker command:

docker run -d --net=host -i --privileged \ 
    -e SEEDS=10.XX.XXx.XX1,10.XX.XXx.XXX \ 
    -e CLUSTER_NAME="MyCluster" \ 
    -e LISTEN_ADDRESS=10.XX.XXx.XX \ 
    -e BROADCAST_RPC_ADDRESS=139.XXX.XXX.XXX \ 
    -e RPC_ADDRESS=0.0.0.0 \ 
    -e STOMP_INTERFACE=10.XX.XXx.XX \ 
    -e HOSTS=139.XX.XXx.XX \ 
    -v /data/dse/lib/cassandra:/var/lib/cassandra \ 
    -v /data/dse/lib/spark:/var/lib/spark \ 
    -v /data/dse/log/cassandra:/var/log/cassandra \ 
    -v /data/dse/log/spark:/var/log/spark \ 
    -v /data/agent/log:/opt/datastax-agent/log \ 
    --name dse_container registry..xxx.com/rechao/dse:5.0.4 -s 

You updated more than just DataStax. You are now using Docker, and the error explicitly mentions "exceeding thresholds, or network issues", so what is your host OS, and how much memory are you allocating to the executors? – cricket_007


@cricket_007 The host OS is CentOS 7.2 and the Docker OS is the same. We use spark-submit to submit the job; we tried giving the executors 2G, 4G, 6G and 8G, and they all produce the same error. Any idea why? Thanks – peter


OK, then it is probably a network problem. Are the containers exposing the appropriate ports? –

Answer:


Docker is fine; increasing the host memory to 64 GB solved this problem.
