Apache Spark multi-node cluster - java.io.FileNotFoundException

I am new to Apache Spark and cluster computing. I set up Spark in standalone mode (master and worker on the same machine) and it worked fine for me.

Then I downloaded the pre-built version of Spark and placed it on every node of the cluster, following these instructions: http://spark.apache.org/docs/latest/spark-standalone.html#installing-spark-standalone-to-a-cluster

My master node has IP address 172.17.0.224, and my slave nodes have IP addresses 172.17.0.221, 172.17.0.222, and 172.17.0.223.

I edited the slaves and spark-env.sh files, adding my slaves' IP addresses and my master's IP address respectively, roughly as shown below.
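For reference, here is roughly what those two files contain for this topology (conf/slaves lists one worker host per line; SPARK_MASTER_IP is the standalone master setting in this Spark version):

    # conf/slaves -- one worker host per line
    172.17.0.221
    172.17.0.222
    172.17.0.223

    # conf/spark-env.sh
    export SPARK_MASTER_IP=172.17.0.224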

I started the master with start-master.sh and the slaves with start-slaves.sh, and everything came up fine.
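Concretely, run from the Spark installation directory on the master node:

    sbin/start-master.sh    # starts the master on 172.17.0.224
    sbin/start-slaves.sh    # starts a worker on every host listed in conf/slaves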

I submitted my Spark job with this command:

    spark-submit --class "Rice" --master spark://172.17.0.224:7077 cs453project/target/scala-2.11/simple-project_2.11-1.0.jar cs453project/input.txt cs453project/ouput2 cs453project/ouput3

This is the error message I get:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/11/25 11:22:27 INFO SparkContext: Running Spark version 1.5.2
15/11/25 11:22:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/11/25 11:22:28 WARN Utils: Your hostname, node04 resolves to a loopback address: 127.0.1.1; using 172.17.0.224 instead (on interface eth0) 
15/11/25 11:22:28 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address 
15/11/25 11:22:28 INFO SecurityManager: Changing view acls to: ujjwal 
15/11/25 11:22:28 INFO SecurityManager: Changing modify acls to: ujjwal 
15/11/25 11:22:28 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ujjwal); users with modify permissions: Set(ujjwal) 
15/11/25 11:22:28 INFO Slf4jLogger: Slf4jLogger started 
15/11/25 11:22:28 INFO Remoting: Starting remoting 
15/11/25 11:22:28 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:58478] 
15/11/25 11:22:28 INFO Utils: Successfully started service 'sparkDriver' on port 58478. 
15/11/25 11:22:28 INFO SparkEnv: Registering MapOutputTracker 
15/11/25 11:22:28 INFO SparkEnv: Registering BlockManagerMaster 
15/11/25 11:22:28 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-bc18e422-d334-4fe5-9663-9439620ec054 
15/11/25 11:22:28 INFO MemoryStore: MemoryStore started with capacity 530.3 MB 
15/11/25 11:22:29 INFO HttpFileServer: HTTP File server directory is /tmp/spark-7c6e0ad4-52ae-4f5a-9aaa-6ad9fbf48685/httpd-13d8dd4d-6ff1-450d-baac-f2702c7a4e5b 
15/11/25 11:22:29 INFO HttpServer: Starting HTTP Server 
15/11/25 11:22:29 INFO Utils: Successfully started service 'HTTP file server' on port 49496. 
15/11/25 11:22:29 INFO SparkEnv: Registering OutputCommitCoordinator 
15/11/25 11:22:29 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
15/11/25 11:22:29 INFO SparkUI: Started SparkUI at http://172.17.0.224:4040 
15/11/25 11:22:29 INFO SparkContext: Added JAR file:/home/ujjwal/cs453project/target/scala-2.11/simple-project_2.11-1.0.jar at http://172.17.0.224:49496/jars/simple-project_2.11-1.0.jar with timestamp 1448479349380 
15/11/25 11:22:29 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set. 
15/11/25 11:22:29 INFO AppClient$ClientEndpoint: Connecting to master spark://172.17.0.224:7077... 
15/11/25 11:22:29 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20151125112229-0001 
15/11/25 11:22:29 INFO AppClient$ClientEndpoint: Executor added: app-20151125112229-0001/0 on worker-20151125095922-172.17.0.221-33366 (172.17.0.221:33366) with 2 cores 
15/11/25 11:22:29 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151125112229-0001/0 on hostPort 172.17.0.221:33366 with 2 cores, 1024.0 MB RAM 
15/11/25 11:22:29 INFO AppClient$ClientEndpoint: Executor updated: app-20151125112229-0001/0 is now LOADING 
15/11/25 11:22:29 INFO AppClient$ClientEndpoint: Executor updated: app-20151125112229-0001/0 is now RUNNING 
15/11/25 11:22:29 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 47843. 
15/11/25 11:22:29 INFO NettyBlockTransferService: Server created on 47843 
15/11/25 11:22:29 INFO BlockManagerMaster: Trying to register BlockManager 
15/11/25 11:22:29 INFO BlockManagerMasterEndpoint: Registering block manager 172.17.0.224:47843 with 530.3 MB RAM, BlockManagerId(driver, 172.17.0.224, 47843) 
15/11/25 11:22:29 INFO BlockManagerMaster: Registered BlockManager 
15/11/25 11:22:29 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0 
15/11/25 11:22:30 INFO MemoryStore: ensureFreeSpace(157248) called with curMem=0, maxMem=556038881 
15/11/25 11:22:30 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 153.6 KB, free 530.1 MB) 
15/11/25 11:22:30 INFO MemoryStore: ensureFreeSpace(14276) called with curMem=157248, maxMem=556038881 
15/11/25 11:22:30 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 13.9 KB, free 530.1 MB) 
15/11/25 11:22:30 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.17.0.224:47843 (size: 13.9 KB, free: 530.3 MB) 
15/11/25 11:22:30 INFO SparkContext: Created broadcast 0 from textFile at build.scala:11 
15/11/25 11:22:30 INFO FileInputFormat: Total input paths to process : 1 
15/11/25 11:22:30 INFO SparkContext: Starting job: count at build.scala:13 
15/11/25 11:22:30 INFO DAGScheduler: Got job 0 (count at build.scala:13) with 108 output partitions 
15/11/25 11:22:30 INFO DAGScheduler: Final stage: ResultStage 0(count at build.scala:13) 
15/11/25 11:22:30 INFO DAGScheduler: Parents of final stage: List() 
15/11/25 11:22:30 INFO DAGScheduler: Missing parents: List() 
15/11/25 11:22:30 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[4] at map at build.scala:12), which has no missing parents 
15/11/25 11:22:30 INFO MemoryStore: ensureFreeSpace(3424) called with curMem=171524, maxMem=556038881 
15/11/25 11:22:30 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.3 KB, free 530.1 MB) 
15/11/25 11:22:30 INFO MemoryStore: ensureFreeSpace(1934) called with curMem=174948, maxMem=556038881 
15/11/25 11:22:30 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1934.0 B, free 530.1 MB) 
15/11/25 11:22:30 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.17.0.224:47843 (size: 1934.0 B, free: 530.3 MB) 
15/11/25 11:22:30 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:861 
15/11/25 11:22:30 INFO DAGScheduler: Submitting 108 missing tasks from ResultStage 0 (MapPartitionsRDD[4] at map at build.scala:12) 
15/11/25 11:22:30 INFO TaskSchedulerImpl: Adding task set 0.0 with 108 tasks 
15/11/25 11:22:31 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://[email protected]:55861/user/Executor#-498212581]) with ID 0 
15/11/25 11:22:32 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 172.17.0.221, PROCESS_LOCAL, 2217 bytes) 
15/11/25 11:22:32 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 172.17.0.221, PROCESS_LOCAL, 2217 bytes) 
15/11/25 11:22:32 INFO BlockManagerMasterEndpoint: Registering block manager 172.17.0.221:49642 with 530.3 MB RAM, BlockManagerId(0, 172.17.0.221, 49642) 
15/11/25 11:22:32 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.17.0.221:49642 (size: 1934.0 B, free: 530.3 MB) 
15/11/25 11:22:32 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.17.0.221:49642 (size: 13.9 KB, free: 530.3 MB) 
15/11/25 11:22:32 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, 172.17.0.221, PROCESS_LOCAL, 2217 bytes) 
15/11/25 11:22:32 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, 172.17.0.221, PROCESS_LOCAL, 2217 bytes) 
15/11/25 11:22:32 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, 172.17.0.221, PROCESS_LOCAL, 2217 bytes) 
15/11/25 11:22:32 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 172.17.0.221): java.io.FileNotFoundException: File file:/home/ujjwal/cs453project/input.txt does not exist 
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534) 
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747) 
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524) 
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409) 
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140) 
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341) 
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766) 
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108) 
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67) 
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239) 
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216) 
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101) 
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) 
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) 
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) 
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) 
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) 
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) 
    at org.apache.spark.scheduler.Task.run(Task.scala:88) 
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
    at java.lang.Thread.run(Thread.java:745) 

15/11/25 11:22:32 INFO TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3) on executor 172.17.0.221: java.io.FileNotFoundException (File file:/home/ujjwal/cs453project/input.txt does not exist) [duplicate 1] 
15/11/25 11:22:32 INFO TaskSetManager: Starting task 3.1 in stage 0.0 (TID 5, 172.17.0.221, PROCESS_LOCAL, 2217 bytes) 
15/11/25 11:22:32 INFO TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) on executor 172.17.0.221: java.io.FileNotFoundException (File file:/home/ujjwal/cs453project/input.txt does not exist) [duplicate 2] 
15/11/25 11:22:32 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 6, 172.17.0.221, PROCESS_LOCAL, 2217 bytes) 
15/11/25 11:22:32 INFO TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2) on executor 172.17.0.221: java.io.FileNotFoundException (File file:/home/ujjwal/cs453project/input.txt does not exist) [duplicate 3] 
15/11/25 11:22:32 INFO TaskSetManager: Lost task 4.0 in stage 0.0 (TID 4) on executor 172.17.0.221: java.io.FileNotFoundException (File file:/home/ujjwal/cs453project/input.txt does not exist) [duplicate 4] 
15/11/25 11:22:32 INFO TaskSetManager: Starting task 4.1 in stage 0.0 (TID 7, 172.17.0.221, PROCESS_LOCAL, 2217 bytes) 
15/11/25 11:22:32 INFO TaskSetManager: Lost task 3.1 in stage 0.0 (TID 5) on executor 172.17.0.221: java.io.FileNotFoundException (File file:/home/ujjwal/cs453project/input.txt does not exist) [duplicate 5] 
15/11/25 11:22:32 INFO TaskSetManager: Starting task 3.2 in stage 0.0 (TID 8, 172.17.0.221, PROCESS_LOCAL, 2217 bytes) 
15/11/25 11:22:32 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 6) on executor 172.17.0.221: java.io.FileNotFoundException (File file:/home/ujjwal/cs453project/input.txt does not exist) [duplicate 6] 
15/11/25 11:22:32 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 9, 172.17.0.221, PROCESS_LOCAL, 2217 bytes) 
15/11/25 11:22:32 INFO TaskSetManager: Lost task 3.2 in stage 0.0 (TID 8) on executor 172.17.0.221: java.io.FileNotFoundException (File file:/home/ujjwal/cs453project/input.txt does not exist) [duplicate 7] 
15/11/25 11:22:32 INFO TaskSetManager: Starting task 3.3 in stage 0.0 (TID 10, 172.17.0.221, PROCESS_LOCAL, 2217 bytes) 
15/11/25 11:22:32 INFO TaskSetManager: Lost task 4.1 in stage 0.0 (TID 7) on executor 172.17.0.221: java.io.FileNotFoundException (File file:/home/ujjwal/cs453project/input.txt does not exist) [duplicate 8] 
15/11/25 11:22:32 INFO TaskSetManager: Starting task 4.2 in stage 0.0 (TID 11, 172.17.0.221, PROCESS_LOCAL, 2217 bytes) 
15/11/25 11:22:32 INFO TaskSetManager: Lost task 0.2 in stage 0.0 (TID 9) on executor 172.17.0.221: java.io.FileNotFoundException (File file:/home/ujjwal/cs453project/input.txt does not exist) [duplicate 9] 
15/11/25 11:22:32 INFO TaskSetManager: Starting task 0.3 in stage 0.0 (TID 12, 172.17.0.221, PROCESS_LOCAL, 2217 bytes) 
15/11/25 11:22:32 INFO TaskSetManager: Lost task 3.3 in stage 0.0 (TID 10) on executor 172.17.0.221: java.io.FileNotFoundException (File file:/home/ujjwal/cs453project/input.txt does not exist) [duplicate 10] 
15/11/25 11:22:32 ERROR TaskSetManager: Task 3 in stage 0.0 failed 4 times; aborting job 
15/11/25 11:22:32 INFO TaskSchedulerImpl: Cancelling stage 0 
15/11/25 11:22:32 INFO TaskSchedulerImpl: Stage 0 was cancelled 
15/11/25 11:22:32 INFO DAGScheduler: ResultStage 0 (count at build.scala:13) failed in 2.216 s 
15/11/25 11:22:32 INFO TaskSetManager: Lost task 4.2 in stage 0.0 (TID 11) on executor 172.17.0.221: java.io.FileNotFoundException (File file:/home/ujjwal/cs453project/input.txt does not exist) [duplicate 11] 
15/11/25 11:22:32 INFO DAGScheduler: Job 0 failed: count at build.scala:13, took 2.373631 s 
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 0.0 failed 4 times, most recent failure: Lost task 3.3 in stage 0.0 (TID 10, 172.17.0.221): java.io.FileNotFoundException: File file:/home/ujjwal/cs453project/input.txt does not exist 
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534) 
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747) 
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524) 
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409) 
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140) 
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341) 
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766) 
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108) 
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67) 
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239) 
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216) 
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101) 
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) 
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) 
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) 
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) 
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) 
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) 
    at org.apache.spark.scheduler.Task.run(Task.scala:88) 
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
    at java.lang.Thread.run(Thread.java:745) 

Driver stacktrace: 
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283) 
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271) 
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270) 
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) 
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) 
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270) 
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697) 
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697) 
    at scala.Option.foreach(Option.scala:236) 
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697) 
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496) 
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458) 
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447) 
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) 
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567) 
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824) 
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837) 
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850) 
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1921) 
    at org.apache.spark.rdd.RDD.count(RDD.scala:1125) 
    at Rice$.main(build.scala:13) 
    at Rice.main(build.scala) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674) 
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) 
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) 
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) 
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
Caused by: java.io.FileNotFoundException: File file:/home/ujjwal/cs453project/input.txt does not exist 
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534) 
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747) 
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524) 
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409) 
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140) 
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341) 
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766) 
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108) 
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67) 
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239) 
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216) 
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101) 
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) 
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) 
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) 
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) 
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) 
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) 
    at org.apache.spark.scheduler.Task.run(Task.scala:88) 
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
    at java.lang.Thread.run(Thread.java:745) 
15/11/25 11:22:32 INFO TaskSetManager: Lost task 0.3 in stage 0.0 (TID 12) on executor 172.17.0.221: java.io.FileNotFoundException (File file:/home/ujjwal/cs453project/input.txt does not exist) [duplicate 12] 
15/11/25 11:22:32 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/11/25 11:22:32 INFO SparkContext: Invoking stop() from shutdown hook 
15/11/25 11:22:33 INFO SparkUI: Stopped Spark web UI at http://172.17.0.224:4040 
15/11/25 11:22:33 INFO DAGScheduler: Stopping DAGScheduler 
15/11/25 11:22:33 INFO SparkDeploySchedulerBackend: Shutting down all executors 
15/11/25 11:22:33 INFO SparkDeploySchedulerBackend: Asking each executor to shut down 
15/11/25 11:22:33 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 
15/11/25 11:22:33 INFO MemoryStore: MemoryStore cleared 
15/11/25 11:22:33 INFO BlockManager: BlockManager stopped 
15/11/25 11:22:33 INFO BlockManagerMaster: BlockManagerMaster stopped 
15/11/25 11:22:33 INFO SparkContext: Successfully stopped SparkContext 
15/11/25 11:22:33 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 
15/11/25 11:22:33 INFO ShutdownHookManager: Shutdown hook called 
15/11/25 11:22:33 INFO ShutdownHookManager: Deleting directory /tmp/spark-7c6e0ad4-52ae-4f5a-9aaa-6ad9fbf48685 

Could you please help me understand how I can solve this problem? Thanks!

Answer


The path you are using is probably local only to the driver. You must use a path that is accessible to all the workers. The driver does not ship the actual data to the workers (that would, unfortunately, be slow). Instead, the workers try to read the data themselves using the path you gave them, and in this case they fail because the file does not exist locally on their machines.
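Here is a minimal sketch of the two usual fixes, using the same Scala setup as in the question (the HDFS URL below is a hypothetical placeholder, not something from your cluster; substitute whatever shared storage you actually run):

    import org.apache.spark.{SparkConf, SparkContext}

    object Rice {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("Rice"))

        // Fix 1: read from storage every worker can reach, e.g. HDFS.
        // The namenode address here is a placeholder -- adjust to your setup.
        val lines = sc.textFile("hdfs://172.17.0.224:9000/user/ujjwal/input.txt")

        // Fix 2: keep the file:// path, but first copy input.txt to the
        // identical local path on every worker node, for example:
        //   scp cs453project/input.txt [email protected]:cs453project/
        // (and likewise for 172.17.0.222 and 172.17.0.223); then:
        // val lines = sc.textFile("file:///home/ujjwal/cs453project/input.txt")

        println(s"lines: ${lines.count()}")
        sc.stop()
      }
    }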


Thank you! Could you suggest some ideas for how to do that? – user3180835