
java.io.IOException: No FileSystem for scheme: hdfs

I am using Cloudera's QuickStart VM with CDH 5.3.0 (the parcels bundle) and Spark 1.2.0, with $SPARK_HOME=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark, and I submit the Spark application using the command:

./bin/spark-submit --class <Spark_App_Main_Class_Name> --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G ../apps/<Spark_App_Target_Jar_Name>.jar

Spark_App_Main_Class_Name.scala

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.mllib.util.MLUtils

object Spark_App_Main_Class_Name {

    def main(args: Array[String]) {
      // Explicitly register the HDFS and local filesystem implementations
      val hConf = new SparkConf()
        .set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
        .set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
      val sc = new SparkContext(hConf)
      // Load a LIBSVM-formatted dataset from HDFS
      val data = MLUtils.loadLibSVMFile(sc, "hdfs://localhost.localdomain:8020/analytics/data/mllib/sample_libsvm_data.txt")
      ...
    }

}

But I am getting a ClassNotFoundException for org.apache.hadoop.hdfs.DistributedFileSystem while spark-submitting in client mode:

[cloudera@localhost bin]$ ./spark-submit --class Spark_App_Main_Class_Name --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G ../apps/Spark_App_Target_Jar_Name.jar
15/11/30 09:46:34 INFO SparkContext: Spark configuration: 
spark.app.name=Spark_App_Main_Class_Name 
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/lib/native 
spark.eventLog.dir=hdfs://localhost.localdomain:8020/user/spark/applicationHistory 
spark.eventLog.enabled=true 
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/lib/native 
spark.executor.memory=4G 
spark.jars=file:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/bin/../apps/Spark_App_Target_Jar_Name.jar 
spark.logConf=true 
spark.master=spark://localhost.localdomain:7077 
spark.yarn.historyServer.address=http://localhost.localdomain:18088 
15/11/30 09:46:34 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 10.113.234.150 instead (on interface eth12) 
15/11/30 09:46:34 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address 
15/11/30 09:46:34 INFO SecurityManager: Changing view acls to: cloudera 
15/11/30 09:46:34 INFO SecurityManager: Changing modify acls to: cloudera 
15/11/30 09:46:34 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cloudera); users with modify permissions: Set(cloudera) 
15/11/30 09:46:35 INFO Slf4jLogger: Slf4jLogger started 
15/11/30 09:46:35 INFO Remoting: Starting remoting 
15/11/30 09:46:35 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@10.113.234.150:59473]
15/11/30 09:46:35 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@10.113.234.150:59473]
15/11/30 09:46:35 INFO Utils: Successfully started service 'sparkDriver' on port 59473. 
15/11/30 09:46:36 INFO SparkEnv: Registering MapOutputTracker 
15/11/30 09:46:36 INFO SparkEnv: Registering BlockManagerMaster 
15/11/30 09:46:36 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20151130094636-8c3d 
15/11/30 09:46:36 INFO MemoryStore: MemoryStore started with capacity 267.3 MB 
15/11/30 09:46:38 INFO HttpFileServer: HTTP File server directory is /tmp/spark-7d1f2861-a568-4919-8f7e-9a9fe6aab2b4 
15/11/30 09:46:38 INFO HttpServer: Starting HTTP Server 
15/11/30 09:46:38 INFO Utils: Successfully started service 'HTTP file server' on port 50003. 
15/11/30 09:46:38 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
15/11/30 09:46:38 INFO SparkUI: Started SparkUI at http://10.113.234.150:4040 
15/11/30 09:46:39 INFO SparkContext: Added JAR file:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/bin/../apps/Spark_App_Target_Jar_Name.jar at http://10.113.234.150:50003/jars/Spark_App_Target_Jar_Name.jar with timestamp 1448894799228 
15/11/30 09:46:39 INFO AppClient$ClientActor: Connecting to master spark://localhost.localdomain:7077... 
15/11/30 09:46:40 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20151130094640-0000 
15/11/30 09:46:41 INFO NettyBlockTransferService: Server created on 56458 
15/11/30 09:46:41 INFO BlockManagerMaster: Trying to register BlockManager 
15/11/30 09:46:41 INFO BlockManagerMasterActor: Registering block manager 10.113.234.150:56458 with 267.3 MB RAM, BlockManagerId(<driver>, 10.113.234.150, 56458) 
15/11/30 09:46:41 INFO BlockManagerMaster: Registered BlockManager 
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found 
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2047) 
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578) 
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591) 
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367) 
    at org.apache.spark.util.FileLogger.<init>(FileLogger.scala:90) 
    at org.apache.spark.scheduler.EventLoggingListener.<init>(EventLoggingListener.scala:63) 
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:352) 
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:92) 
    at Spark_App_Main_Class_Name$.main(Spark_App_Main_Class_Name.scala:22) 
    at Spark_App_Main_Class_Name.main(Spark_App_Main_Class_Name.scala) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358) 
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75) 
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found 
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1953) 
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2045) 
    ... 16 more 

It appears that the Spark application is not able to map to HDFS, because initially I was getting the error:

Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs 
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584) 
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591) 
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367) 
    at org.apache.spark.util.FileLogger.<init>(FileLogger.scala:90) 
    at org.apache.spark.scheduler.EventLoggingListener.<init>(EventLoggingListener.scala:63) 
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:352) 
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:92) 
    at LogisticRegressionwithBFGS$.main(LogisticRegressionwithBFGS.scala:21) 
    at LogisticRegressionwithBFGS.main(LogisticRegressionwithBFGS.scala) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358) 
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75) 
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 

and I followed hadoop No FileSystem for scheme: file, adding "fs.hdfs.impl" and "fs.file.impl" to the Spark configuration settings as shown in the code above.
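As a side note, here is a minimal sketch of an alternative way to pass these properties, assuming Spark's behavior of copying SparkConf entries prefixed with spark.hadoop. into the Hadoop Configuration it builds (illustrative, not from the original post):

import org.apache.spark.SparkConf

// Sketch: the "spark.hadoop." prefix tells Spark to copy these entries into
// its Hadoop Configuration, instead of setting raw Hadoop keys on SparkConf
val hConf = new SparkConf()
  .set("spark.hadoop.fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
  .set("spark.hadoop.fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)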

Answers

Answer (score: 0)

I got past this issue after some detailed searching and trying different approaches. Basically, the problem appears to be the unavailability of the Hadoop HDFS jars when submitting the Spark application: the dependency jars could not be found, even when packaged with maven-assembly-plugin or with the maven-jar-plugin/maven-dependency-plugin combination.

With the maven-jar-plugin/maven-dependency-plugin combination, the main-class jar and the dependency jar were being created, but supplying the dependency jar with the --jars option still resulted in the same error, as follows:

./spark-submit --class Spark_App_Main_Class_Name --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G --jars ../apps/Spark_App_Target_Jar_Name-dep.jar ../apps/Spark_App_Target_Jar_Name.jar 

Using maven-shade-plugin, as suggested by "krookedking" in hadoop-no-filesystem-for-scheme-file, seemed to hit the problem at the right point, since creating a single jar file containing the main class and all dependent classes eliminated the classpath issues.

My final working spark-submit command is as follows:

./spark-submit --class Spark_App_Main_Class_Name --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G ../apps/Spark_App_Target_Jar_Name.jar 

The maven-shade-plugin in my project pom.xml is as follows:

<plugin> 
     <groupId>org.apache.maven.plugins</groupId> 
     <artifactId>maven-shade-plugin</artifactId> 
     <version>2.4.2</version> 
     <executions> 
      <execution> 
       <phase>package</phase> 
       <goals> 
        <goal>shade</goal> 
       </goals> 
       <configuration> 
        <filters> 
         <filter> 
          <artifact>*:*</artifact> 
          <excludes> 
           <exclude>META-INF/*.SF</exclude> 
           <exclude>META-INF/*.DSA</exclude> 
           <exclude>META-INF/*.RSA</exclude> 
          </excludes> 
         </filter> 
        </filters> 
        <transformers> 
         <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/> 
        </transformers> 
       </configuration> 
      </execution> 
     </executions> 
     </plugin> 
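The ServicesResourceTransformer above is the piece that addresses the filesystem errors: both hadoop-common and hadoop-hdfs ship a service registry file at META-INF/services/org.apache.hadoop.fs.FileSystem, and without the transformer one jar's copy overwrites the other's inside the shaded jar, which is exactly what produces "No FileSystem for scheme: hdfs". With the transformer the entries are merged, so the shaded jar's service file ends up containing the registrations from both jars, roughly:

org.apache.hadoop.fs.LocalFileSystem
org.apache.hadoop.hdfs.DistributedFileSystem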

Note: providing the filter excludes shown above gets rid of:

java.lang.SecurityException: Invalid signature file digest for Manifest main attributes 
Answer (score: 7)

You need to include the hadoop-hdfs-2.x jars (maven link) in your classpath. While submitting your application, mention the additional jar location using spark-submit's --jars option.
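For example, a sketch of the Maven coordinates (the version is an assumption and should match your cluster's Hadoop build, here the CDH 5.3.0 parcel; Cloudera-versioned artifacts come from the Cloudera Maven repository):

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.5.0-cdh5.3.0</version> <!-- assumption: match your CDH version -->
</dependency>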

On another note, you should ideally be moving to CDH 5.5, which has Spark 1.5.

Added the Hadoop HDFS jars via the --jars option while spark-submitting, but it still throws java.lang.ClassNotFoundException – somnathchakrabarti

Somnath, can you provide the complete spark-submit command? –

./spark-submit --class Spark_App_Main_Class_Name --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G --jars /opt/cloudera/parcels/CDH/lib/hadoop-hdfs/*.jar ../apps/Spark_App_Target_Jar_Name.jar resolved the ClassNotFoundException, but I do not see any completed application on the Spark Master WebUI – somnathchakrabarti

Answer (score: -1)

I faced the same problem when running Spark code from an IDE and accessing a remote HDFS.
So I set the following configuration, and it was resolved.

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.api.java.JavaSparkContext;

JavaSparkContext jsc = new JavaSparkContext(conf);
// Register the filesystem implementations on the SparkContext's Hadoop configuration
Configuration hadoopConfig = jsc.hadoopConfiguration();
hadoopConfig.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
hadoopConfig.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
Please add some context to your answer and explain how it solves the problem; otherwise it may attract downvotes and/or be closed –

And at least fix the indentation –
