2013-02-12 29 views
5

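Since Spring-Data-Hadoop has not been released yet, it is hard to find a working example configuration for using it with Cloudera. How can I get a Spring-Data-Hadoop project running with Cloudera CDH4 and Maven?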

Which dependencies do I need to get Spring-Data-Hadoop running with CDH4 (Hadoop 2.0.0-cdh4.1.3)?

Depending on the approach I try, I get one of these exceptions:

  1. NullPointerException

    Exception in thread "SimpleAsyncTaskExecutor-1" java.lang.ExceptionInInitializerError 
        at org.springframework.data.hadoop.mapreduce.JobExecutor$2.run(JobExecutor.java:183) 
        at java.lang.Thread.run(Thread.java:722) 
        Caused by: java.lang.NullPointerException 
        at org.springframework.util.ReflectionUtils.makeAccessible(ReflectionUtils.java:405) 
        at org.springframework.data.hadoop.mapreduce.JobUtils.<clinit>(JobUtils.java:123) 
        ... 2 more 
    
  2. IPC version mismatch (server 7, client 4)

    Caused by: org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4 
        at org.apache.hadoop.ipc.Client.call(Client.java:1070) 
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225) 
        at $Proxy1.getProtocolVersion(Unknown Source) 
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396) 
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379) 
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119) 
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238) 
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203) 
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89) 
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386) 
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66) 
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404) 
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254) 
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123) 
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:238) 
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187) 
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(FileInputFormat.java:372) 
        at org.springframework.data.hadoop.mapreduce.JobFactoryBean.afterPropertiesSet(JobFactoryBean.java:208) 
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1545) 
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1483) 
    ... 12 more 
    

Answers

6

Here is an example of how to configure it.

Maven setup:

Notes:

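  • (Optional) Exclude hadoop-streaming and hadoop-tools from spring-data-hadoop
  • Add hadoop-common and hadoop-hdfs with the generic version: 2.0.0-cdhX.XX
  • Add hadoop-tools and hadoop-streaming with the MR1 version: 2.0.0-mr1-cdhX.XX
  • Spring Data Hadoop currently supports only MR1, so make sure no other dependency pulls in MR2. Check with mvn dependency:tree!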
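The MR1-only check from the notes above can be partly automated. The following is a minimal sketch, not part of the original answer: the class `HadoopDependencyCheck` and its method are hypothetical, and it assumes that MR2 artifacts can be recognized by the `hadoop-yarn-` and `hadoop-mapreduce-client-` name prefixes. The sample tree in `main` is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: flags MR2/YARN artifacts in the text printed by mvn dependency:tree. */
public class HadoopDependencyCheck {

    /** Returns the tree lines that mention an MR2 (YARN-based) Hadoop artifact. */
    public static List<String> findMr2Artifacts(String dependencyTree) {
        List<String> hits = new ArrayList<String>();
        for (String line : dependencyTree.split("\\r?\\n")) {
            // Assumption: MR2 artifacts are named hadoop-yarn-* or hadoop-mapreduce-client-*.
            if (line.contains("org.apache.hadoop:hadoop-yarn-")
                    || line.contains("org.apache.hadoop:hadoop-mapreduce-client-")) {
                hits.add(line.trim());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Illustrative input only; paste the real output of mvn dependency:tree here.
        String sample =
              "[INFO] com.example:com.example.main:jar:0.0.1-SNAPSHOT\n"
            + "[INFO] +- org.apache.hadoop:hadoop-common:jar:2.0.0-cdh4.1.3:compile\n"
            + "[INFO] +- org.apache.hadoop:hadoop-streaming:jar:2.0.0-mr1-cdh4.1.3:compile\n"
            + "[INFO] \\- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.0.0-cdh4.1.3:compile\n";
        for (String hit : findMr2Artifacts(sample)) {
            System.out.println("MR2 artifact on the classpath: " + hit);
        }
    }
}
```

Any line it reports is a candidate for an exclusion in the pom.xml. An empty result only means that none of the assumed name prefixes matched.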

pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> 
    <modelVersion>4.0.0</modelVersion> 

    <groupId>com.example</groupId> 
    <artifactId>com.example.main</artifactId> 
    <version>0.0.1-SNAPSHOT</version> 
    <packaging>jar</packaging> 

    <properties> 
     <java-version>1.7</java-version> 
     <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> 
     <spring.version>3.2.0.RELEASE</spring.version> 
     <spring.hadoop.version>1.0.0.BUILD-SNAPSHOT</spring.hadoop.version> 
     <hadoop.version.generic>2.0.0-cdh4.1.3</hadoop.version.generic> 
     <hadoop.version.mr1>2.0.0-mr1-cdh4.1.3</hadoop.version.mr1> 
    </properties> 

    <dependencies> 

     <dependency> 
      <groupId>org.springframework</groupId> 
      <artifactId>spring-core</artifactId> 
      <version>${spring.version}</version> 
      <exclusions> 
       <exclusion> 
        <groupId>commons-logging</groupId> 
        <artifactId>commons-logging</artifactId> 
       </exclusion> 
      </exclusions> 
     </dependency> 

     <dependency> 
      <groupId>org.springframework</groupId> 
      <artifactId>spring-context</artifactId> 
      <version>${spring.version}</version> 
     </dependency> 


     <dependency> 
      <groupId>org.springframework.data</groupId> 
      <artifactId>spring-data-hadoop</artifactId> 
      <version>${spring.hadoop.version}</version> 

      <exclusions> 
       <!-- Exclude the bundled Hadoop dependencies so that they do not get mixed 
        with the ones provided by Cloudera. --> 
       <exclusion> 
        <artifactId>hadoop-streaming</artifactId> 
        <groupId>org.apache.hadoop</groupId> 
       </exclusion> 
       <exclusion> 
        <artifactId>hadoop-tools</artifactId> 
        <groupId>org.apache.hadoop</groupId> 
       </exclusion> 
      </exclusions> 

     </dependency> 

     <!-- Hadoop Cloudera Dependencies --> 
     <dependency> 
      <groupId>org.apache.hadoop</groupId> 
      <artifactId>hadoop-common</artifactId> 
      <version>${hadoop.version.generic}</version> 
     </dependency> 

     <dependency> 
      <groupId>org.apache.hadoop</groupId> 
      <artifactId>hadoop-hdfs</artifactId> 
      <version>${hadoop.version.generic}</version> 
     </dependency> 

     <dependency> 
      <groupId>org.apache.hadoop</groupId> 
      <artifactId>hadoop-tools</artifactId> 
      <version>${hadoop.version.mr1}</version> 
     </dependency> 

     <dependency> 
      <groupId>org.apache.hadoop</groupId> 
      <artifactId>hadoop-streaming</artifactId> 
      <version>${hadoop.version.mr1}</version> 
     </dependency> 

    </dependencies> 

    <build> 
     <plugins> 

      <plugin> 
       <groupId>org.apache.maven.plugins</groupId> 
       <artifactId>maven-compiler-plugin</artifactId> 
       <configuration> 
        <source>${java-version}</source> 
        <target>${java-version}</target> 
       </configuration> 
      </plugin> 

     </plugins> 
    </build> 

    <repositories> 
     <repository> 
      <id>spring-milestones</id> 
      <url>http://repo.springsource.org/libs-milestone</url> 
      <snapshots> 
       <enabled>false</enabled> 
      </snapshots> 
     </repository> 

     <repository> 
      <id>cloudera</id> 
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url> 
      <snapshots> 
       <enabled>false</enabled> 
      </snapshots> 
     </repository> 

     <repository> 
      <id>spring-snapshot</id> 
      <name>Spring Maven SNAPSHOT Repository</name> 
      <url>http://repo.springframework.org/snapshot</url> 
     </repository> 
    </repositories> 
</project> 

Spring setup (applicationContext.xml):

Use your own NameNode domain in the configuration:

<?xml version="1.0" encoding="UTF-8"?> 
<beans xmlns="http://www.springframework.org/schema/beans" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:hdp="http://www.springframework.org/schema/hadoop" 
    xsi:schemaLocation=" 
        http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd 
        http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd 
        http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.1.xsd"> 

    <hdp:configuration id="hadoopConfiguration"> 
     fs.default.name=hdfs://example.com:8020 
    </hdp:configuration> 

    <hdp:job id="wordCountJob" 
     mapper="com.example.WordMapper" 
     reducer="com.example.WordReducer" 
     input-path="/user/christian/input/test" 
     output-path="/user/christian/output2" /> 

    <hdp:job-runner job-ref="wordCountJob" run-at-startup="true" 
     wait-for-completion="true" /> 

</beans> 

After replacing fs.default.name, you should be able to access your cluster.
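A malformed fs.default.name value is easy to catch before the job is submitted. The sketch below is an illustration, not part of the original answer: the class `NameNodeAddressCheck` is hypothetical, and it assumes the value should have the form hdfs://host:port, as in the configuration above. It does not contact the cluster.

```java
import java.net.URI;

/** Hypothetical helper: sanity-checks a value intended for fs.default.name. */
public class NameNodeAddressCheck {

    /** Returns true if the value has the form hdfs://host:port. */
    public static boolean looksLikeHdfsAddress(String value) {
        try {
            URI uri = URI.create(value.trim());
            // Assumption: an explicit port is required, as in hdfs://example.com:8020.
            return "hdfs".equals(uri.getScheme())
                    && uri.getHost() != null
                    && uri.getPort() > 0;
        } catch (IllegalArgumentException e) {
            return false; // not a syntactically valid URI
        }
    }

    public static void main(String[] args) {
        String address = args.length > 0 ? args[0] : "hdfs://example.com:8020";
        System.out.println(address + " -> " + looksLikeHdfsAddress(address));
    }
}
```

This only validates the syntax. A well-formed address can still fail with the IPC version mismatch from the question if the client libraries do not match the cluster.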

Some references:

0

Hey, you can download an example from https://github.com/spring-projects/spring-data-book.

Instructions for building and running it are given in the README.

+0

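While this link may answer the question, please consider including the essential parts or a summary in the answer itself. That way your answer stays useful even if the linked page becomes unavailable. Link-only answers are discouraged on SO. – Harry 2013-11-26 04:56:15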

+0

This answer is cryptic, to say the least. It does not answer the question. – waste 2016-01-20 06:30:41