Spark cannot compile newAPIHadoopRDD with the mongo-hadoop connector's BSONFileInputFormat

I am using the mongo-hadoop client (r1.5.2) in Spark to read data from MongoDB and from BSON files, following this page: https://github.com/mongodb/mongo-hadoop/wiki/Spark-Usage. So far I can read from MongoDB without any problem. However, the BSON configuration does not even compile. Please help.

My code, in Scala:

    dataConfig.set("mapred.input.dir", "path.bson")

    val documents = sc.newAPIHadoopRDD(
      dataConfig,
      classOf[BSONFileInputFormat],
      classOf[Object],
      classOf[BSONObject])

The error:

Error:(56, 24) inferred type arguments [Object,org.bson.BSONObject,com.mongodb.hadoop.mapred.BSONFileInputFormat] do not conform to method newAPIHadoopRDD's type parameter bounds [K,V,F <: org.apache.hadoop.mapreduce.InputFormat[K,V]] 
    val documents = sc.newAPIHadoopRDD(
        ^

2016-06-21 Hunter Lin

Try using BSONFileInputFormat instead of MongoInputFormat. Also, please specify which version of the mongo-hadoop connector you are using.

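I found the solution! The problem seems to be caused by the generics of InputFormat.

newAPIHadoopRDD requires the input format class to satisfy this bound: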

F <: org.apache.hadoop.mapreduce.InputFormat[K,V]

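Although BSONFileInputFormat extends FileInputFormat[K, V], which in turn extends InputFormat[K, V], it does not fix the K and V type parameters to Object and BSONObject. (In fact, BSONFileInputFormat does not mention the K, V generics at all. Can this class really compile?)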

In any case, the solution is to cast the BSONFileInputFormat class to a subclass of an InputFormat with K and V defined:

val documents = sc.newAPIHadoopRDD(
    dataConfig,     
    classOf[BSONFileInputFormat].asSubclass(classOf[org.apache.hadoop.mapreduce.lib.input.FileInputFormat[Object, BSONObject]]), 
    classOf[Object],    
    classOf[BSONObject])

Now it works without any problems :)
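To illustrate why the cast satisfies the compiler, here is a minimal sketch that needs neither Spark nor Hadoop. All names in it (InputFormatLike, FileInputFormatLike, BsonFormatLike, accept) are hypothetical stand-ins, not classes from Hadoop, Spark, or mongo-hadoop; only the shape of the type bound is taken from the error message above. It assumes the stand-in format's own type arguments differ from the key and value classes passed in, which is what the bound rejects.

```scala
// Hypothetical stand-ins for the Hadoop input format hierarchy.
abstract class InputFormatLike[K, V]
abstract class FileInputFormatLike[K, V] extends InputFormatLike[K, V]

// Stand-in for a format whose key/value types are not the ones requested.
class BsonFormatLike extends FileInputFormatLike[Any, Any]

object BoundDemo {
  // Same shape of bound as newAPIHadoopRDD: F <: InputFormat[K, V].
  def accept[K, V, F <: InputFormatLike[K, V]](
      fClass: Class[F],
      kClass: Class[K],
      vClass: Class[V]): Class[_] = fClass

  // Does not compile: type parameters are invariant, so BsonFormatLike
  // is not an InputFormatLike[Object, String].
  // accept(classOf[BsonFormatLike], classOf[Object], classOf[String])

  // Compiles: asSubclass re-types the Class object so the bound holds.
  // Generics are erased at runtime, so only the raw class is checked.
  val viaCast: Class[_] = accept(
    classOf[BsonFormatLike].asSubclass(classOf[FileInputFormatLike[Object, String]]),
    classOf[Object],
    classOf[String])
}
```

The cast only changes the static type; the Class object returned is still the original one, so the compiler no longer verifies that the format really produces the declared key and value types.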

2016-08-03 02:43:26
