How do I save data to a Mongo collection from Spark using the mongo-hadoop connector?

I am following the documentation for the mongo-hadoop connector.

I am able to transfer data from the inputCol collection to the outputCol collection of the testDB database using:

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.bson.BSONObject;
import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;

// Input side: read documents from testDB.inputCol.
Configuration mongodbConfig = new Configuration();
mongodbConfig.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
mongodbConfig.set("mongo.input.uri", "mongodb://localhost:27017/testDB.inputCol");

JavaSparkContext sc = new JavaSparkContext(sparkClient.sparkContext);

JavaPairRDD<Object, BSONObject> documents = sc.newAPIHadoopRDD(
        mongodbConfig,          // Configuration
        MongoInputFormat.class, // InputFormat: read from a live cluster.
        Object.class,           // Key class
        BSONObject.class        // Value class
);

// Output side: write the same documents to testDB.outputCol.
Configuration outputConfig = new Configuration();
outputConfig.set("mongo.output.format", "com.mongodb.hadoop.MongoOutputFormat");
outputConfig.set("mongo.output.uri", "mongodb://localhost:27017/testDB.outputCol");

documents.saveAsNewAPIHadoopFile(
        "file:///this-is-completely-unused", // Path is required by the API but not used here
        Object.class,
        BSONObject.class,
        MongoOutputFormat.class,
        outputConfig
);

I want to save a simple document, say

{"_id":1, "name":"dev"}

in the outputCol collection of the testDB database.

How can I do this?


2015-07-13 dev ツ

To use a query with the MongoDB Hadoop connector in Spark, you can set:

mongodbConfig.set("mongo.input.query","{'_id':1,'name':'dev'}")


2015-09-10 04:14:52

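It is the same: just put your BSONObject into an RDD[(Object, BSONObject)] (the Object key can be anything; null should be fine) and save it the same way you saved the documents.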
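The suggestion above could be sketched in Java roughly as follows. This is a minimal sketch, not a tested recipe: the class name SaveSingleDocument and the local[*] master are assumptions made for illustration, BasicBSONObject is used as one convenient BSONObject implementation, and the output URI is taken from the question. It relies on the answer's claim that a null key is acceptable, with the document carrying its own _id.

```java
import java.util.Collections;

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.bson.BSONObject;
import org.bson.BasicBSONObject;

import com.mongodb.hadoop.MongoOutputFormat;

import scala.Tuple2;

// Hypothetical class name, used only for this illustration.
public class SaveSingleDocument {
    public static void main(String[] args) {
        // Assumption: a local Spark master; replace with your own context setup.
        SparkConf conf = new SparkConf()
                .setAppName("SaveSingleDocument")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Build the document {"_id": 1, "name": "dev"} from the question.
        BSONObject doc = new BasicBSONObject();
        doc.put("_id", 1);
        doc.put("name", "dev");

        // Wrap it in a one-element pair RDD. The key is left null, as the
        // answer suggests; the document supplies its own _id.
        JavaPairRDD<Object, BSONObject> toSave = sc.parallelizePairs(
                Collections.singletonList(new Tuple2<Object, BSONObject>(null, doc)));

        // Same output configuration as in the question.
        Configuration outputConfig = new Configuration();
        outputConfig.set("mongo.output.format", "com.mongodb.hadoop.MongoOutputFormat");
        outputConfig.set("mongo.output.uri", "mongodb://localhost:27017/testDB.outputCol");

        // The path argument is required by the API but not used here.
        toSave.saveAsNewAPIHadoopFile(
                "file:///this-is-completely-unused",
                Object.class,
                BSONObject.class,
                MongoOutputFormat.class,
                outputConfig);

        sc.stop();
    }
}
```

Running this needs the Spark, mongo-hadoop, and MongoDB Java driver jars on the classpath, plus a MongoDB instance listening on localhost:27017. If the null key causes trouble with your connector version, passing the intended _id value as the key is an alternative worth trying.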


2016-08-04 05:58:06
