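I need to load some data from MongoDB into Spark. I am using the Spark Mongo connector, mongo-spark-connector_2.11. I wrote the code below and ran it in the Spark shell to test the Spark-to-Mongo connection, and it takes much longer than expected.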
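import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.ReadConfig
import org.bson.types.ObjectId

// UserId, Password, Host, Database and Collection are placeholders for the real values.
def createReadConfig(topic: String): ReadConfig = {
  val user = UserId
  val pass = Password
  val host = Host
  val db = Database
  val coll = Collection
  // ReadConfig takes string options, so the partitioner is passed by name.
  val partitioner = "MongoPaginateBySizePartitioner"
  ReadConfig(Map(
    "uri" -> ("mongodb://" + user + ":" + pass + "@" + host + "/" + db),
    "database" -> db,
    "collection" -> coll,
    "partitioner" -> partitioner))
}

// admissionConfig is a ReadConfig built with createReadConfig.
val collectionRDD = MongoSpark.load(sc, admissionConfig)
// "objectId" stands in for the real 24-character hex id.
collectionRDD.filter(doc => doc.getObjectId("_id") == new ObjectId("objectId")).count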
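It takes more than 20 seconds to get the result, while the same query takes less than a second in the mongo console.
Why does this happen, and how can I reduce the difference in speed?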
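
For comparison, here is a minimal sketch of the same lookup with the match pushed down to MongoDB instead of being applied by Spark. A filter on the RDD runs only after the documents have been read into Spark, so the count above probably scans the whole collection, whereas the console query can use the _id index. The sketch assumes a spark-shell session (so sc exists), the admissionConfig value from the code above, and the connector's withPipeline method on the loaded RDD. idHex is a hypothetical id value, not one from the real collection.

```scala
import com.mongodb.spark.MongoSpark
import org.bson.Document

// Hypothetical 24-character hex ObjectId; substitute a real one.
val idHex = "5a0b1c2d3e4f5a6b7c8d9e0f"

// admissionConfig is assumed to be the ReadConfig used in the code above.
val rdd = MongoSpark.load(sc, admissionConfig)

// Build a $match stage in extended JSON; "$$" escapes "$" inside an s-interpolated string.
val matchStage = Document.parse(s"""{ "$$match": { "_id": { "$$oid": "$idHex" } } }""")

// withPipeline sends the stage to MongoDB, so only matching documents reach Spark.
val matched = rdd.withPipeline(Seq(matchStage))
println(matched.count())
```

Even with the match pushed down, some fixed overhead is likely to remain from the partitioner computing its splits and from Spark scheduling the job, so the timing may not match the console exactly.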