2016-07-21

I'm new to Apache Spark. I created several RDDs and DataFrames and cached them, and now I want to unpersist some of them with the following command:

rddName.unpersist() 

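But I can't remember their names. I tried sc.getPersistentRDDs, but the output does not include the names. I also looked at the cached RDDs in the browser (the Spark web UI), but again there was no name information. Am I missing something?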


eliasah, sorry, I thought I had already accepted your answer. – fanbondi

Answers


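@Dikei's answer is actually correct, but I believe what you are looking for is sc.getPersistentRDDs. If you give an RDD a name with setName when you cache it, that name appears in the output: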

scala> val rdd1 = sc.makeRDD(1 to 100) 
# rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at makeRDD at <console>:27 

scala> val rdd2 = sc.makeRDD(10 to 1000) 
# rdd2: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[1] at makeRDD at <console>:27 

scala> rdd2.cache.setName("rdd_2") 
# res0: rdd2.type = rdd_2 ParallelCollectionRDD[1] at makeRDD at <console>:27 

scala> sc.getPersistentRDDs 
# res1: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(1 -> rdd_2 ParallelCollectionRDD[1] at makeRDD at <console>:27) 

scala> rdd1.cache.setName("foo") 
# res2: rdd1.type = foo ParallelCollectionRDD[0] at makeRDD at <console>:27 

scala> sc.getPersistentRDDs 
# res3: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(1 -> rdd_2 ParallelCollectionRDD[1] at makeRDD at <console>:27, 0 -> foo ParallelCollectionRDD[0] at makeRDD at <console>:27) 

Now let's add another RDD and name it as well, but without caching it:

scala> val rdd3 = sc.makeRDD(1 to 10) // the range is illustrative

scala> rdd3.setName("bar") 
# res4: rdd3.type = bar ParallelCollectionRDD[2] at makeRDD at <console>:27 

scala> sc.getPersistentRDDs 
# res5: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(1 -> rdd_2 ParallelCollectionRDD[1] at makeRDD at <console>:27, 0 -> foo ParallelCollectionRDD[0] at makeRDD at <console>:27) 

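Notice that bar does not appear in the result: setName only labels an RDD and does not persist it, so sc.getPersistentRDDs still lists only the two RDDs that were cached (rdd_2 and foo).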
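Building on the transcript above, here is a minimal sketch, assuming Spark's Scala API and a local master, that lists the names of all persisted RDDs and unpersists the ones matching a given name. The object CachedRddNames and its helpers cachedNames and unpersistByName are hypothetical names for illustration; only getPersistentRDDs, setName, cache, name and unpersist are Spark API. An RDD that was never named has a null name, so it is never matched.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object; not part of Spark itself.
object CachedRddNames {
  // Maps each persisted RDD's id to its name (null if setName was never called).
  def cachedNames(sc: SparkContext): Map[Int, String] =
    sc.getPersistentRDDs.map { case (id, rdd) => id -> rdd.name }.toMap

  // Unpersists every persisted RDD whose name equals `name`; returns how many matched.
  def unpersistByName(sc: SparkContext, name: String): Int = {
    val matches: List[RDD[_]] = sc.getPersistentRDDs.values.filter(_.name == name).toList
    matches.foreach(_.unpersist()) // also removes each one from getPersistentRDDs
    matches.size
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("cached-rdd-names"))
    try {
      sc.makeRDD(1 to 100).cache().setName("foo")
      sc.makeRDD(10 to 1000).cache().setName("rdd_2")
      sc.makeRDD(1 to 10).setName("bar") // named but never cached, so never listed
      println(cachedNames(sc).values.toList.sorted) // List(foo, rdd_2)
      println(unpersistByName(sc, "foo"))            // 1
      println(cachedNames(sc).values.toList)         // List(rdd_2)
    } finally {
      sc.stop()
    }
  }
}
```

The names only help if setName is called when the RDD is cached; for an unnamed cached RDD, the id key in the getPersistentRDDs map is what you have to go on, e.g. sc.getPersistentRDDs(0).unpersist() for the RDD with id 0.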


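The rddName variable has no special meaning; it is just a reference to the RDD. For example, in the following code

import org.apache.spark.rdd.RDD
val rddName: RDD[Int] = sc.makeRDD(1 to 100)
val name2 = rddName

name2 and rddName are two references to the same RDD, so calling name2.unpersist() is the same as calling rddName.unpersist().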

If you want to unpersist an RDD, you have to keep a reference to it yourself.
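To check that an alias really is the same RDD, the following sketch (assuming a local SparkContext; the object AliasUnpersist and its demo helper are hypothetical) caches an RDD, unpersists it through a second reference, and looks up the original's id in sc.getPersistentRDDs before and after.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical demo object; only the Spark calls are real API.
object AliasUnpersist {
  // Returns (same object?, persisted before unpersist?, persisted after unpersist?).
  def demo(sc: SparkContext): (Boolean, Boolean, Boolean) = {
    val rddName: RDD[Int] = sc.makeRDD(1 to 100).cache()
    val name2 = rddName // a second reference; no copy of the RDD is made
    val before = sc.getPersistentRDDs.contains(rddName.id)
    name2.unpersist()   // unpersist through the alias...
    val after = sc.getPersistentRDDs.contains(rddName.id) // ...and the original is no longer listed
    (name2 eq rddName, before, after)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("alias-unpersist"))
    try {
      println(demo(sc)) // (true,true,false)
    } finally {
      sc.stop()
    }
  }
}
```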