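My goal is to get the k nearest neighbours of every data point. I would like to avoid using a for loop with `lookup`, and instead process every point of `rdd_distance` at the same time with something else, but I can't figure out how to do this. How can I avoid the loop in a KNN search?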
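To pin down the intended result, here is a minimal plain-Scala sketch with no Spark involved. It returns the k nearest neighbours of every point, nearest first. `Point`, `sqdist` and `LocalKnn` are hypothetical names: `Point` stands in for the question's `Object` (an id plus a vector), and `sqdist` is assumed to compute the same squared Euclidean distance as `sqdist1`.

```scala
// Hypothetical stand-in for the question's Object: an id plus a feature vector.
final case class Point(id: String, vector: Array[Double])

object LocalKnn {
  // Assumed to match sqdist1: squared Euclidean distance between two vectors.
  def sqdist(a: Array[Double], b: Array[Double]): Double =
    a.iterator.zip(b.iterator).map { case (x, y) => (x - y) * (x - y) }.sum

  // For each point, the k other points with the smallest distance, sorted nearest first.
  def knnAll(points: Seq[Point], k: Int): Map[String, Seq[(String, Double)]] =
    points.map { p =>
      val neighbours = points
        .filter(_.id != p.id)                          // skip the point itself
        .map(q => (q.id, sqdist(p.vector, q.vector)))  // (neighbourId, distance)
        .sortBy(_._2)
        .take(k)
      p.id -> neighbours
    }.toMap

  def main(args: Array[String]): Unit = {
    val pts = Seq(
      Point("1", Array(0.0, 0.0)),
      Point("2", Array(1.0, 0.0)),
      Point("3", Array(0.0, 2.0)),
      Point("4", Array(5.0, 5.0))
    )
    knnAll(pts, 2).toSeq.sortBy(_._1).foreach { case (id, nn) =>
      println(s"$id -> ${nn.mkString(", ")}")
    }
  }
}
```

This local version only fixes the expected shape of the output: one entry per point id, mapping to its k nearest `(neighbourId, distance)` pairs.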
```scala
// parsedData: RDD[Object], where each Object has an id and a vector as attributes
// sqdist1 returns a Double
val rdd_distance = parsedData.cartesian(parsedData)
  .flatMap { case (x, y) =>
    if (x.get_id != y.get_id)
      Some((x.get_id, (y.get_id, sqdist1(x.get_vector, y.get_vector))))
    else None
  }

for (ind1 <- 1 to size) {
  val ind2 = ind1.toString
  // Each lookup is an action, so this runs a separate Spark job per id.
  val tab1 = rdd_distance.lookup(ind2)
  val rdd_knn0 = sc.parallelize(tab1)
  val tab_knn = rdd_knn0.takeOrdered(k)(Ordering.by[(String, Double), Double](_._2))
}
```
Is it possible to do this without the for loop over `lookup`?
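One possible loop-free approach is sketched below under stated assumptions; it is not the asker's code. It keeps the `cartesian` step, keys every distance by the source id, and lets `aggregateByKey` keep at most k candidates per key. All points' neighbour lists then come out of a single distributed job instead of one `lookup` job per id. `Point`, `sqdist`, `SparkKnn` and the helper names are hypothetical: `Point` stands in for `Object`, and `sqdist` is assumed to compute the same squared Euclidean distance as `sqdist1`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical stand-in for the question's Object: an id plus a feature vector.
final case class Point(id: String, vector: Array[Double])

object SparkKnn {
  // Assumed to match sqdist1: squared Euclidean distance.
  def sqdist(a: Array[Double], b: Array[Double]): Double =
    a.iterator.zip(b.iterator).map { case (x, y) => (x - y) * (x - y) }.sum

  // Add one (neighbourId, distance) candidate, keep the buffer sorted by distance,
  // and drop everything beyond the k nearest.
  private def insertBounded(k: Int)(acc: List[(String, Double)], c: (String, Double)): List[(String, Double)] =
    (c :: acc).sortBy(_._2).take(k)

  // Merge two partial buffers (from different partitions) and keep the k nearest.
  private def mergeBounded(k: Int)(a: List[(String, Double)], b: List[(String, Double)]): List[(String, Double)] =
    (a ++ b).sortBy(_._2).take(k)

  // k nearest neighbours of every point: no driver-side loop and no lookup.
  def knnAll(points: RDD[Point], k: Int): RDD[(String, List[(String, Double)])] =
    points.cartesian(points)
      .filter { case (x, y) => x.id != y.id }
      .map { case (x, y) => (x.id, (y.id, sqdist(x.vector, y.vector))) }
      .aggregateByKey(List.empty[(String, Double)])(insertBounded(k), mergeBounded(k))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("knn-sketch").setMaster("local[*]"))
    try {
      val pts = sc.parallelize(Seq(
        Point("1", Array(0.0, 0.0)),
        Point("2", Array(1.0, 0.0)),
        Point("3", Array(0.0, 2.0)),
        Point("4", Array(5.0, 5.0))
      ))
      knnAll(pts, 2).collect().sortBy(_._1).foreach { case (id, nn) =>
        println(s"$id -> ${nn.mkString(", ")}")
      }
    } finally sc.stop()
  }
}
```

Bounding each per-key buffer to k keeps memory at O(k) per point. `groupByKey`, by contrast, would materialise all n−1 distances for every point. The `cartesian` step still computes O(n²) distances, so for large or high-dimensional data the approaches in the linked question remain relevant. For the per-key top-k step, MLlib's `MLPairRDDFunctions.topByKey` (via `import org.apache.spark.mllib.rdd.MLPairRDDFunctions._`) may also work. It returns the largest elements, so the distance ordering has to be reversed.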
Have a look at this: https://stackoverflow.com/questions/5751114/nearest-neighbors-in-high-dimensional-data – abalcerek