pyspark: expand a DenseVector into a tuple in an RDD
I have the following RDD, where each record is a tuple of (bigint, vector):
myRDD.take(5)
[(1, DenseVector([9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432])),
(1, DenseVector([9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432])),
(0, DenseVector([5.0, 20.0, 0.3444, 0.3295, 54.3122, 4.0])),
(1, DenseVector([9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432])),
(1, DenseVector([9.2463, 2.0, 0.392, 0.3381, 162.6437, 7.9432]))]
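How can I expand the dense vector so that its elements become part of the outer tuple? That is, I would like the above to become: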
[(1, 9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432),
(1, 9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432),
(0, 5.0, 20.0, 0.3444, 0.3295, 54.3122, 4.0),
(1, 9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432),
(1, 9.2463, 2.0, 0.392, 0.3381, 162.6437, 7.9432)]
Thanks!
Hint: `Vector` is iterable. Everything else is basic Python (argument unpacking may be useful, but it is not required). – zero323
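Thanks zero323! I tried newRDD = myRDD.map(lambda x: (x[0], tuple(x[1]))). It does expand the DenseVector into a tuple, but I still end up with a tuple nested inside a tuple, like: (1, (1,9.2463,1.0,0.392,0.3381,162.6437,7.9432)). Any hint on how to turn this nested tuple into a single flat tuple? Thanks! – Edamame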
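Following the hint above, one possible way to avoid the nested tuple is to concatenate a one-element tuple holding the label with the tuple built from the vector, instead of placing the two side by side. The sketch below is only an illustration: `flatten_record` is a hypothetical helper name, and plain Python lists stand in for `DenseVector` so that it runs without Spark. It assumes only that the vector is iterable, as the hint says.

```python
def flatten_record(record):
    """Turn (label, vector) into one flat tuple (label, v0, v1, ...)."""
    label, vector = record
    # (label,) is a one-element tuple; "+" concatenates it with the vector's
    # elements, so no nesting is introduced.
    # float() converts each element to a plain Python float (iterating a
    # DenseVector is expected to yield NumPy scalars; that is an assumption).
    return (label,) + tuple(float(v) for v in vector)


# Plain lists stand in for DenseVector here (an assumption for this sketch).
sample = [
    (1, [9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432]),
    (0, [5.0, 20.0, 0.3444, 0.3295, 54.3122, 4.0]),
]

flat = [flatten_record(r) for r in sample]
for row in flat:
    print(row)

# With the RDD from the question, the equivalent call would presumably be:
# newRDD = myRDD.map(flatten_record)
```

On Python 3.5 or later, argument unpacking gives the same result in one expression: `lambda x: (x[0], *x[1])`, although the elements then keep whatever numeric type the vector yields.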