2017-08-16

How to convert an RDD to a sparse matrix in PySpark

I have an RDD of key/value pairs:

{(("a", "b"), 1), (("a", "c"), 3), (("c", "d"), 5)} 

How can I get the corresponding sparse matrix:

0 1 3 0 
1 0 0 0 
3 0 0 5 
0 0 5 0 

from pyspark.mllib.linalg import Matrices 
Matrices.sparse(4, 4, [0, 2, 3, 5, 6], [1, 2, 0, 0, 3, 2], [1, 3, 1, 3, 5, 5]) 

import numpy as np 
from scipy.sparse import csc_matrix 
data = [1, 3, 1, 3, 5, 5] 
indices = [1, 2, 0, 0, 3, 2] 
indptr = [0, 2, 3, 5, 6] 
csc_matrix((data, indices, indptr), shape=(4, 4), dtype=np.float64)
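
The column pointers, row indices and values in the two snippets above were written out by hand. As a rough sketch of how they could be derived from the pairs, assuming the labels are indexed in sorted order (a -> 0, b -> 1, ...) and that every entry is mirrored to make the matrix symmetric, something like the following might work. The helper name `to_csc_arrays` is made up for illustration, and the pairs are a plain Python list here rather than an RDD.

```python
# Sample data from the question, as a plain list instead of an RDD.
pairs = [(("a", "b"), 1), (("a", "c"), 3), (("c", "d"), 5)]


def to_csc_arrays(pairs):
    """Hypothetical helper: build CSC arrays for a symmetric matrix."""
    # Assumption: labels map to indices in sorted order (a -> 0, b -> 1, ...).
    labels = sorted({k for (i, j), _ in pairs for k in (i, j)})
    index = {label: n for n, label in enumerate(labels)}
    n = len(labels)

    # Store each entry twice, keyed by (column, row), so the result is symmetric.
    entries = {}
    for (i, j), v in pairs:
        entries[(index[j], index[i])] = v
        entries[(index[i], index[j])] = v

    # CSC layout wants entries ordered by column, then by row.
    ordered = sorted(entries.items())
    row_indices = [r for (_, r), _ in ordered]
    values = [v for _, v in ordered]

    # col_ptrs[c] is the offset in `values` where column c starts.
    counts = [0] * n
    for (c, _), _ in ordered:
        counts[c] += 1
    col_ptrs = [0]
    for count in counts:
        col_ptrs.append(col_ptrs[-1] + count)

    return n, col_ptrs, row_indices, values


n, col_ptrs, row_indices, values = to_csc_arrays(pairs)
print(n, col_ptrs, row_indices, values)
```

For the sample data this reproduces the arrays used above, so they could be passed to `Matrices.sparse(n, n, col_ptrs, row_indices, values)` or to `csc_matrix((values, row_indices, col_ptrs), shape=(n, n))`. With a real RDD the pairs would first have to be brought to the driver, for example with `rdd.collect()`, which assumes the data fits in memory.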

Answer

Could you pivot the data in a DataFrame and then convert the result to a matrix?
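
The suggestion does not say how the pivot would be done, so here is only a plain-Python sketch of the idea, not the DataFrame API itself: the first key of each pair becomes the row, the second key becomes the column, and each entry is mirrored because the expected matrix is symmetric. The function name `pivot_symmetric` and the sorted label order are assumptions made for illustration.

```python
# Sample data from the question, as a plain list instead of an RDD.
pairs = [(("a", "b"), 1), (("a", "c"), 3), (("c", "d"), 5)]


def pivot_symmetric(pairs):
    """Hypothetical helper: pivot (row, col) -> value pairs into a dense table."""
    # Assumption: rows and columns share one label set, in sorted order.
    labels = sorted({k for (i, j), _ in pairs for k in (i, j)})
    index = {label: n for n, label in enumerate(labels)}

    # Start from an all-zero square table and fill both (i, j) and (j, i).
    dense = [[0] * len(labels) for _ in labels]
    for (i, j), v in pairs:
        dense[index[i]][index[j]] = v
        dense[index[j]][index[i]] = v
    return labels, dense


labels, dense = pivot_symmetric(pairs)
for row in dense:
    print(" ".join(str(x) for x in row))
```

Note that a pivot produces a dense table, so this approach probably only makes sense when the number of distinct keys is small. In PySpark the analogous step would likely be a `groupBy(...).pivot(...)` on a DataFrame built from the RDD, followed by a conversion of the collected rows into a matrix.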