2014-12-11 52 views
3

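I have implemented a matrix factorization model, say R = U * V, and now I want to train and test it. Given a sparse matrix R (missing values are stored as zero), I want to hide some of the nonzero elements during training and then use those elements as the test set.

How can I randomly select some nonzero elements from a numpy.ndarray? I also need to remember the row and column positions of the selected elements so that I can use them in testing.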

For example:

In [2]: import numpy as np 

In [4]: mtr = np.random.rand(10,10) 

In [5]: mtr 
Out[5]: 
array([[ 0.92685787, 0.95496193, 0.76878455, 0.12304856, 0.13804963, 
     0.30867502, 0.60245974, 0.00797898, 0.1060602 , 0.98277982], 
     [ 0.88879888, 0.40209901, 0.35274404, 0.73097713, 0.56238248, 
     0.380625 , 0.16432029, 0.5383006 , 0.0678564 , 0.42875591], 
     [ 0.42343761, 0.31957986, 0.5991212 , 0.04898903, 0.2908878 , 
     0.13160296, 0.26938537, 0.91442668, 0.72827097, 0.4511198 ], 
     [ 0.63979934, 0.33421621, 0.09218392, 0.71520048, 0.57100522, 
     0.37205284, 0.59726293, 0.58224992, 0.58690505, 0.4791199 ], 
     [ 0.35219557, 0.34954002, 0.93837312, 0.2745864 , 0.89569075, 
     0.81244084, 0.09661341, 0.80673646, 0.83756759, 0.7948081 ], 
     [ 0.09173706, 0.86250006, 0.22121994, 0.21097563, 0.55090202, 
     0.80954817, 0.97159981, 0.95888693, 0.43151554, 0.2265607 ], 
     [ 0.00723128, 0.95690539, 0.94214806, 0.01721733, 0.12552314, 
     0.65977765, 0.20845669, 0.44663729, 0.98392716, 0.36258081], 
     [ 0.65994805, 0.47697842, 0.35449045, 0.73937445, 0.68578224, 
     0.44278095, 0.86743906, 0.5126411 , 0.75683392, 0.73354572], 
     [ 0.4814301 , 0.92410622, 0.85267402, 0.44856078, 0.03887269, 
     0.48868498, 0.83618382, 0.49404473, 0.37328248, 0.18134919], 
     [ 0.63999748, 0.48718656, 0.54826717, 0.1001681 , 0.1940816 , 
     0.3937014 , 0.48768013, 0.70610649, 0.03213063, 0.88371607]]) 

In [6]: mtr = np.where(mtr>0.5, 0, mtr) 

In [7]: %clear 


In [8]: mtr 
Out[8]: 
array([[ 0.  , 0.  , 0.  , 0.12304856, 0.13804963, 
     0.30867502, 0.  , 0.00797898, 0.1060602 , 0.  ], 
     [ 0.  , 0.40209901, 0.35274404, 0.  , 0.  , 
     0.380625 , 0.16432029, 0.  , 0.0678564 , 0.42875591], 
     [ 0.42343761, 0.31957986, 0.  , 0.04898903, 0.2908878 , 
     0.13160296, 0.26938537, 0.  , 0.  , 0.4511198 ], 
     [ 0.  , 0.33421621, 0.09218392, 0.  , 0.  , 
     0.37205284, 0.  , 0.  , 0.  , 0.4791199 ], 
     [ 0.35219557, 0.34954002, 0.  , 0.2745864 , 0.  , 
     0.  , 0.09661341, 0.  , 0.  , 0.  ], 
     [ 0.09173706, 0.  , 0.22121994, 0.21097563, 0.  , 
     0.  , 0.  , 0.  , 0.43151554, 0.2265607 ], 
     [ 0.00723128, 0.  , 0.  , 0.01721733, 0.12552314, 
     0.  , 0.20845669, 0.44663729, 0.  , 0.36258081], 
     [ 0.  , 0.47697842, 0.35449045, 0.  , 0.  , 
     0.44278095, 0.  , 0.  , 0.  , 0.  ], 
     [ 0.4814301 , 0.  , 0.  , 0.44856078, 0.03887269, 
     0.48868498, 0.  , 0.49404473, 0.37328248, 0.18134919], 
     [ 0.  , 0.48718656, 0.  , 0.1001681 , 0.1940816 , 
     0.3937014 , 0.48768013, 0.  , 0.03213063, 0.  ]]) 

Given a sparse ndarray like this, how can I select 20% of the nonzero elements and remember their positions?

Answer

6

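We can use numpy.random.choice. First, get the arrays of (i, j) indices where the data is nonzero (here x is the sparse array, such as mtr in the question):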

i,j = np.nonzero(x) 
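To make the return value of np.nonzero concrete, here is a small sketch on a hand-written 2x3 array (the array `x` below is made up for illustration and is not the matrix from the question). For a 2-D input, np.nonzero returns one array of row indices and one of column indices, in row-major order:

```python
import numpy as np

# Hypothetical toy array, used only to illustrate np.nonzero.
x = np.array([[0.0, 1.5, 0.0],
              [2.5, 0.0, 3.5]])

# i holds the row index and j the column index of each nonzero entry.
i, j = np.nonzero(x)

print(i)        # [0 1 1]
print(j)        # [1 0 2]
print(x[i, j])  # [1.5 2.5 3.5]
```

Position k in i and j refers to the same element, so a single random choice over range(len(i)) picks matching row and column indices together.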

Then select 20% of them:

ix = np.random.choice(len(i), int(np.floor(0.2 * len(i))), replace=False) 

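Here ix is an array of random, unique positions whose length is 20% of the length of i and j (the length of i and j is the number of nonzero entries). The sample size is cast to int because np.floor returns a float. To recover the matrix indices, use i[ix] and j[ix], so 20% of the nonzero entries of x can be selected by writing: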

print(x[i[ix], j[ix]]) 
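Putting the steps together for the train/test use case in the question, one possible sketch is shown below. The function name `split_nonzero`, its parameters, and the seeds are assumptions made for illustration. It also uses NumPy's newer `np.random.default_rng` generator rather than the `np.random.choice` call above, only so that the split is reproducible.

```python
import numpy as np

def split_nonzero(R, test_fraction=0.2, seed=None):
    """Hide a random fraction of the nonzero entries of R (hypothetical helper).

    Returns a training copy of R with the chosen entries set to zero,
    plus the row indices, column indices, and values of the held-out entries.
    """
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(R)                  # positions of all nonzero entries
    n_test = int(test_fraction * len(rows))     # sample size must be an int
    picked = rng.choice(len(rows), size=n_test, replace=False)
    test_rows, test_cols = rows[picked], cols[picked]
    test_vals = R[test_rows, test_cols].copy()  # true values, kept for testing
    R_train = R.copy()
    R_train[test_rows, test_cols] = 0           # hide them from training
    return R_train, test_rows, test_cols, test_vals

# Build a sparse matrix the same way as in the question.
mtr = np.random.default_rng(0).random((10, 10))
mtr = np.where(mtr > 0.5, 0, mtr)

R_train, test_rows, test_cols, test_vals = split_nonzero(mtr, 0.2, seed=1)
```

After training on R_train, the model's predictions at (test_rows, test_cols) can be compared against test_vals. The original matrix is left unmodified because the function works on a copy.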