2016-05-15 79 views
3

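Vectorization in numpy

I am trying to perform the following operation in numpy without using a loop:

  • I have a matrix X of dimensions N * d and a vector y of size N. y holds integers ranging from 1 to K.
  • I want to obtain a matrix M of size K * d, where M[i,:] = np.mean(X[y == i,:], 0)

Can I achieve this without using a loop?

With a loop, it would look like this.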

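import numpy as np

N = 3
d = 3
K = 2

X = np.eye(N)
y = np.random.randint(1, K+1, N)  # labels in 1..K (upper bound is exclusive)
M = np.zeros((K, d))
for i in range(K):
    # Rows of X whose label is i+1
    line = X[y == i+1, :]
    if line.size == 0:
        # No samples for this class: leave the row as zeros
        M[i, :] = np.zeros(d)
    else:
        M[i, :] = np.mean(line, 0)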

Thanks in advance.

+0

Is K == N? Are the values of y unique? –

+1

It would be cool if you showed some code. – Bonifacio2

+0

No, and no. For example, if K = 2, X = np.eye(3), y = [1 2 1], I want M to be [[1/2 0 1/2],[0 1 0]]. – popuban
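The example from this comment can be checked with a small script. This is only a sketch: `class_means_loop` is a hypothetical helper name (not from the thread) that wraps the loop from the question.

```python
import numpy as np

def class_means_loop(X, y, K):
    """Reference loop: row i holds the mean of the rows of X whose label is i + 1."""
    M = np.zeros((K, X.shape[1]))
    for i in range(K):
        rows = X[y == i + 1]
        if rows.size:  # a class with no samples keeps its row of zeros
            M[i] = rows.mean(axis=0)
    return M

X = np.eye(3)
y = np.array([1, 2, 1])
M = class_means_loop(X, y, K=2)
print(M)
```

Rows 0 and 2 of X carry label 1, so the first row of M averages them; row 1 is the only sample with label 2.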

Answers

3

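This solves the problem, but it creates an intermediate K×N boolean matrix and does not use the built-in mean function. In some cases that may lead to worse performance or worse numerical stability. I let the class labels range from 0 to K-1 rather than from 1 to K.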

import numpy as np

# Define constants
K, N, d = 10, 1000, 3

# Sample data
Y = np.random.randint(0, K-1, N)  # K-1 to omit one class to test no-examples case
X = np.random.randn(N, d)

# Calculate means for each class, vectorized

# Map samples to labels by taking a logical "outer product" (shape K x N)
mark = Y[None,:] == np.arange(0, K)[:,None]

# Count number of examples in each class
count = np.sum(mark, 1)

# Avoid divide by zero if no examples
count += count == 0

# Sum within each class and normalize
M = (np.dot(mark, X).T / count).T

print(M, np.shape(M), np.shape(mark))
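For the question's own 1-to-K labels, the same boolean-mask idea might be wrapped in a function along these lines. This is a sketch, not a definitive version: `class_means_mask` is a hypothetical name, and the random data below is only an illustrative assumption.

```python
import numpy as np

def class_means_mask(X, y, K):
    """Per-class row means via a (K, N) boolean membership mask; labels run 1..K."""
    # mark[i, n] is True when sample n belongs to class i + 1
    mark = y[None, :] == np.arange(1, K + 1)[:, None]
    # Number of samples per class, clamped to 1 so empty classes do not divide by zero
    count = np.maximum(mark.sum(axis=1), 1)
    # Matrix product sums the rows of each class; empty classes stay all-zero
    return (mark.astype(float) @ X) / count[:, None]

# Illustrative data (assumed, not from the thread)
rng = np.random.default_rng(0)
K, N, d = 4, 1000, 3
X = rng.standard_normal((N, d))
y = rng.integers(1, K, size=N)  # upper bound exclusive, so class K stays empty

M = class_means_mask(X, y, K)
print(M.shape)
```

The mask costs K * N booleans, so this form is mainly attractive when K is small relative to the available memory.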
3

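The code basically collects specific rows off X and adds them up, for which NumPy has a built-in in np.add.reduceat. So, with that in focus, the steps to solve it in a vectorized way could be as listed below: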

# Uses X, y, K, d and `import numpy as np` as defined in the question

# Get sort indices of y
sidx = y.argsort()

# Collect rows off X based on their IDs so that they come in consecutive order
Xr = X[sidx]

# Get unique row IDs, start positions of each unique ID
# and their counts to be used for average calculations
unq, startidx, counts = np.unique((y-1)[sidx], return_index=True, return_counts=True)

# Add rows off Xr based on the slices signified by the start positions
vals = np.true_divide(np.add.reduceat(Xr, startidx, axis=0), counts[:,None])

# Setup output array and set row summed values into it at unique IDs row positions
out = np.zeros((K, d))
out[unq] = vals
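To see the steps on concrete numbers, they can be packed into a function and run on a tiny input. This is a minimal sketch: `class_means_reduceat` is a hypothetical name, and the 4 x 2 input is an assumed example chosen so that class 2 has no samples.

```python
import numpy as np

def class_means_reduceat(X, y, K):
    """Per-class row means using sort + np.add.reduceat; labels run 1..K."""
    # Sort the samples so rows of the same class become contiguous
    sidx = y.argsort(kind="stable")
    Xr = X[sidx]
    # On the sorted labels, return_index gives the start of each class block
    unq, startidx, counts = np.unique(
        (y - 1)[sidx], return_index=True, return_counts=True
    )
    # Sum each contiguous block of rows, then divide by the block length
    vals = np.add.reduceat(Xr, startidx, axis=0) / counts[:, None]
    # Classes that never occur keep a row of zeros
    out = np.zeros((K, X.shape[1]))
    out[unq] = vals
    return out

X = np.arange(8, dtype=float).reshape(4, 2)
y = np.array([1, 3, 1, 3])
out = class_means_reduceat(X, y, K=3)
print(out)
```

Because np.unique only reports the classes that occur, the reduceat start positions never describe an empty block, which is why the missing class needs no special handling beyond the zero-initialized output.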