2014-12-04 51 views
3

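Hi, I am following the code at http://deeplearning.net/tutorial/code/convolutional_mlp.py to implement a convolutional neural network. The channels of my input images matter, so I want a 3-channel feature map as the layer 0 input. How do I create the layer0 input for 3-channel images?

So I need something like this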

layer0_input = x.reshape((batch_size, 3, 240, 135)) # width 240, height 135, 3 channels

instead of

layer0_input = x.reshape((batch_size, 1, 28, 28)) # 28*28 normalized MNIST gray scale images

which will be used here

layer0 = LeNetConvPoolLayer(
    rng, 
    input=layer0_input, 
    image_shape=(batch_size, 3, 240, 135), 
    filter_shape=(nkerns[0], 1, 5, 5), 
    poolsize=(2, 2) 
) 

where x is supplied to Theano as

train_model = theano.function(
    [index],
    cost,
    updates=updates,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size],
        y: train_set_y[index * batch_size: (index + 1) * batch_size]
    }
)

So my question is: how should I create (and shape) train_set_x?

For grayscale intensities (i.e. a single channel), train_set_x is created as

shared_x = theano.shared(numpy.asarray(data_x, 
              dtype=theano.config.floatX), 

where data_x holds flattened NumPy arrays of length 784 (28 * 28 pixels).

Any suggestions are much appreciated.

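The shape you mention at the top is the standard input shape for multi-channel images, e.g. colour images. Could you make your question clearer? – eickenberg 2014-12-04 08:23:59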

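Hi, my question is very specific to Theano and this code: http://deeplearning.net/tutorial/code/convolutional_mlp.py. In that code, MNIST digits are classified with a convolutional network using 28 * 28 grayscale input images. I am adapting it for object detection on my own colour image dataset. I am trying to understand how to modify the layer0 input data structure in the code so that Theano interprets it as 3 channels instead of 1. I hope that makes it clear. – Run2 2014-12-04 09:01:59

Answer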

5

I was able to get it working. I am pasting some code here in case it helps someone. It is not very elegant, but it works.

import os
import random

import cv2
import numpy as np

try:
    import cPickle  # Python 2, as used by the tutorial code
except ImportError:
    import pickle as cPickle  # Python 3


def shuffle_in_unison(a, b):
    # courtesy of http://stackoverflow.com/users/190280/josh-bleecher-snyder
    assert len(a) == len(b)
    shuffled_a = np.empty(a.shape, dtype=a.dtype)
    shuffled_b = np.empty(b.shape, dtype=b.dtype)
    permutation = np.random.permutation(len(a))
    for old_index, new_index in enumerate(permutation):
        shuffled_a[new_index] = a[old_index]
        shuffled_b[new_index] = b[old_index]
    return shuffled_a, shuffled_b


def createDataSet(imagefolder):
    os.chdir(imagefolder)

    # total number of files
    number_of_files = len([item for item in os.listdir('.') if os.path.isfile(os.path.join('.', item))])

    # get a shuffled list : I needed this because my image names were of the format n_x_<some details>.jpg
    # where n was my target and x was a number from 0 to m-1 where m was the number of samples
    # of the target value n. So I needed to shuffle and iterate while putting images in train
    # test and validate arrays
    image_index_array = list(range(0, number_of_files))
    random.seed(12)
    random.shuffle(image_index_array)

    # split 80/10/10 - train/test/val
    trainsize = int(number_of_files * .8)
    testsize = int(number_of_files * .1)
    valsize = number_of_files - trainsize - testsize

    # create the random value arrays of train/test/val by slicing the total image index array
    train_index_array = image_index_array[0:trainsize]
    test_index_array = image_index_array[trainsize:trainsize + testsize]
    validate_index_array = image_index_array[trainsize + testsize:]

    # initialize the data structures
    dataset = {'train': [[], []], 'test': [[], []], 'validate': [[], []]}

    i_counter = 0
    train_X = []
    train_y = []

    test_X = []
    test_y = []

    val_X = []
    val_y = []

    for item in os.listdir('.'):
        if not os.path.isfile(os.path.join('.', item)):
            continue

        if item.endswith('.pkl'):
            continue

        print('Processing item ' + item)
        item_y = item.split('_')[0]
        item_x = cv2.imread(item)

        # cv2.imread returns None for files it cannot decode
        if item_x is None:
            continue

        height, width = item_x.shape[:2]

        # this was my requirement - skip it if you do not need it
        if height != 135 or width != 240:
            continue

        # get 3 channels (OpenCV loads images in B, G, R order)
        b, g, r = cv2.split(item_x)

        # stack to (3, 135, 240), then flatten each channel to one row
        item_x = np.array([b, g, r])
        item_x = item_x.reshape(3, 135 * 240)

        if i_counter in test_index_array:
            test_X.append(item_x)
            test_y.append(item_y)
        elif i_counter in validate_index_array:
            val_X.append(item_x)
            val_y.append(item_y)
        else:
            train_X.append(item_x)
            train_y.append(item_y)

        i_counter = i_counter + 1

    # fix the dimensions. Flatten out the channel and intensity dimensions
    train_X = np.array(train_X)
    train_X = train_X.reshape(train_X.shape[0], train_X.shape[1] * train_X.shape[2])
    test_X = np.array(test_X)
    test_X = test_X.reshape(test_X.shape[0], test_X.shape[1] * test_X.shape[2])
    val_X = np.array(val_X)
    val_X = val_X.reshape(val_X.shape[0], val_X.shape[1] * val_X.shape[2])

    train_y = np.array(train_y)
    test_y = np.array(test_y)
    val_y = np.array(val_y)

    # shuffle the train and test arrays in unison
    train_X, train_y = shuffle_in_unison(train_X, train_y)
    test_X, test_y = shuffle_in_unison(test_X, test_y)

    # pickle them
    dataset['train'] = [train_X, train_y]
    dataset['test'] = [test_X, test_y]
    dataset['validate'] = [val_X, val_y]
    with open('pcount.pkl', 'wb') as output:
        cPickle.dump(dataset, output)
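
To see why the flattened rows can later be reshaped to (batch_size, 3, 135, 240), here is a minimal NumPy sketch. It assumes each image arrives as a (height, width, channels) array, which is what cv2.imread returns. The helper name flatten_channels_first and the random stand-in images are only for illustration and are not part of the code above.

```python
import numpy as np

height, width, channels = 135, 240, 3
batch_size = 4

# Stand-in for cv2.imread output: one (height, width, channels) uint8 array per image.
rng = np.random.RandomState(0)
images = rng.randint(0, 256, size=(batch_size, height, width, channels)).astype(np.uint8)


def flatten_channels_first(image):
    """Hypothetical helper: move channels to the front, then flatten to length C*H*W."""
    return image.transpose(2, 0, 1).reshape(-1)


# One flat row per image, like the rows stored in train_X.
flat = np.array([flatten_channels_first(img) for img in images])

# The NumPy equivalent of x.reshape((batch_size, 3, 135, 240)) on the Theano side.
restored = flat.reshape((batch_size, channels, height, width))

# Each channel plane comes back intact.
assert np.array_equal(restored[0, 0], images[0, :, :, 0])
```

The point is that the channel axis has to be the slowest-varying one within each flat row, otherwise the reshape would interleave pixels from different channels.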

Once you have this pickle file, you can use it in convolutional_mlp.py like this.

layer0_input = x.reshape((batch_size, 3, 135, 240)) 

# Construct the first convolutional pooling layer: 
# filtering reduces the image size to (135-8+1 , 240-5+1) = (128, 236) 
# maxpooling reduces this further to (128/2, 236/2) = (64, 118) 
# 4D output tensor is thus of shape (batch_size, nkerns[0], 64, 118) 
layer0 = LeNetConvPoolLayer(
    rng, 
    input=layer0_input, 
    image_shape=(batch_size, 3, 135, 240), 
    filter_shape=(nkerns[0], 3, 8, 5), 
    poolsize=(2, 2) 
) 
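
The shape arithmetic in the comments above can be checked with a small sketch. It assumes a "valid" convolution and max pooling that drops any leftover border, which I believe is what LeNetConvPoolLayer does in the tutorial. The function name conv_pool_output_shape is made up for this example.

```python
def conv_pool_output_shape(image_hw, filter_hw, pool_hw):
    """Spatial shape after a 'valid' convolution, then after border-ignoring max pooling."""
    conv = tuple(i - f + 1 for i, f in zip(image_hw, filter_hw))
    pooled = tuple(c // p for c, p in zip(conv, pool_hw))
    return conv, pooled


# 135 x 240 images, 8 x 5 filters, 2 x 2 pooling, as in layer0 above.
conv, pooled = conv_pool_output_shape((135, 240), (8, 5), (2, 2))
# conv == (128, 236), pooled == (64, 118)
```

The later layers in the tutorial hard-code MNIST sizes, so their image_shape and n_in values presumably need to be recomputed the same way, starting from (batch_size, nkerns[0], 64, 118).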

The load_data function in logistic_sgd.py needs a small change, as follows

f = open(dataset, 'rb') 
dump = cPickle.load(f) 
train_set = dump['train'] 
valid_set = dump['validate'] 
test_set = dump['test'] 
f.close() 
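
As a rough check of the dictionary layout that load_data now expects, here is a self-contained round trip. It uses the standard pickle module in place of cPickle, a temporary directory, and tiny all-zero arrays as stand-ins for the real data. All of those are assumptions made for this sketch.

```python
import os
import pickle
import tempfile

import numpy as np

row_length = 3 * 135 * 240  # one flattened 3-channel image


def fake_split(n):
    """Stand-in for one [X, y] pair as written by createDataSet."""
    return [np.zeros((n, row_length), dtype=np.uint8), np.array(['0'] * n)]


dataset = {'train': fake_split(8), 'validate': fake_split(1), 'test': fake_split(1)}

path = os.path.join(tempfile.mkdtemp(), 'pcount.pkl')
with open(path, 'wb') as f:
    pickle.dump(dataset, f)

with open(path, 'rb') as f:
    dump = pickle.load(f)

train_set = dump['train']
valid_set = dump['validate']
test_set = dump['test']

# Each split unpacks into (data, labels), the pair load_data works with.
train_x, train_y = train_set
```

Because each split is a two-element [X, y] list, the rest of load_data should be able to unpack it the same way it unpacks the MNIST tuples.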

Hope this helps.

Looks good. – nouiz 2014-12-06 01:17:54

If you like it, please upvote :) – Run2 2014-12-06 07:36:17