
All gradients vanish in Caffe when using BatchNorm

I ran into a problem where the gradients vanish when I use batch normalization in Caffe. This is the code I use in train_val.prototxt.

    layer {
      name: "conv1"
      type: "Convolution"
      bottom: "conv0"
      top: "conv1"
      param {
        lr_mult: 1
        decay_mult: 1
      }
      param {
        lr_mult: 0
        decay_mult: 0
      }
      convolution_param {
        num_output: 32
        pad: 1
        kernel_size: 3
        weight_filler {
          type: "gaussian"
          std: 0.0589
        }
        bias_filler {
          type: "constant"
          value: 0
        }
        engine: CUDNN
      }
    }
    layer {
      name: "bnorm1"
      type: "BatchNorm"
      bottom: "conv1"
      top: "conv1"
      batch_norm_param {
        use_global_stats: false
      }
    }
    layer {
      name: "scale1"
      type: "Scale"
      bottom: "conv1"
      top: "conv1"
      scale_param {
        bias_term: true
      }
    }
    layer {
      name: "relu1"
      type: "ReLU"
      bottom: "conv1"
      top: "conv1"
    }

    layer {
      name: "conv16"
      type: "Convolution"
      bottom: "conv1"
      top: "conv16"
      param {
        lr_mult: 1
        decay_mult: 1
      }

However, training does not converge. If I remove the BN layers (BatchNorm + Scale), training does converge, so I started comparing the log files with and without the BN layers. Below are the log files produced with debug_info: true.

With BN:

I0804 10:22:42.074671 8318 net.cpp:638]  [Forward] Layer loadtestdata, top blob data data: 0.368457 
I0804 10:22:42.074757 8318 net.cpp:638]  [Forward] Layer loadtestdata, top blob label data: 0.514496 
I0804 10:22:42.076117 8318 net.cpp:638]  [Forward] Layer conv0, top blob conv0 data: 0.115678 
I0804 10:22:42.076200 8318 net.cpp:650]  [Forward] Layer conv0, param blob 0 data: 0.0455077 
I0804 10:22:42.076273 8318 net.cpp:650]  [Forward] Layer conv0, param blob 1 data: 0 
I0804 10:22:42.076539 8318 net.cpp:638]  [Forward] Layer relu0, top blob conv0 data: 0.0446758 
I0804 10:22:42.078435 8318 net.cpp:638]  [Forward] Layer conv1, top blob conv1 data: 0.0675479 
I0804 10:22:42.078516 8318 net.cpp:650]  [Forward] Layer conv1, param blob 0 data: 0.0470226 
I0804 10:22:42.078589 8318 net.cpp:650]  [Forward] Layer conv1, param blob 1 data: 0 
I0804 10:22:42.079108 8318 net.cpp:638]  [Forward] Layer bnorm1, top blob conv1 data: 0 
I0804 10:22:42.079197 8318 net.cpp:650]  [Forward] Layer bnorm1, param blob 0 data: 0 
I0804 10:22:42.079270 8318 net.cpp:650]  [Forward] Layer bnorm1, param blob 1 data: 0 
I0804 10:22:42.079350 8318 net.cpp:650]  [Forward] Layer bnorm1, param blob 2 data: 0 
I0804 10:22:42.079421 8318 net.cpp:650]  [Forward] Layer bnorm1, param blob 3 data: 0 
I0804 10:22:42.079505 8318 net.cpp:650]  [Forward] Layer bnorm1, param blob 4 data: 0 
I0804 10:22:42.080267 8318 net.cpp:638]  [Forward] Layer scale1, top blob conv1 data: 0 
I0804 10:22:42.080345 8318 net.cpp:650]  [Forward] Layer scale1, param blob 0 data: 1 
I0804 10:22:42.080418 8318 net.cpp:650]  [Forward] Layer scale1, param blob 1 data: 0 
I0804 10:22:42.080651 8318 net.cpp:638]  [Forward] Layer relu1, top blob conv1 data: 0 
I0804 10:22:42.082074 8318 net.cpp:638]  [Forward] Layer conv16, top blob conv16 data: 0 
I0804 10:22:42.082154 8318 net.cpp:650]  [Forward] Layer conv16, param blob 0 data: 0.0485365 
I0804 10:22:42.082226 8318 net.cpp:650]  [Forward] Layer conv16, param blob 1 data: 0 
I0804 10:22:42.082675 8318 net.cpp:638]  [Forward] Layer loss, top blob loss data: 42.0327 

Without BN:

I0803 17:01:29.700850 30274 net.cpp:638]  [Forward] Layer loadtestdata, top blob data data: 0.320584 
I0803 17:01:29.700920 30274 net.cpp:638]  [Forward] Layer loadtestdata, top blob label data: 0.236383 
I0803 17:01:29.701556 30274 net.cpp:638]  [Forward] Layer conv0, top blob conv0 data: 0.106141 
I0803 17:01:29.701633 30274 net.cpp:650]  [Forward] Layer conv0, param blob 0 data: 0.0467062 
I0803 17:01:29.701692 30274 net.cpp:650]  [Forward] Layer conv0, param blob 1 data: 0 
I0803 17:01:29.701835 30274 net.cpp:638]  [Forward] Layer relu0, top blob conv0 data: 0.0547961 
I0803 17:01:29.702193 30274 net.cpp:638]  [Forward] Layer conv1, top blob conv1 data: 0.0716117 
I0803 17:01:29.702267 30274 net.cpp:650]  [Forward] Layer conv1, param blob 0 data: 0.0473551 
I0803 17:01:29.702327 30274 net.cpp:650]  [Forward] Layer conv1, param blob 1 data: 0 
I0803 17:01:29.702425 30274 net.cpp:638]  [Forward] Layer relu1, top blob conv1 data: 0.0318472 
I0803 17:01:29.702781 30274 net.cpp:638]  [Forward] Layer conv16, top blob conv16 data: 0.0403702 
I0803 17:01:29.702847 30274 net.cpp:650]  [Forward] Layer conv16, param blob 0 data: 0.0474007 
I0803 17:01:29.702908 30274 net.cpp:650]  [Forward] Layer conv16, param blob 1 data: 0 
I0803 17:01:29.703228 30274 net.cpp:638]  [Forward] Layer loss, top blob loss data: 11.2245 

The strange thing is that, in the forward pass, every layer starting from batchnorm outputs 0! It is also worth mentioning that relu (an in-place layer) has only 4 lines, while batchnorm and scale (which should also be in-place layers) have 6 and 3 lines in the log file. Do you know what the problem is?


What version of caffe are you using? – Shai

Answer


I don't know what is wrong with your "BatchNorm" layer, but something is very strange here (!):
according to your debug log, your "BatchNorm" layer has five internal param blobs (0..4). Looking at the source code of batch_norm_layer.cpp, there should be only three internal param blobs:

this->blobs_.resize(3); 

I suggest you make sure the "BatchNorm" implementation you are using is not buggy.
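If you have pycaffe built, one quick way to check this against your own build is to load the net and count the param blobs of the BatchNorm layer. A minimal sketch, assuming the prototxt is train_val.prototxt and the layer is named bnorm1 as in your snippet (adjust the path and name to yours):

    import caffe

    # Load the training net; path and phase are assumptions for this example.
    net = caffe.Net('train_val.prototxt', caffe.TRAIN)

    # In upstream BVLC Caffe, a "BatchNorm" layer should expose exactly three
    # internal param blobs (running mean, running variance, moving-average factor).
    name = 'bnorm1'
    blobs = net.params.get(name, [])
    print(name, 'has', len(blobs), 'param blobs')
    for i, b in enumerate(blobs):
        print('  param blob', i, 'shape', b.data.shape)

If this prints more than three blobs (as your log suggests), the BatchNorm implementation in your Caffe fork differs from the upstream one quoted above.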


Regarding the debug log, you can read here for more details on how to interpret it.
Regarding your question

"relu [...] has only 4 lines, but batchnorm and scale [...] have 6 and 3 lines in the log file"

Note that every layer has one "top blob ... data" line reporting the L2 norm of its output blob.
In addition, each layer has one extra line for each of its internal parameters. A "ReLU" layer has no internal parameters, so there is no "param blob [...] data" printout for it. A "Convolution" layer has two internal parameters (the kernel and the bias), hence the two extra lines for blob 0 and blob 1.