CNN: Details of Zeiler Fergus Net

Question

I want to replicate the modified AlexNet from Zeiler and Fergus (2013, "Visualizing and Understanding Convolutional Networks"), but the paper omits some details. I hope someone here knows more about it.

1. What is their exact learning rate schedule? They just write "We anneal the learning rate throughout training manually when the validation error plateaus".
2. Do they use weight decay?
3. In which layers do they "renormalize" the filters (they do not divide the input by the global standard deviation)?
4. I do not understand their architecture completely. In the first layer: 224 -> 110 with filters of width/height 7 and stride 2. Do they add a padding of one on only one side, because 110*2+5=225, or am I wrong? Same for the 3x3 max pooling 26 -> 13 with stride 2.

yasin.yazici · Answer

A partial answer:

1-) That is a common training procedure. Once a given learning rate can no longer reduce the objective function, the learning rate is lowered and training continues. The plateau is similar to overshooting: after some time the learning rate becomes too large to reduce the error any further, so it is reduced by some amount. The simplest approach is to divide the learning rate by a constant, e.g. 5 or 10.
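
The paper says the annealing was done manually, so the exact schedule cannot be recovered from it. As a rough sketch of the idea only, the automated version below divides the learning rate by a constant whenever the validation error stops improving. The function name, `factor`, `patience` and `min_delta` are my own assumptions for illustration, not values from the paper.

```python
def anneal_on_plateau(val_errors, initial_lr=0.01, factor=10.0,
                      patience=2, min_delta=1e-4):
    """Return the learning rate used in each epoch.

    val_errors[i] is the validation error measured after epoch i.
    The rate is divided by `factor` once the error has failed to improve
    by at least `min_delta` for `patience` consecutive epochs.
    All default values are illustrative assumptions.
    """
    lr = initial_lr
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    rates = []
    for err in val_errors:
        rates.append(lr)  # rate that was used for this epoch
        if err < best - min_delta:
            best = err
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                lr /= factor  # plateau detected: anneal
                stale = 0
    return rates


if __name__ == "__main__":
    errors = [0.50, 0.40, 0.40, 0.40, 0.39, 0.39, 0.39]
    print(anneal_on_plateau(errors))
```

With a manual schedule you would make the same decision by looking at the validation curve yourself; the loop just makes the rule explicit.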

2-) I think they did, because AlexNet used it, and most of their settings are taken from AlexNet.
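
For reference, here is a minimal sketch of the update rule given in the AlexNet paper, which uses momentum 0.9 and weight decay 0.0005. Whether Zeiler and Fergus used the same weight decay is my assumption, as said above. The function name and the scalar arguments are only for illustration; in practice `w`, `v` and `grad` would be arrays.

```python
def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One SGD step with momentum and weight decay (AlexNet-style rule).

    v_new = momentum * v - weight_decay * lr * w - lr * grad
    w_new = w + v_new
    Default values follow the AlexNet paper; using them for the
    Zeiler-Fergus net is an assumption.
    """
    v_new = momentum * v - weight_decay * lr * w - lr * grad
    w_new = w + v_new
    return w_new, v_new


if __name__ == "__main__":
    # With a zero gradient the weight still shrinks slightly: that is the decay.
    print(sgd_momentum_step(w=1.0, v=0.0, grad=0.0))
```

Setting `weight_decay=0.0` gives plain momentum SGD, which makes it easy to compare both variants when trying to match the reported numbers.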

3-)

4-) During pooling, padding may be used so that the pooling windows cover the input exactly. For example, 3x3 pooling with stride 2 on a 26x26 input needs a padding of 1 on a single side to give a 13x13 output: (26 + 1 - 3) / 2 + 1 = 13.
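
The same arithmetic can be checked for both cases from the question with the usual output-size formula, floor((input + padding - kernel) / stride) + 1. This is only a sketch; the helper names are made up, and `pad_total` is the padding summed over both sides.

```python
def output_size(input_size, kernel, stride, pad_total=0):
    """Output width of a conv/pool layer, with padding summed over both sides."""
    return (input_size + pad_total - kernel) // stride + 1


def total_padding_needed(input_size, kernel, stride, output):
    """Total padding required so the windows produce exactly `output` positions."""
    return (output - 1) * stride + kernel - input_size


if __name__ == "__main__":
    # First layer: 7x7 filters, stride 2, 224 -> 110
    print(total_padding_needed(224, 7, 2, 110))  # total padding for conv1
    print(output_size(224, 7, 2, pad_total=1))
    # Pooling: 3x3 windows, stride 2, 26 -> 13
    print(total_padding_needed(26, 3, 2, 13))    # total padding for the pool
    print(output_size(26, 3, 2, pad_total=1))
```

In both cases the total padding comes out as 1. An odd total cannot be split evenly between the two sides, which is consistent with padding on one side only. Without any padding the same formula gives 109 and 12.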
