AnswerBun.com

CNN: Details of Zeiler Fergus Net

I want to replicate the modified AlexNet by Zeiler and Fergus from 2013 (Visualizing and Understanding Convolutional Networks), but they omit some details. I hope someone here knows more about it.

  1. What is their exact learning rate schedule? They just write “We
    anneal the learning rate throughout training manually when the
    validation error plateaus”.

  2. Do they use weight decay?

  3. In which layers do they “renormalize” the filters (they do not
    divide the input by the global standard deviation)?

  4. I do not understand their architecture completely: In the first
    layer: 224 -> 110 with filters of width/height 7 and stride 2. Do
    they add a padding of one only on one side because 110*2+5=225 or am
    I wrong? Same for 3×3 maxpooling 26 -> 13 with stride 2.

Cross Validated Asked by vrx on December 26, 2020

One Answer

A partial answer:

1-) That is a common training procedure. When a given learning rate can no longer reduce the objective function, the learning rate is lowered and training continues. The behaviour is related to overshooting: after some time the learning rate becomes too large for the updates to reduce the error any further, so it is reduced by some factor. The simplest approach is to divide the learning rate by a constant, e.g. 5 or 10.

2-) I think they did, because AlexNet used it, and most of their settings are taken from AlexNet.

3-)

4-) During pooling, padding may be used so that the pooling windows cover the whole input when the stride does not divide it evenly. For example, 3x3 pooling with stride 2 on a 26x26 input needs a padding of 1 on a single side to produce a 13x13 output.
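
The arithmetic behind point 4 can be checked with a short sketch. It assumes the usual sliding-window size formula, out = floor((in + pad - kernel) / stride) + 1, where pad is the padding summed over both sides. The function names are made up for this illustration and do not come from the paper.

```python
def output_size(size, kernel, stride, pad_total=0):
    """Output length along one axis; pad_total is the padding summed over both sides."""
    return (size + pad_total - kernel) // stride + 1


def pad_total_needed(size, kernel, stride, out):
    """Total padding (both sides summed) so that exactly `out` windows fit."""
    return (out - 1) * stride + kernel - size


# First convolution from the question: 224 -> 110, 7x7 filters, stride 2.
conv_pad = pad_total_needed(224, 7, 2, 110)
# Max pooling from the question: 26 -> 13, 3x3 windows, stride 2.
pool_pad = pad_total_needed(26, 3, 2, 13)

print(conv_pad, output_size(224, 7, 2, conv_pad))
print(pool_pad, output_size(26, 3, 2, pool_pad))
```

In both cases the total padding comes out as a single pixel, which cannot be split evenly between two sides. That is consistent with padding on one side only, although the paper itself does not say how the sizes were obtained.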

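For point 1, the "divide when the validation error plateaus" rule can be sketched as below. The paper says the annealing was done manually, so this is only an automated approximation. The class name, the patience of 3 epochs, the factor of 10 and the error values are all assumptions chosen for illustration, not settings from the paper.

```python
class PlateauAnnealer:
    """Divide the learning rate by `factor` after `patience` epochs without improvement."""

    def __init__(self, lr, factor=10.0, patience=3, min_delta=0.0):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")  # lowest validation error seen so far
        self.stale = 0            # epochs since the last improvement

    def step(self, val_error):
        """Record one epoch's validation error and return the learning rate to use next."""
        if val_error < self.best - self.min_delta:
            self.best = val_error
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr /= self.factor
                self.stale = 0
        return self.lr


# Made-up validation errors: the error stalls at 0.60 for three epochs.
annealer = PlateauAnnealer(lr=0.01, factor=10.0, patience=3)
for epoch, err in enumerate([0.90, 0.70, 0.60, 0.60, 0.60, 0.60, 0.55]):
    lr = annealer.step(err)
    print(epoch, err, lr)
```

With these values the learning rate is divided once, after the third epoch in a row without improvement.
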
Answered by yasin.yazici on December 26, 2020
