MLP conv layers

Question

When should MLP conv layers be used instead of normal conv layers? Is there a consensus, or is it the norm to try both and see which one performs better? I would love to better understand the differences between the two. Also, which deep learning libraries support MLP conv layers? They are used in the paper "Network in Network".

Tsuman · Answer

In a normal convolutional layer, the convolution is followed by an activation function such as ReLU. ReLU is fixed and has no trainable parameters of its own. An MLP conv layer is a combination of the convolution operation and a multilayer perceptron. If you apply an MLP after the convolution, the layer can learn a much more complex function (mapping from input to output).

It also increases the capacity of your model, so it can fit more data.

For a comparison, you have to try it yourself. Some comparisons are shown in the paper itself.

There is also another paper that does a similar thing. Check: Learning Activation Functions to Improve Deep Neural Networks

You can build your own MLP conv layer in deep learning frameworks like TensorFlow and PyTorch. Someone might have already built one as well.
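
As a rough illustration of building one yourself, here is a minimal PyTorch sketch. It assumes the layer is realized as an ordinary convolution followed by two 1x1 convolutions with ReLU in between, which is one common way to express the idea. The class name `MLPConv` and all sizes are hypothetical and chosen only for the example.

```python
import torch
import torch.nn as nn


class MLPConv(nn.Module):
    """Hypothetical MLP conv block: a k x k convolution followed by two 1x1 convolutions."""

    def __init__(self, in_channels, hidden_channels, out_channels, kernel_size, padding=0):
        super().__init__()
        self.block = nn.Sequential(
            # Ordinary spatial convolution over a k x k window.
            nn.Conv2d(in_channels, hidden_channels, kernel_size, padding=padding),
            nn.ReLU(),
            # The two 1x1 convolutions act as a small MLP shared across all positions.
            nn.Conv2d(hidden_channels, hidden_channels, kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(hidden_channels, out_channels, kernel_size=1),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.block(x)


# Illustrative sizes (assumptions, not taken from the paper).
layer = MLPConv(in_channels=3, hidden_channels=16, out_channels=8, kernel_size=3, padding=1)
x = torch.randn(2, 3, 32, 32)  # batch of 2 images, 3 channels, 32x32 pixels
y = layer(x)
out_shape = tuple(y.shape)  # (batch, out_channels, height, width)
```

With `padding=1` and a 3x3 kernel the spatial size is preserved, so only the number of channels changes between input and output.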

ncasas · Answer

The term normally used to refer to "MLP conv layers" nowadays is 1x1 convolutions.
1x1 convolutions are normal convolutions, but their kernel size is 1, that is, they only act on one position (i.e. one pixel for images, one token for discrete data). This way, 1x1 convolutions are equivalent to applying a dense layer position-wise. The term "MLP convolutional layers" used in the network-in-network paper is a reference to this fact.
While normal convolutions use the spatial information and therefore they can detect local patterns (spatial locality inductive bias), 1x1 convolutions do not, as their window of action is a single position. They are simply used to change the dimensionality of representations, specifically to change the number of channels in images, or to change the embedding dimensionality in discrete data. For instance, if at some point of a 2D convolutional network we have a tensor of width $w$, height $h$ and $c$ channels, we can use a 1x1 convolution to obtain a tensor of width $w$, height $h$ and $c'$ channels, where $c \neq c'$.
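
The equivalence with a position-wise dense layer can be checked numerically. The following is a small PyTorch sketch under assumed, illustrative sizes (`c_in` plays the role of $c$ and `c_out` the role of $c'$): it copies the parameters of a 1x1 `nn.Conv2d` into an `nn.Linear` and compares the two outputs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

c_in, c_out, h, w = 3, 7, 4, 5  # illustrative sizes; c_out plays the role of c'
x = torch.randn(1, c_in, h, w)  # PyTorch layout: (batch, channels, height, width)

conv = nn.Conv2d(c_in, c_out, kernel_size=1)
dense = nn.Linear(c_in, c_out)

with torch.no_grad():
    # Copy the convolution's parameters into the dense layer.
    # conv.weight has shape (c_out, c_in, 1, 1); dense.weight has shape (c_out, c_in).
    dense.weight.copy_(conv.weight.reshape(c_out, c_in))
    dense.bias.copy_(conv.bias)

    conv_out = conv(x)
    # nn.Linear acts on the last dimension, so move channels last and back again.
    dense_out = dense(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

same = torch.allclose(conv_out, dense_out, atol=1e-5)
out_shape = tuple(conv_out.shape)  # width and height unchanged, channels c_in -> c_out
```

Here `same` records whether the two outputs agree up to floating-point tolerance, and `out_shape` shows that only the channel dimension has changed.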
