TransWikia.com

What is the purpose of a 1x1 convolutional layer?

Data Science Asked on August 12, 2021

SqueezeNet uses 1×1 convolutions. I am trying to understand them with a simple example: the input is one MNIST digit, i.e. of shape 1x28x28x1 (I use Batch x Height x Width x Channel ordering).

Applying Conv2D(16, kernel_size=(1,1)) then produces an output of shape 1x28x28x16, in which I think each channel i (i in 1..16) is just the input multiplied by a constant. Is that right?

More specifically: Output[channel i][x,y] = InputLayer[x,y] * alpha_i for x,y in 1..28, where alpha_i is a constant for each channel.

Is this correct?

It’s like going from 1 channel to 16 copies of that channel, each multiplied by its own constant.
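The claim above can be checked with a small NumPy sketch (the shapes mirror the example; bias is omitted for simplicity):

```python
import numpy as np

# Single-channel input (H x W x 1), e.g. one MNIST digit.
x = np.random.rand(28, 28, 1)

# A 1x1 conv with 16 filters on a 1-channel input reduces to 16 scalar
# weights alpha_i: the kernel tensor has shape (1, 1, 1, 16).
alphas = np.random.rand(16)

# Pointwise application: broadcasting gives shape (28, 28, 16).
out = x * alphas

# Each output channel i is exactly alpha_i times the input channel.
for i in range(16):
    assert np.allclose(out[:, :, i], alphas[i] * x[:, :, 0])
```

So in the single-channel case the 16 output channels are indeed all proportional to the input (and to each other).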

What is its purpose?


Note: I have already read How are 1×1 convolutions the same as a fully connected layer? and 1×1 Convolution. How does the math work? but here it’s slightly different.

2 Answers

First of all, Conv2D(16, kernel_size=(1,1)) applied to a 28x28x1 input produces 28x28x16: it changes the number of channels but not the spatial dimensions. Secondly, I took a look at the paper you refer to and did not find a place where they apply 1x1 filters to a single-channel input as in your example. For a single-channel input your reasoning is correct, but it does not transfer to inputs with several channels.

Conv2D, by definition, assigns weights between all channels of the selected area (for a 1x1 kernel, a whole depth column of nodes) and the corresponding column of nodes in the output grid. It then applies these same weights at every valid position according to the stride and padding parameters. Thus applying a 1x1 kernel to a 1-channel input makes little sense: every output channel is proportional to the input channel, and hence to every other output channel. With multiple input channels, each output node is a (non-trivial) weighted sum over the corresponding input column, and in general it is not proportional to any single input channel.
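A NumPy sketch of the multi-channel case (the channel counts 64 and 16 are illustrative, not from the question):

```python
import numpy as np

# Multi-channel input: a 28x28 spatial grid with 64 input channels.
x = np.random.rand(28, 28, 64)

# A 1x1 conv with 16 filters is a (64 -> 16) linear map applied
# independently at every spatial position (each "column" of nodes).
W = np.random.rand(64, 16)

# Equivalent to a matrix product over the channel axis.
out = np.einsum('hwc,cf->hwf', x, W)  # shape (28, 28, 16)

# Each output value is a weighted sum of all 64 input channels at (h, w),
# so it is not proportional to any single input channel in general.
h, w = 5, 7
assert np.allclose(out[h, w], x[h, w] @ W)
```

This is why a 1x1 convolution is often described as a per-pixel fully connected layer across channels: it mixes channels without looking at any spatial neighborhood.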

Answered by Mikhail Berlinkov on August 12, 2021

Useful example found here

1x1 convolution.
As an aside, several papers use 1x1 convolutions, as first investigated by Network in Network. Some people are at first confused to see 1x1 convolutions, especially when they come from a signal processing background. Normally signals are 2-dimensional, so 1x1 convolutions do not make sense there (it’s just pointwise scaling). However, in ConvNets this is not the case, because one must remember that we operate over 3-dimensional volumes and that the filters always extend through the full depth of the input volume. For example, if the input is [32x32x3], then doing 1x1 convolutions would effectively be doing 3-dimensional dot products (since the input depth is 3 channels).
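The [32x32x3] example from the quote can be written out directly in NumPy (one filter shown; a real layer would stack several):

```python
import numpy as np

x = np.random.rand(32, 32, 3)  # input volume, depth 3
f = np.random.rand(3)          # one 1x1x3 filter spanning the full depth

# At each spatial position, the 1x1 convolution is a 3-dimensional dot
# product between the filter and the depth column of the input.
out = np.einsum('hwc,c->hw', x, f)  # shape (32, 32)

assert np.allclose(out[0, 0], np.dot(x[0, 0], f))
```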

Answered by Basj on August 12, 2021
