
U-nets: how exactly is upsampling performed?

Signal Processing, asked on November 8, 2021

In U-nets, I would like a more straightforward and detailed explanation of how the upsampling part (the “right part of the U”) is performed.

I have read that it can be done by “transposed convolution layers”, a.k.a. “deconv layers”. I would like a clear example (possibly a bit simplified) of how this is performed.

E.g. initialization: why and how is it done? What are the detailed input and output sizes/shapes, etc.?

For example, from https://medium.com/@keremturgutlu/semantic-segmentation-u-net-part-1-d8d6f6005066:
“In transposed convolutions we have weights that we learn through back-propagation. In papers I’ve come across all of these upsampling methods for various cases and also in practice you may change your architecture and try all of them to see which works best for your own problem. I personally prefer transposed convolutions since we have more control over it but you may go for bilinear interpolation or nearest neighbor for simplicity as well.”

I don’t understand the quoted suggestion that “you may go for bilinear interpolation or nearest neighbor for simplicity”.

Why would we choose a fixed interpolation method over a learned transposed Conv2D filter? How can that make sense?

One Answer

The original U-Net paper used nearest neighbor interpolation as far as I know. This is also the default upsampling method in TensorFlow. My own anecdotal advice is to not use transposed convolutions in U-Net. It will only make your CNN slower and won't really increase your F1 score (or whatever metric you're using). I would recommend nearest neighbor interpolation.
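To make the shapes concrete, here is a minimal sketch of the two decoder options side by side (assuming TensorFlow 2.x / Keras; the shapes and filter counts are illustrative, not taken from the U-Net paper). Both take a feature map from 32×32 to 64×64:

```python
import tensorflow as tf

x = tf.random.normal((1, 32, 32, 128))  # (batch, H, W, channels)

# Option 1: fixed nearest-neighbor upsampling (Keras default interpolation),
# followed by the regular convolution of the decoder block
up = tf.keras.layers.UpSampling2D(size=2, interpolation="nearest")(x)
print(up.shape)     # (1, 64, 64, 128)
conv = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(up)
print(conv.shape)   # (1, 64, 64, 64)

# Option 2: a learned transposed convolution does both steps at once
tconv = tf.keras.layers.Conv2DTranspose(64, 2, strides=2)(x)
print(tconv.shape)  # (1, 64, 64, 64)
```

Either way, the decoder doubles the spatial resolution at each step; the difference is only whether the upsampling weights are fixed or learned.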

Transposed convolution is basically a regular convolution applied after zeros have been inserted between the input samples. In U-Net, each upsampling step is followed by a regular convolution layer anyway, so you don't lose anything if you choose fixed upsampling weights like nearest neighbor.
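Here is a small NumPy sketch of that equivalence (my own 1D illustration, not from the paper): a stride-2 transposed convolution computed by the usual scatter rule matches zero insertion followed by a regular full convolution. With the all-ones kernel [1, 1] it reproduces exactly nearest-neighbor upsampling, so a fixed nearest-neighbor layer is just one particular weight setting a transposed convolution could learn:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])  # input signal
w = np.array([1.0, 1.0])       # kernel (all ones on purpose, see above)
stride = 2

# 1) Transposed convolution via the scatter rule: each input sample x[i]
#    adds x[i] * w to output positions [i*stride, i*stride + len(w))
out_len = (len(x) - 1) * stride + len(w)
y_scatter = np.zeros(out_len)
for i, xi in enumerate(x):
    y_scatter[i * stride : i * stride + len(w)] += xi * w

# 2) Same result: insert (stride - 1) zeros between samples,
#    then apply a regular full convolution
z = np.zeros((len(x) - 1) * stride + 1)
z[::stride] = x                          # z = [1, 0, 2, 0, 3]
y_conv = np.convolve(z, w, mode="full")

print(y_scatter)  # [1. 1. 2. 2. 3. 3.]
print(y_conv)     # identical: nearest-neighbor upsampling of x
```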

Upsampling outside the neural network domain is done in two steps: 1. upsample (insert zeros between the samples), 2. lowpass filter. Here, the lowpass filter is the learnable convolution. In CNNs the only apparent issue is aliasing (see "Making Convolutional Networks Shift-Invariant Again"), but that also happens with max pooling / downsampling.
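As a small illustration of that two-step view (again my own sketch): zero insertion followed by a fixed triangular lowpass kernel reproduces linear interpolation, the 1D analogue of the bilinear option mentioned in the question:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])

# Step 1: upsample by 2 (insert zeros between samples)
z = np.zeros(2 * len(x) - 1)
z[::2] = x                                # [1, 0, 2, 0, 3]

# Step 2: lowpass filter; a triangular kernel gives linear interpolation
lowpass = np.array([0.5, 1.0, 0.5])
y = np.convolve(z, lowpass, mode="same")

print(y)  # [1.  1.5 2.  2.5 3. ] -- linearly interpolated
```

In a U-Net decoder, this fixed lowpass kernel is simply replaced by the learned convolution layer that follows the upsampling.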

Answered by displayname on November 8, 2021
