How can I have the same input and output shape in an auto-encoder?

Question

I'm building a denoising autoencoder, and I want the output image to have the same shape as the input image.
This is my architecture:
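from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

IMG_HEIGHT, IMG_WIDTH = 1169, 827  # input size stated below

input_img = Input(shape=(IMG_HEIGHT, IMG_WIDTH, 1))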

x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

x = Conv2D(32, (3, 3), activation='relu', padding='valid')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

# decodedSize = K.int_shape(decoded)[1:]

# x_size = K.int_shape(input_img)
# decoded = Reshape(decodedSize, input_shape=decodedSize)(decoded)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

My input shape is: 1169x827
This is Keras output:
Model: "model_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_7 (InputLayer)         [(None, 1169, 827, 1)]    0         
_________________________________________________________________
conv2d_30 (Conv2D)           (None, 1169, 827, 32)     320       
_________________________________________________________________
max_pooling2d_12 (MaxPooling (None, 585, 414, 32)      0         
_________________________________________________________________
conv2d_31 (Conv2D)           (None, 585, 414, 64)      18496     
_________________________________________________________________
max_pooling2d_13 (MaxPooling (None, 293, 207, 64)      0         
_________________________________________________________________
conv2d_32 (Conv2D)           (None, 291, 205, 32)      18464     
_________________________________________________________________
up_sampling2d_12 (UpSampling (None, 582, 410, 32)      0         
_________________________________________________________________
conv2d_33 (Conv2D)           (None, 582, 410, 32)      9248      
_________________________________________________________________
up_sampling2d_13 (UpSampling (None, 1164, 820, 32)     0         
_________________________________________________________________
conv2d_34 (Conv2D)           (None, 1162, 818, 1)      289       
===============================================================

How can I have the same input and output shape?

Vesko Vujovic · Answer

I don't know if this is the right way of doing it but I solved the problem.
Following the code from above I've added:
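import tensorflow as tf
from tensorflow.keras import backend as K

# Height, width and channels of the input, without the batch dimension
img_size = K.int_shape(input_img)[1:]

# Resize the decoder output to the input's height and width
resized_image_tensor = tf.image.resize(decoded, list(img_size[:2]))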

autoencoder = Model(input_img, resized_image_tensor)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

I used tf.image.resize to make the shape of the reconstructed image match the shape of the input image.
Hope it helps.

Hans-Martin Mosner · Answer

If you look at Keras' output, there are various steps which lose pixels:
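Max pooling on odd sizes cannot be undone exactly by upsampling: with padding='same' the size is rounded up (1169 becomes 585), so doubling it again gives 1170 rather than 1169. Conv2D with a 3x3 kernel also loses 2 pixels per dimension when it uses padding='valid'. The convolutions in the downsampling steps use padding='same', which is why they keep their size.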
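To make the pixel bookkeeping concrete, here is a small sketch in plain Python that follows one spatial dimension through the layers from the question. The helper names (conv, pool, upsample, trace) are made up for illustration. The size rules assume stride-1 convolutions and 2x2 pooling with stride 2, as in the posted code.

```python
import math

def conv(n, kernel=3, padding="same"):
    """Output length of a stride-1 convolution along one axis."""
    return n if padding == "same" else n - kernel + 1

def pool(n, size=2, padding="same"):
    """Output length of max pooling whose stride equals the pool size."""
    return math.ceil(n / size) if padding == "same" else n // size

def upsample(n, factor=2):
    """Output length of UpSampling2D along one axis."""
    return n * factor

def trace(n, decoder_paddings=("valid", "same", "same")):
    """Follow one spatial dimension through the autoencoder from the question."""
    n = pool(conv(n))                      # Conv2D(32) + MaxPooling2D
    n = pool(conv(n))                      # Conv2D(64) + MaxPooling2D -> encoded
    p1, p2, p3 = decoder_paddings
    n = upsample(conv(n, padding=p1))      # Conv2D(32) + UpSampling2D
    n = upsample(conv(n, padding=p2))      # Conv2D(32) + UpSampling2D
    return conv(n, padding=p3)             # final Conv2D(1)

# Paddings as written in the posted code
print(trace(1169), trace(827))                                   # 1164 820
# The posted summary (1162 x 818) matches a 'valid' final convolution
print(trace(1169, ("valid", "same", "valid")),
      trace(827, ("valid", "same", "valid")))                    # 1162 818
# padding='same' everywhere still overshoots an odd-sized input
print(trace(1169, ("same",) * 3), trace(827, ("same",) * 3))     # 1172 828
# A size divisible by 4 survives the round trip
print(trace(1172, ("same",) * 3), trace(828, ("same",) * 3))     # 1172 828
```

So even with padding='same' on every convolution, a 1169x827 input comes out as 1172x828, while an input whose sides are multiples of 4 comes back unchanged.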
Intuitively, the simplest solution would be to pad the original images with enough border pixels to compensate for the size changes in the various layers. I haven't calculated exactly how much that should be, but I suspect rounding each dimension up to a multiple of 4 should take care of the two max pooling layers. For denoising, the borders could simply be copied from the outermost pixels, probably with some sort of low-pass filtering to avoid artefacts.
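A minimal sketch of that padding idea, assuming NumPy arrays and padding only on the bottom and right edges (that choice is mine, not something the approach requires). The helpers pad_to_multiple and crop_back are hypothetical names. np.pad with mode="edge" copies the outermost pixels; the low-pass filtering is left out here.

```python
import numpy as np

def pad_to_multiple(img, multiple=4):
    """Edge-pad an image on the bottom and right so height and width divide by `multiple`."""
    h, w = img.shape[:2]
    pad_h = (-h) % multiple   # rows needed to reach the next multiple
    pad_w = (-w) % multiple   # columns needed to reach the next multiple
    # Leave any trailing axes (e.g. channels) unpadded
    pad_spec = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (img.ndim - 2)
    return np.pad(img, pad_spec, mode="edge"), (h, w)

def crop_back(img, original_hw):
    """Remove the padding added by pad_to_multiple."""
    h, w = original_hw
    return img[:h, :w]

image = np.random.rand(1169, 827).astype("float32")  # stand-in for a real scan
padded, original_hw = pad_to_multiple(image)
print(padded.shape)    # (1172, 828)

restored = crop_back(padded, original_hw)
print(restored.shape)  # (1169, 827)
```

The padded images would be fed to the autoencoder (with padding='same' on every convolution), and the network's output cropped back to the original size afterwards.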
