What is meant by "arranging the final features of a CNN in a grid", and how do I do it?

Artificial Intelligence Asked on December 27, 2021

In the paper What You Get Is What You See: A Visual Markup Decompiler, the authors propose a method that extracts features with a CNN and then arranges those extracted features in a grid to pass into an RNN encoder.
[Illustration not reproduced.]
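As I read the paper, the "grid" is simply the CNN's output feature map kept in its spatial layout (rows × columns of feature vectors) rather than flattened, and the RNN encoder then runs along each row of it. Written out in my own notation, loosely following the paper:

```latex
\begin{aligned}
V &\in \mathbb{R}^{H' \times W' \times D}, \quad V_{h,w} \in \mathbb{R}^{D} \\
\tilde{V}_{h,w} &= \operatorname{RNN}\left(\tilde{V}_{h,w-1},\, V_{h,w}\right), \quad h = 1,\dots,H',\; w = 1,\dots,W'
\end{aligned}
```

Here H' and W' are the height and width of the feature map after the CNN's pooling, D is the number of channels, V_{h,w} is the feature vector at row h and column w, and \tilde{V}_{h,0} is the initial hidden state for row h, which (if I recall the paper correctly) is made trainable so the encoder can tell rows apart.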

I can easily extract features either from an existing model such as ResNet or VGG, or from a new CNN built as described in the paper.

For example, suppose I do this:

from tensorflow import keras
features = keras.applications.ResNet50(include_top=False)(images_array)  # hypothetical; output shape (batch, H', W', 2048)
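As far as I can tell, the CNN output is already a grid: as long as the classifier head is dropped (e.g. `include_top=False` in `keras.applications`), the features keep their spatial layout as a tensor of shape (batch, H', W', D). The sketch below uses a random NumPy array as a hypothetical stand-in for `features` (the sizes are made up for illustration) and shows two ways an RNN can read it: one sequence per row, or one flattened sequence over all cells.

```python
import numpy as np

# Hypothetical stand-in for the CNN output: a batch of 2 images, each an
# 8 x 32 spatial grid of 512-dimensional feature vectors (channels-last, as in Keras).
batch, rows, cols, depth = 2, 8, 32, 512
features = np.random.rand(batch, rows, cols, depth).astype(np.float32)

# Each grid cell (h, w) is a D-dimensional vector describing one region of the image.
cell = features[0, 3, 10]  # image 0, row 3, column 10

# View 1: one sequence per row. Merging the batch and row axes turns every
# row of every image into an independent sequence of `cols` time steps.
row_sequences = features.reshape(batch * rows, cols, depth)

# View 2: a single sequence over all cells, read row by row (raster order).
flat_sequence = features.reshape(batch, rows * cols, depth)

print(cell.shape)           # (512,)
print(row_sequences.shape)  # (16, 32, 512)
print(flat_sequence.shape)  # (2, 256, 512)
```

Because NumPy reshapes in row-major order, `row_sequences[b * rows + h]` is exactly row `h` of image `b`, so no information about cell positions is lost.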

How can I convert these features into the grid? I am supposed to feed the resulting grid to an LSTM encoder, like this:

keras.layers.LSTM(256)(grid)  # again hypothetical; an LSTM expects 3-D input (batch, timesteps, features)
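If the paper's encoder is read as one RNN pass over each row of the feature grid (my understanding, not a verified reimplementation), the 4-D grid can be fed to an LSTM by merging the batch and row axes, running the LSTM along the columns, and reshaping the outputs back into a grid. The NumPy sketch below uses a hand-written single-direction LSTM with random weights in place of `keras.layers.LSTM`, so it runs without TensorFlow; the function names `lstm_over_sequences` and `encode_grid_rows`, the sizes, and the weight initialization are all hypothetical. The per-row initial hidden states stand in for the trainable per-row initial states described in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_over_sequences(x, h0, W, U, b):
    """Run a single-layer LSTM over a batch of sequences.

    x:  (N, T, D) input sequences
    h0: (N, H) initial hidden states
    W: (D, 4H), U: (H, 4H), b: (4H,) gate parameters, ordered input, forget, cell, output
    Returns the hidden state at every step, shape (N, T, H).
    """
    n, t_steps, _ = x.shape
    hidden = h0.shape[1]
    h = h0
    c = np.zeros((n, hidden), dtype=x.dtype)
    outputs = np.empty((n, t_steps, hidden), dtype=x.dtype)
    for t in range(t_steps):
        z = x[:, t] @ W + h @ U + b                # (N, 4H) gate pre-activations
        i = sigmoid(z[:, :hidden])                  # input gate
        f = sigmoid(z[:, hidden:2 * hidden])        # forget gate
        g = np.tanh(z[:, 2 * hidden:3 * hidden])    # candidate cell state
        o = sigmoid(z[:, 3 * hidden:])              # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs[:, t] = h
    return outputs

def encode_grid_rows(grid, row_init, W, U, b):
    """Hypothetical row encoder: one LSTM pass along each row of the grid.

    grid:     (B, R, C, D) CNN feature map
    row_init: (R, H) one initial hidden state per row
    Returns an encoded grid of shape (B, R, C, H).
    """
    batch, rows, cols, depth = grid.shape
    hidden = row_init.shape[1]
    # Every row of every image becomes an independent sequence of `cols` steps.
    sequences = grid.reshape(batch * rows, cols, depth)
    # Tile the per-row states so sequence b * rows + r starts from row_init[r].
    h0 = np.tile(row_init, (batch, 1))
    encoded = lstm_over_sequences(sequences, h0, W, U, b)
    # Put each encoded vector back at its (row, column) position.
    return encoded.reshape(batch, rows, cols, hidden)

rng = np.random.default_rng(0)
batch, rows, cols, depth, hidden = 2, 8, 32, 512, 64
grid = rng.standard_normal((batch, rows, cols, depth)).astype(np.float32)
row_init = rng.standard_normal((rows, hidden)).astype(np.float32) * 0.1
W = rng.standard_normal((depth, 4 * hidden)).astype(np.float32) * 0.05
U = rng.standard_normal((hidden, 4 * hidden)).astype(np.float32) * 0.05
b = np.zeros(4 * hidden, dtype=np.float32)

encoded_grid = encode_grid_rows(grid, row_init, W, U, b)
print(encoded_grid.shape)  # (2, 8, 32, 64)
```

The output is still a grid, so every position keeps its location; an attention decoder could then attend over all cells after flattening it with `encoded_grid.reshape(batch, rows * cols, hidden)`. In Keras, the same per-row idea should be expressible by wrapping `keras.layers.LSTM(units, return_sequences=True)` in `keras.layers.TimeDistributed`, which applies the LSTM to each row slice of a (batch, H', W', D) input.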

I just want to know what the authors mean by arranging the output in a grid format.

computer vision deep learning neural networks papers terminology
