Applying Grad-CAM to video classification models

Data Science Asked by Iván Mindlin on August 13, 2021

The original paper says that Grad-CAM visualization can be applied to any convolution-based model, but the problem is stated for convolutions that process images. In my case I am classifying videos, so I think I should apply Grad-CAM to every frame individually, calculating the gradients from the loss of the entire video.
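For reference, the per-frame idea can be written out as follows. In the image formulation, Grad-CAM weights each feature map by the spatially averaged gradient of the class score (the paper uses the score y^c rather than the loss). The frame index t is an assumption added here to express the video case and is not part of the paper:

```latex
\[
\alpha^{c}_{k,t} = \frac{1}{Z}\sum_{i}\sum_{j}
  \frac{\partial y^{c}}{\partial A^{k,t}_{ij}},
\qquad
L^{c}_{t} = \mathrm{ReLU}\Big(\sum_{k} \alpha^{c}_{k,t}\, A^{k,t}\Big)
\]
```

Here y^c is the score of class c for the whole video, A^{k,t} is feature map k of the last convolutional layer for frame t, and Z is the number of spatial positions (i, j) in that map.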

The problem is that I am experimenting with different models, such as a ConvLSTM. These use convolutions in each LSTM gate, and although I return the intermediate results for each frame, they are max-pooled before being passed to the next layer, so I cannot get activations corresponding to each frame.
However, I also work with a model that uses MobileNet to extract features from each frame and passes them to a GRU network. Should my approach work in this case?
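A minimal sketch of the per-frame computation for the MobileNet + GRU case, under stated assumptions: the per-frame feature maps of the last convolutional layer and the gradients of the video-level class score with respect to them have already been obtained from the framework's autodiff (backpropagating through the GRU), and both are laid out as (frames, height, width, channels). The function name `grad_cam_per_frame` and the array shapes are hypothetical, chosen only for illustration.

```python
import numpy as np

def grad_cam_per_frame(activations, gradients):
    """Grad-CAM heatmaps for each frame of a video.

    activations, gradients: arrays of shape (T, H, W, K), where gradients
    holds d(score)/d(activations) for the video-level class score.
    Returns an array of shape (T, H, W) with values in [0, 1].
    """
    # alpha[t, k]: gradient averaged over the spatial positions of frame t
    alpha = gradients.mean(axis=(1, 2))                    # (T, K)
    # Weighted sum of the feature maps over channels, separately per frame
    cam = np.einsum("thwk,tk->thw", activations, alpha)    # (T, H, W)
    # ReLU keeps only regions with a positive influence on the score
    cam = np.maximum(cam, 0.0)
    # Scale each frame's map by its own peak (all-zero maps stay zero)
    peak = cam.max(axis=(1, 2), keepdims=True)
    return cam / np.where(peak > 0, peak, 1.0)

# Illustrative call with random stand-ins for real activations/gradients
rng = np.random.default_rng(0)
acts = rng.standard_normal((4, 7, 7, 16))   # 4 frames, 7x7 maps, 16 channels
grads = rng.standard_normal((4, 7, 7, 16))
cams = grad_cam_per_frame(acts, grads)      # one heatmap per frame
```

Each heatmap is normalised per frame here, which makes frames comparable visually but hides how much each frame contributed relative to the others; normalising by the maximum over the whole video would be an alternative choice.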

I am not attaching example code because I believe this is a theoretical question, but I will if needed.

cnn computer vision rnn visualization
