Data augmentation for recommendation systems

Data Science Asked on February 8, 2021

I have a user-item matrix that I use to train a denoising autoencoder to predict the top-k items to recommend to each user.

The idea is to corrupt the matrix by erasing a percentage p of the items that each user bought and train the autoencoder to reconstruct the uncorrupted matrix.
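To make the corruption step concrete, here is a minimal NumPy sketch. It assumes a dense binary matrix with one row per user and a 1 wherever the user bought the item; the function name `corrupt_purchases` and the toy data are hypothetical, not taken from the paper.

```python
import numpy as np

def corrupt_purchases(interactions, p, rng):
    """Return a copy in which a fraction p of each user's bought items is zeroed."""
    noised = interactions.copy()
    for user in range(interactions.shape[0]):
        bought = np.flatnonzero(interactions[user])  # column indices of bought items
        n_erase = int(round(p * bought.size))        # how many purchases to hide
        if n_erase > 0:
            erased = rng.choice(bought, size=n_erase, replace=False)
            noised[user, erased] = 0
    return noised

# Toy data (assumption): 4 users, 10 items, roughly half of the entries bought.
rng = np.random.default_rng(0)
interactions = (rng.random((4, 10)) < 0.5).astype(np.float32)
noised = corrupt_purchases(interactions, p=0.2, rng=rng)
```

The autoencoder would then take `noised` as input and `interactions` as the reconstruction target.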

Following the implementation of this paper, I am currently erasing 20% of the bought items.

I was wondering whether it is legitimate to augment the dataset as follows: first erase p=20% of the bought items to create one "noised" matrix, then erase p=40% to create a second one, concatenate the two noised matrices, and train the autoencoder to reconstruct a stack of two uncorrupted matrices.
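The proposed augmentation can be sketched as follows. This is only an illustration under the same assumptions as the question (a dense binary user-item matrix); the names `erase_fraction` and `build_augmented_set` are hypothetical.

```python
import numpy as np

def erase_fraction(interactions, p, rng):
    """Zero out a fraction p of each user's bought items (copy, not in place)."""
    noised = interactions.copy()
    for user, row in enumerate(interactions):
        bought = np.flatnonzero(row)
        n_erase = int(round(p * bought.size))
        if n_erase > 0:
            noised[user, rng.choice(bought, size=n_erase, replace=False)] = 0
    return noised

def build_augmented_set(interactions, noise_levels=(0.2, 0.4), seed=0):
    """Stack one noised copy per noise level; targets repeat the clean matrix."""
    rng = np.random.default_rng(seed)
    inputs = np.vstack([erase_fraction(interactions, p, rng) for p in noise_levels])
    targets = np.tile(interactions, (len(noise_levels), 1))
    return inputs, targets

# Toy data (assumption): 5 users who each bought all 10 items.
clean = np.ones((5, 10), dtype=np.float32)
inputs, targets = build_augmented_set(clean)
print(inputs.shape, targets.shape)  # (10, 10) (10, 10)
```

Note that `targets` holds the same clean rows twice, so each user appears once per noise level with an identical reconstruction target.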

Is this reasonable, or is it just an invitation to overfit?

autoencoder, data augmentation, noisification, overfitting, recommender system
