
Can we use decreasing step size to replace mini-batch in SGD?

Data Science Asked on December 14, 2021

As far as I know, mini-batches can be used to reduce the variance of the gradient estimate, but I am wondering whether we can achieve the same effect by using a decreasing step size with only a single sample per iteration. Can we compare the convergence rates of the two approaches?

2 Answers

The main objective of mini-batch gradient descent is to obtain results faster than full-batch gradient descent, since it starts updating the weights before a full epoch is completed. SGD starts learning even earlier than mini-batch, but mini-batch reduces the variance of the gradient compared to SGD.

Coming to the question: you're right, it is possible to compare the convergence of the two scenarios. People used SGD with a decreasing step size until mini-batch methods became standard, because in practice mini-batch gives better performance than SGD thanks to its vectorisation property: the computation is faster while the results are comparable to SGD.
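
Here is a minimal sketch of that comparison, assuming a toy least-squares objective (the setup, the helper names `sgd_decaying_lr` and `minibatch_sgd`, and all hyperparameters are illustrative, not taken from the answer): single-sample SGD with a 1/√t step-size decay versus fixed-step mini-batch SGD with a vectorised batch gradient.

```python
import numpy as np

# Toy least-squares problem (illustrative setup, not from the original answer).
rng = np.random.default_rng(0)
n, d = 1000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def sgd_decaying_lr(steps=5000, lr0=0.1):
    """Single-sample SGD with a 1/sqrt(t) step-size decay (hypothetical helper)."""
    w = np.zeros(d)
    for t in range(1, steps + 1):
        i = rng.integers(n)
        grad = 2 * (X[i] @ w - y[i]) * X[i]        # gradient from one sample
        w -= (lr0 / np.sqrt(t)) * grad             # decreasing step size
    return w

def minibatch_sgd(steps=5000, lr=0.05, batch=32):
    """Mini-batch SGD with a fixed step size; the batch gradient is vectorised."""
    w = np.zeros(d)
    for _ in range(steps):
        idx = rng.integers(n, size=batch)
        grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / batch  # averaged over the batch
        w -= lr * grad
    return w

for name, w in [("decaying-LR SGD", sgd_decaying_lr()),
                ("mini-batch SGD ", minibatch_sgd())]:
    print(name, "error:", np.linalg.norm(w - w_true))
```

On a problem like this both variants converge; the mini-batch version does more arithmetic per update but amortises it through the vectorised batch computation, which is the practical advantage mentioned above.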

Answered by Abhishek Singla on December 14, 2021

Generally, the answer is "it's not known". The similarity between increasing the mini-batch size and decreasing the learning rate is mostly empirical; there is no known asymptotic formula relating the two. Also, a small learning rate and a large mini-batch do not have the same effect. For example, a batch normalization layer behaves completely differently under the two approaches. The probability distribution of the gradients produced by mini-batches and by single samples (or by mini-batches of significantly different sizes) is also quite different.
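
As an illustration of that last point (a sketch under assumed settings; the least-squares setup and the helper `stochastic_grad` are hypothetical, not part of the answer), one can measure the spread of single-sample gradients versus mini-batch gradients at a fixed parameter value and see the variance shrink roughly in proportion to the batch size:

```python
import numpy as np

# Compare the spread of single-sample gradients with mini-batch gradients
# at one fixed parameter value (illustrative least-squares setup).
rng = np.random.default_rng(1)
n, d = 1000, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
w = np.zeros(d)  # evaluate all gradients at this arbitrary fixed point

def stochastic_grad(batch):
    """Gradient of the least-squares loss estimated from `batch` random samples."""
    idx = rng.integers(n, size=batch)
    return 2 * X[idx].T @ (X[idx] @ w - y[idx]) / batch

for b in (1, 32):
    grads = np.stack([stochastic_grad(b) for _ in range(2000)])
    print(f"batch size {b:>2}: mean per-coordinate gradient variance = {grads.var(axis=0).mean():.3f}")
```

The variance scales roughly as 1/batch size, but the shape of the gradient distribution (and how layers such as batch normalization react to it) is not captured by that single number, which is the point being made above.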

Answered by mirror2image on December 14, 2021
