
Why do people use tanh more often than ReLU in vanilla recurrent neural networks?

Asked on Cross Validated, December 18, 2021

For instance, the default activation function of tf.keras.layers.SimpleRNN is tanh.
I ask because tanh activation functions, like sigmoids, can also cause the vanishing gradient problem.
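For reference, a minimal sketch (assuming TensorFlow 2.x) showing that SimpleRNN defaults to tanh and that the activation can be overridden, e.g. with ReLU; the layer sizes and toy input shape here are arbitrary illustration values:

```python
import tensorflow as tf

# Default SimpleRNN: activation is 'tanh' unless specified otherwise
rnn_tanh = tf.keras.layers.SimpleRNN(32)

# The same layer with the activation swapped for ReLU
rnn_relu = tf.keras.layers.SimpleRNN(32, activation="relu")

# Hypothetical toy input: batch of 4 sequences, 10 timesteps, 8 features
x = tf.random.normal((4, 10, 8))
print(rnn_tanh(x).shape)  # (4, 32)
print(rnn_relu(x).shape)  # (4, 32)
```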
