
Why do people use tanh more often than ReLU in vanilla recurrent neural networks?

Asked on Cross Validated, December 18, 2021

For instance, the default activation function of tf.keras.layers.SimpleRNN is tanh.
I ask because tanh activation functions, like sigmoids, can also cause the vanishing gradient problem.
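For reference, a minimal sketch (assuming TensorFlow 2.x) showing that SimpleRNN defaults to tanh and that the activation can be overridden, e.g. with ReLU; the layer sizes and toy input shape here are arbitrary illustration values:

```python
import tensorflow as tf

# Default SimpleRNN: activation is 'tanh' unless specified otherwise
rnn_tanh = tf.keras.layers.SimpleRNN(32)

# The same layer with the activation swapped for ReLU
rnn_relu = tf.keras.layers.SimpleRNN(32, activation="relu")

# Hypothetical toy input: batch of 4 sequences, 10 timesteps, 8 features
x = tf.random.normal((4, 10, 8))
print(rnn_tanh(x).shape)  # (4, 32)
print(rnn_relu(x).shape)  # (4, 32)
```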
