I'm trying to implement the soft actor-critic (SAC) algorithm on financial data (stock prices), but I'm having trouble with the losses: no matter what combination of hyperparameters I try, they do not converge, and the reward returns are poor as well. It seems like the agent is not learning at all.
I have already tried tuning some hyperparameters (the learning rate of each network and the number of hidden layers), but I always get similar results.
The two plots below represent the losses of my policy and one of the value functions during the last episode of training.
My question is: could this be caused by the data itself (the nature of stock prices), or is it something in the logic of my code?
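For reference, here is a minimal, simplified sketch of the standard SAC loss computation I am trying to reproduce. The network sizes, hyperparameter values, and the dummy transition batch below are placeholders, not my actual setup:

```python
import torch
import torch.nn as nn

obs_dim, act_dim, batch = 8, 2, 64   # placeholder dimensions
gamma, alpha = 0.99, 0.2             # placeholder discount and entropy weight

def mlp(inp, out):
    return nn.Sequential(nn.Linear(inp, 256), nn.ReLU(), nn.Linear(256, out))

# Twin critics plus target critics. In a real implementation the targets
# start as copies of q1/q2 and track them by Polyak averaging; here they
# are fresh networks only to keep the sketch self-contained.
q1, q2 = mlp(obs_dim + act_dim, 1), mlp(obs_dim + act_dim, 1)
q1_targ, q2_targ = mlp(obs_dim + act_dim, 1), mlp(obs_dim + act_dim, 1)
policy = mlp(obs_dim, 2 * act_dim)   # outputs mean and log-std

def sample_action(obs):
    mean, log_std = policy(obs).chunk(2, dim=-1)
    std = log_std.clamp(-20, 2).exp()
    dist = torch.distributions.Normal(mean, std)
    u = dist.rsample()               # reparameterised sample
    a = torch.tanh(u)                # squash to [-1, 1]
    # log-probability with the tanh change-of-variables correction
    logp = dist.log_prob(u).sum(-1) - torch.log(1 - a.pow(2) + 1e-6).sum(-1)
    return a, logp

# Dummy batch standing in for a replay-buffer sample.
o = torch.randn(batch, obs_dim)
a = torch.rand(batch, act_dim) * 2 - 1
r = torch.randn(batch, 1)
o2 = torch.randn(batch, obs_dim)
done = torch.zeros(batch, 1)

# Critic loss: fit both Q-networks to the entropy-regularised TD target.
with torch.no_grad():
    a2, logp2 = sample_action(o2)
    q_next = torch.min(q1_targ(torch.cat([o2, a2], -1)),
                       q2_targ(torch.cat([o2, a2], -1)))
    target = r + gamma * (1 - done) * (q_next - alpha * logp2.unsqueeze(-1))
oa = torch.cat([o, a], -1)
critic_loss = ((q1(oa) - target) ** 2).mean() + ((q2(oa) - target) ** 2).mean()

# Actor loss: maximise entropy-regularised Q under the current policy.
a_new, logp_new = sample_action(o)
oan = torch.cat([o, a_new], -1)
actor_loss = (alpha * logp_new.unsqueeze(-1) - torch.min(q1(oan), q2(oan))).mean()
print(critic_loss.item(), actor_loss.item())
```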
I would say it is the nature of the data. Generally speaking, you are trying to predict a random sequence, especially if you use the historical prices as input and try to get a future value as output. Stock prices are close to a random walk, so the targets your value networks are fitting are mostly noise; the critics chase a signal that is not there, and neither their losses nor the policy loss have anything stable to converge to.
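As a rough illustration, here is a minimal sketch using simulated prices (a synthetic random walk, not your data); on such a series, even a predictor that remembers yesterday's move does worse than always predicting zero:

```python
import numpy as np

# Simulate a price series as a random walk with i.i.d. daily log-returns
# (a common simplified model of stock prices; the parameters are arbitrary).
rng = np.random.default_rng(0)
log_returns = rng.normal(loc=0.0, scale=0.01, size=5000)
prices = 100.0 * np.exp(np.cumsum(log_returns))

# Lag-1 autocorrelation of returns: near zero means yesterday's move
# says almost nothing about today's.
r = np.diff(np.log(prices))
print("lag-1 autocorrelation:", np.corrcoef(r[:-1], r[1:])[0, 1])

# A naive "repeat the last return" predictor vs. the trivial
# "always predict zero" baseline: on a random walk the naive
# predictor is strictly worse (its MSE is roughly twice as large).
print("MSE, predict previous return:", np.mean((r[1:] - r[:-1]) ** 2))
print("MSE, predict zero:           ", np.mean(r[1:] ** 2))
```

If you run the same check on your real price series and see similarly negligible autocorrelation, the problem is most likely the data rather than the logic of your SAC code.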
Correct answer by oleg.mosalov on December 7, 2020