Version 1
: Received: 25 July 2024 / Approved: 29 July 2024 / Online: 29 July 2024 (08:23:17 CEST)
How to cite:
Mora Cortes, M. S.; Perdomo Chary, C. A.; Perdomo, O. J. M-Learning: A Computationally Efficient Heuristic for Reinforcement Learning with Delayed Rewards. Preprints 2024, 2024072253. https://doi.org/10.20944/preprints202407.2253.v1
APA Style
Mora Cortes, M. S., Perdomo Chary, C. A., & Perdomo, O. J. (2024). M-Learning: A Computationally Efficient Heuristic for Reinforcement Learning with Delayed Rewards. Preprints. https://doi.org/10.20944/preprints202407.2253.v1
Chicago/Turabian Style
Mora Cortes, M. S., Cesar Andrey Perdomo Chary, and Oscar J. Perdomo. 2024. "M-Learning: A Computationally Efficient Heuristic for Reinforcement Learning with Delayed Rewards." Preprints. https://doi.org/10.20944/preprints202407.2253.v1
Abstract
The current design of reinforcement learning methods demands exhaustive computation. Algorithms such as Deep Q-Network have achieved outstanding results in the development of the field; however, their need for thousands of parameters and training episodes remains a problem. This document therefore presents a comparative analysis of the Q-Learning algorithm (the foundation of Deep Q-Learning) and our proposed method, termed M-Learning. The two algorithms are compared on Markov decision processes with delayed rewards, used as a general testbench framework. First, we give a full description of the main problems involved in implementing Q-Learning, chiefly its multiple parameters. We then report the foundations of our proposed heuristic in detail, including its formulation and the complete algorithm. Finally, both algorithms were trained and compared in the Frozen Lake environment. The experimental results and an analysis of the best solutions found highlight the differences between the algorithms in the number of episodes required and their standard deviations. The code will be available in a GitHub repository once the paper is published.
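For reference, the Q-Learning baseline discussed above can be sketched in a few lines. The following is a minimal, dependency-free illustration of tabular Q-Learning on a toy delayed-reward chain MDP; it is not the authors' experimental setup (their experiments use the Frozen Lake environment), and all names, states, and hyperparameter values here are illustrative assumptions.

```python
import random

# Toy deterministic chain MDP with a delayed reward (illustrative, not Frozen Lake):
# states 0..4, actions 0 (left) / 1 (right); reward 1 only on reaching state 4.
N_STATES, GOAL = 5, 4

def step(state, action):
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def train_q_learning(episodes=300, alpha=0.1, gamma=0.95, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: q[state][action]
    for _ in range(episodes):
        s, done = 0, False
        for _ in range(10_000):  # safety cap on episode length
            # Epsilon-greedy action selection.
            a = rng.randrange(2) if rng.random() < epsilon else max((0, 1), key=lambda a_: q[s][a_])
            s2, r, done = step(s, a)
            # Standard Q-Learning update (off-policy TD target).
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q

q = train_q_learning()
```

After training, the greedy policy derived from `q` moves right toward the goal from every non-terminal state, even though the reward is only observed at the end of an episode. The parameter count the abstract refers to (learning rate, discount factor, exploration rate, episode budget) is visible in the signature of `train_q_learning`.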
Copyright:
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.