Preprint Article, Version 1. This version is not peer-reviewed.

DQN-Based Shaped Reward Function Mould for UAV Emergency Communication

Version 1: Received: 13 August 2024 / Approved: 13 August 2024 / Online: 14 August 2024 (09:37:07 CEST)

How to cite: Ye, C.; Zhu, W.; Guo, S.; Bai, J. DQN-Based Shaped Reward Function Mould for UAV Emergency Communication. Preprints 2024, 2024080979. https://doi.org/10.20944/preprints202408.0979.v1

Abstract

Unmanned aerial vehicles (UAVs) have become an important tool for emergency communication. When a disaster strikes, roads in the affected area are often severely damaged and ground communication infrastructure is destroyed. Under such emergency conditions, UAVs can enter key areas as communication nodes and provide communication services to users there. However, it is difficult for a UAV to cover the entire disaster area, and inefficient deployment delays rescue operations. How to make a UAV accurately perceive the regional situation and deploy to the right position is therefore worth studying. In this paper, a virtual simulation environment is established and a deep reinforcement learning algorithm is used to train UAV agents. Because reinforcement learning still suffers from problems such as sparse rewards and long training times, the reward function is designed to improve training efficiency. First, we define a specific mountain emergency communication scenario and, combining it with the UAV's concrete application, build a virtual simulation environment. Furthermore, an additional shaped reward function is designed to address the sparse reward problem. Through an improved deep Q-learning network (DQN) algorithm and a reward design based on a potential function, the final evaluation metrics are improved and the effectiveness of the algorithm is verified. The experimental results demonstrate that our method effectively shortens training time and increases the convergence rate.
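The "reward design based on a potential function" mentioned above corresponds to potential-based reward shaping. The following is a minimal Python sketch of that idea, assuming a distance-based potential; the potential function phi, the target deployment position, the discount factor, and all numeric values are illustrative assumptions, not the paper's actual design.

    import numpy as np

    GAMMA = 0.99  # discount factor (assumed value; not stated in the abstract)

    def phi(state, target):
        # Potential function: negative Euclidean distance from the UAV
        # position to the target deployment position, so states closer
        # to the target have higher potential. Illustrative only.
        return -np.linalg.norm(np.asarray(state, dtype=float) -
                               np.asarray(target, dtype=float))

    def shaped_reward(env_reward, state, next_state, target, gamma=GAMMA):
        # Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s).
        # Adding F to the environment reward densifies a sparse reward
        # signal without changing the optimal policy (Ng et al., 1999).
        return env_reward + gamma * phi(next_state, target) - phi(state, target)

    # Example: a sparse environment reward of 0 becomes informative
    # whenever the UAV moves toward the (hypothetical) target position.
    target = np.array([50.0, 50.0, 100.0])   # hypothetical deployment point
    s = np.array([0.0, 0.0, 100.0])
    s_next = np.array([1.0, 1.0, 100.0])
    print(shaped_reward(0.0, s, s_next, target))  # > 0: moving closer is rewarded

In a DQN training loop, shaped_reward would simply replace the raw environment reward in each stored transition; the Q-network update itself is unchanged, which is why this kind of shaping is attractive for sparse-reward problems.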

Keywords

unmanned aerial vehicle (UAV); deep Q-learning network (DQN); reward shaping

Subject

Engineering, Control and Systems Engineering
