Development and validation of a safe reinforcement learning drone controller
DOI:
https://doi.org/10.17979/ja-cea.2025.46.12154
Keywords:
Reinforcement learning control, UAVs, Autonomous vehicles, Learning and adaptation in autonomous vehicles
Abstract
This paper presents work in progress that aims to develop, validate, and verify the use of Reinforcement Learning (RL) and neural networks in safety-critical real-time systems, using an Unmanned Aerial Vehicle (UAV) controller as the use case. These techniques are particularly promising for autonomous control because they learn from a dynamic environment without human intervention. The proposed solution is demonstrated with a UAV trained to maintain altitude and avoid incoming obstacles such as other UAVs. The AirSim simulator was used to simulate a realistic flight scenario with two UAVs: one carries the RL controller, while the other constantly attempts to collide with it. Preliminary results show that neural networks trained with the Soft Actor-Critic (SAC) algorithm can avoid collisions, including in cases unseen during training. Despite these satisfactory results, further research is needed to verify that the RL agent can operate correctly in a safety-critical, real-time environment.
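The abstract does not state the reward used for training. Purely as an illustration of how the two objectives it names (holding altitude and avoiding an approaching UAV) could be combined into one per-step reward signal, a minimal sketch follows. The function name, its parameters, and every numeric constant are hypothetical and are not taken from the paper.

```python
def shaped_reward(altitude_m, target_altitude_m, intruder_distance_m,
                  safe_distance_m=5.0, collision_distance_m=1.0):
    """Hypothetical per-step reward; returns (reward, episode_done).

    All thresholds and weights are illustrative assumptions,
    not values reported by the authors.
    """
    # A collision ends the episode with a large fixed penalty.
    if intruder_distance_m <= collision_distance_m:
        return -100.0, True

    # Altitude term: zero at the target altitude, more negative with error.
    altitude_term = -abs(altitude_m - target_altitude_m)

    # Proximity term: zero outside the safe radius, linear penalty inside it.
    proximity_term = -max(0.0, safe_distance_m - intruder_distance_m)

    return altitude_term + proximity_term, False
```

In a setup like the one described, a function of this kind would be evaluated once per simulation step from the simulator's state, and the resulting scalar would be passed to the SAC learner; how the authors actually shape their reward may differ.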
License
Copyright (c) 2025 Ángel-Grover Pérez-Muñoz, Guillermo López-García, Hernán García-Quijano, Alejandro Alonso

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.