Q-Learning Aided Intelligent Routing with Maximum Utility in Cognitive UAV Swarm for Emergency Communications
Preprint posted on 07.04.2022, 03:39, authored by Long Zhang, Xiaozheng Ma, Zirui Zhuang, Haitao Xu, Vishal Sharma, Zhu Han
In this paper, we address the routing problem in a cognitive unmanned aerial vehicle (UAV) swarm (CU-SWARM), which applies cognitive radio to a swarm of UAVs within a three-hierarchical aerial-ground integrated network architecture for emergency communications. In particular, the flexibly converged architecture uses a UAV swarm and a high-altitude platform to support aerial sensing and access, respectively, over disaster-affected areas. We develop a Q-learning framework to achieve intelligent routing with maximum utility for CU-SWARM. To characterize the reward function, we take into account both the routing metric design and the candidate UAV selection optimization. The routing metric is determined by maximizing the utility, which jointly captures the achievable rate of a UAV pair and the residual energy of a UAV. In addition, under location, arc, and direction constraints, the circular sector used for candidate UAV selection is modeled by properly choosing the central angle and the acceptable signal-to-noise ratio for the UAV. With this setup, we further propose a low-complexity iterative algorithm that uses a dynamic learning rate to update Q-values during training and thereby achieve fast convergence. Extensive simulation results are provided to assess the potential of the Q-learning framework for intelligent routing and to verify the overall iterative algorithm with the dynamic learning rate for the training procedure. Our findings reveal that the proposed algorithm significantly increases the accumulated rewards with practical complexity compared to benchmark schemes with fixed and decaying learning rates.
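The abstract does not state the utility function or the learning-rate rule, so the following is only a minimal sketch of how a tabular Q-learning update for next-hop selection could look under stated assumptions. The weighted-sum `utility`, the visit-count rule `alpha = 1 / visits`, the discount factor, and all names here are hypothetical illustrations, not the authors' formulation.

```python
from collections import defaultdict


def utility(rate, energy, max_rate, max_energy, weight=0.5):
    """Hypothetical reward: weighted sum of normalized achievable rate
    and normalized residual energy (an assumption, not the paper's metric)."""
    return weight * (rate / max_rate) + (1.0 - weight) * (energy / max_energy)


def q_update(q, visits, state, action, reward, next_state, next_actions, gamma=0.9):
    """One tabular Q-learning step.

    state / next_state: current and next UAV identifiers.
    action: the candidate next-hop UAV that was chosen.
    The learning rate shrinks with the visit count of (state, action);
    this rule is an assumed stand-in for the paper's dynamic learning rate.
    """
    visits[(state, action)] += 1
    alpha = 1.0 / visits[(state, action)]
    # Best Q-value reachable from the next UAV (0.0 if it has no candidates).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return alpha


# Usage: one forwarding decision from UAV "u1" to candidate "u2".
q = defaultdict(float)
visits = defaultdict(int)
r = utility(rate=5.0, energy=50.0, max_rate=10.0, max_energy=100.0)
q_update(q, visits, "u1", "u2", r, "u2", ["u3"])
```

In this sketch, the candidate set passed as `next_actions` would be the UAVs that survive the sector-based selection (central angle and signal-to-noise ratio threshold) described above; that filtering step is left out because the abstract does not give its details.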