Q-Learning Aided Intelligent Routing with Maximum Utility in Cognitive
UAV Swarm for Emergency Communications
Abstract
In this paper, we address the routing problem in a
cognitive unmanned aerial vehicle (UAV) swarm (CU-SWARM), which applies
cognitive radio to a swarm of UAVs within a three-layer hierarchical
aerial-ground integrated network architecture for emergency
communications. In particular, the flexibly converged architecture
utilizes a UAV swarm and a high-altitude platform to support aerial
sensing and access, respectively, over disaster-affected areas. We
develop a Q-learning framework to achieve intelligent routing with
maximum utility for CU-SWARM. To characterize the reward function, we
take into account both the routing metric design and the candidate UAV
selection optimization. The routing metric is determined by maximizing
the utility, which jointly captures the achievable rate of a UAV pair and
the residual energy of the UAV. Moreover, under location, arc, and
direction constraints, the circular sector is modeled by properly
choosing the central angle and the acceptable signal-to-noise ratio for
the UAV. With this setup, we further propose a low-complexity iterative
algorithm that uses a dynamic learning rate to update the Q-values during
training, achieving fast convergence. Extensive simulation results are
provided to assess the potential of the Q-learning framework for
intelligent routing and to verify the overall iterative algorithm with
the dynamic learning rate in the training procedure. Our findings reveal
that the proposed algorithm significantly increases the accumulated
rewards with practical complexity compared to benchmark schemes with
fixed and decaying learning rates.
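
As a rough illustration of the idea summarized above, the following minimal sketch applies tabular Q-learning to next-hop selection with a learning rate that changes during training. The abstract does not specify the update rule or the form of the dynamic learning rate, so everything here is an assumption: the five-node topology, the per-link utility values (a stand-in for the rate and residual-energy utility), the discount factor, and the visit-count schedule alpha = 1/n are all hypothetical and are not the scheme proposed in the paper.

```python
import random
from collections import defaultdict

# Hypothetical 5-node topology: node -> candidate next hops (assumption).
NEIGHBORS = {0: [1, 2], 1: [3], 2: [3, 4], 3: [4], 4: []}
# Hypothetical per-link utility, standing in for a rate/energy utility.
UTILITY = {(0, 1): 0.9, (0, 2): 0.4, (1, 3): 0.8,
           (2, 3): 0.5, (2, 4): 0.3, (3, 4): 0.7}
DEST = 4      # destination node
GAMMA = 0.9   # discount factor (assumed value)


def train(episodes=500, epsilon=0.2, seed=0):
    """Tabular Q-learning over (node, next_hop) pairs."""
    rng = random.Random(seed)
    q = defaultdict(float)      # Q-values, initialised to 0
    visits = defaultdict(int)   # visit counts per (node, next_hop)
    for _ in range(episodes):
        node = 0
        while node != DEST:
            hops = NEIGHBORS[node]
            # Epsilon-greedy choice of the next hop.
            if rng.random() < epsilon:
                nxt = rng.choice(hops)
            else:
                nxt = max(hops, key=lambda h: q[(node, h)])
            visits[(node, nxt)] += 1
            # Assumed dynamic learning rate: shrinks with the visit count.
            alpha = 1.0 / visits[(node, nxt)]
            # Best Q-value available from the next node (0 at the destination).
            future = max((q[(nxt, h)] for h in NEIGHBORS[nxt]), default=0.0)
            target = UTILITY[(node, nxt)] + GAMMA * future
            # Standard Q-learning update toward the bootstrapped target.
            q[(node, nxt)] += alpha * (target - q[(node, nxt)])
            node = nxt
    return q


def greedy_route(q, src=0):
    """Follow the highest-valued next hop from src to DEST."""
    route = [src]
    while route[-1] != DEST:
        node = route[-1]
        route.append(max(NEIGHBORS[node], key=lambda h: q[(node, h)]))
    return route
```

Calling `greedy_route(train())` returns the learned path from node 0 to the destination. With these illustrative utilities, the path with the largest discounted utility is 0, 1, 3, 4. Replacing the line that sets `alpha` with a constant or a decaying schedule gives the kind of baseline the abstract compares against.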