
Deep Reinforcement Learning for 5G Radio Access Network Slicing with Spectrum Coexistence
  • Yi Shi ,
  • Parisa Rahimzadeh ,
  • Maice Costa ,
  • Tugba Erpek ,
  • Yalin E. Sagduyu
Yi Shi
Virginia Tech

Corresponding Author:[email protected]


Abstract

This paper presents a reinforcement learning solution to dynamic admission control and resource allocation for 5G radio access network (RAN) slicing requests when the spectrum may be shared between 5G and an incumbent user, as in Citizens Broadband Radio Service scenarios. Communication resources (frequency-time resource blocks and transmit powers) and computational resources (processor power) that are not used by the incumbent user can be allocated to network slicing requests, which arrive stochastically. Each request arrives with a priority (weight) and with throughput, computational resource, and latency (deadline) requirements. Greedy and myopic online algorithms do not account for the heterogeneity of future requests or their arrival process, and are therefore ineffective for network slicing. Reinforcement learning solutions (Q-learning and deep Q-learning) are instead presented to maximize the network utility, defined as the total weight of granted network slicing requests over a time horizon, subject to communication and computational constraints. Results show that reinforcement learning improves the 5G network utility relative to myopic, greedy, random, and first-come-first-served solutions. In particular, deep Q-learning reduces complexity and allows practical implementation as the state-action space grows, and it effectively admits or rejects requests when 5G must share the spectrum with incumbent users that may dynamically occupy some of the frequency-time blocks. Furthermore, deep reinforcement learning is shown to be robust to misdetection and false alarm errors in detecting the incumbent user's activity.
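As a rough illustration of the admission control problem described in the abstract, the following is a minimal tabular Q-learning sketch. It is not the authors' model: the request types, the single resource-block capacity, the state definition (free blocks and current request type), and all hyperparameters are assumptions chosen for brevity. It also ignores transmit power, computational resources, deadlines, and incumbent activity, and its state hides the remaining holding times, so it only approximates a Markov decision process.

```python
import random
from collections import defaultdict

# Hypothetical request types: (weight, resource blocks needed, holding time in slots).
REQUEST_TYPES = [(1.0, 1, 2), (3.0, 2, 3), (5.0, 3, 4)]
CAPACITY = 4  # blocks left over by the incumbent (assumed constant in this sketch)


def free_blocks(active):
    """Number of resource blocks not held by active slices."""
    return CAPACITY - sum(need for need, _ in active)


def step(active, req, action):
    """Apply admit (1) or reject (0) to request `req`, then advance one slot.

    Returns (reward, new_active). An admit that does not fit is treated
    as a reject with zero reward.
    """
    weight, need, hold = REQUEST_TYPES[req]
    reward = 0.0
    if action == 1 and need <= free_blocks(active):
        active = active + [(need, hold)]
        reward = weight
    # Decrement remaining holding times and release finished slices.
    active = [(n, t - 1) for n, t in active if t - 1 > 0]
    return reward, active


def train(episodes=2000, horizon=50, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning over (free blocks, request type, action)."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        active = []
        req = rng.randrange(len(REQUEST_TYPES))
        for _ in range(horizon):
            state = (free_blocks(active), req)
            if rng.random() < eps:
                action = rng.randrange(2)  # explore
            else:
                action = max((0, 1), key=lambda a: q[state + (a,)])  # exploit
            reward, active = step(active, req, action)
            req = rng.randrange(len(REQUEST_TYPES))  # next arrival
            nxt = (free_blocks(active), req)
            target = reward + gamma * max(q[nxt + (0,)], q[nxt + (1,)])
            q[state + (action,)] += alpha * (target - q[state + (action,)])
    return q
```

Calling `train()` returns a table whose entries can be compared per state, for example `q[(4, 0, 1)]` against `q[(4, 0, 0)]`, to see whether the learned policy admits a low-weight request when all blocks are free. The table has one entry per (state, action) pair, so it grows quickly as more resources and request attributes enter the state, which is the scaling problem the abstract cites as motivation for deep Q-learning.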