
Multi-agent Reinforcement Learning-based Capacity Planning for On-demand Vehicular Fog Computing
  • Wencan Mao,
  • Jiaming Yin,
  • Yushan Liu,
  • Byungjin Cho,
  • Yang Chen,
  • Weixiong Rao,
  • Yu Xiao
Wencan Mao
Aalto University

Corresponding Author: [email protected]



Fog computing reduces network latency by moving computational resources close to where data is generated. Vehicular fog computing (VFC) is an emerging computing paradigm in which fog nodes deployed on moving vehicles, i.e., vehicular fog nodes (VFNs), complement stationary fog nodes (e.g., those co-located with cellular base stations). On-demand VFC (ODVFC) supports dynamic routing of VFNs, with the aim of fulfilling the spatiotemporally varying demand for computational resources in a cost-efficient manner. Unlike previous work on capacity planning and vehicle routing that relies on compute-intensive optimization methods such as integer linear programming (ILP), this paper explores the feasibility of applying reinforcement learning to dynamic capacity planning in a time-efficient manner. Specifically, we propose to apply multi-agent reinforcement learning (MARL) with actor-critic methods to train the VFN routing policies. This approach allows distributed VFNs to cooperatively maximize the techno-economic performance of ODVFC. For evaluation, we built an open-source VFC simulation platform that integrates vehicular traffic simulation with 5G NR V2X and a MARL environment. Compared with decentralized learning (i.e., each VFN independently learns its routing policy), centralized learning (i.e., a single global agent routes all VFNs), and ILP, our proposal achieves 8.3% higher revenue and serves 13.2% more tasks than decentralized learning, and it reduces execution time by 40.6% and 83% relative to centralized learning and ILP, respectively, at the cost of only 14% lower revenue than either. It also scales to real-life scenarios with large numbers of users and VFNs.
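As a rough illustration of the cooperative actor-critic idea behind the proposal (this is a toy sketch, not the paper's implementation), the snippet below trains several agents, standing in for VFNs, to pick service zones so as to maximize a shared team reward. The zone grid, demand values, and the running-mean baseline used in place of a learned critic are all illustrative assumptions.

```python
import numpy as np

# Hypothetical toy setting (names and numbers are illustrative, not from
# the paper): N_AGENTS vehicular fog nodes each pick one of N_ZONES
# service zones. A zone's demand is split among the agents that choose
# it, so the cooperative optimum spreads agents across high-demand zones.
rng = np.random.default_rng(0)
N_AGENTS, N_ZONES = 3, 4
DEMAND = np.array([5.0, 3.0, 2.0, 1.0])

def team_reward(choices):
    # Each zone's demand is shared equally by the agents parked there,
    # so the team reward equals the total demand of the occupied zones.
    counts = np.bincount(choices, minlength=N_ZONES)
    return sum(DEMAND[z] / counts[z] for z in choices)

# One softmax policy (actor) per agent; a running-average baseline
# stands in for the shared critic of centralized training.
logits = np.zeros((N_AGENTS, N_ZONES))
baseline, lr = 0.0, 0.1

for step in range(3000):
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    choices = np.array([rng.choice(N_ZONES, p=probs[i]) for i in range(N_AGENTS)])
    r = team_reward(choices)
    adv = r - baseline                 # advantage w.r.t. shared baseline
    baseline += 0.05 * (r - baseline)  # "critic" update: running mean of reward
    for i in range(N_AGENTS):          # REINFORCE-style actor update
        grad = -probs[i]               # d log pi / d logits for softmax policy
        grad[choices[i]] += 1.0
        logits[i] += lr * adv * grad

greedy = logits.argmax(axis=1)
print("greedy zone per agent:", greedy, "reward:", team_reward(greedy))
```

Because all agents optimize the same team reward, they learn to spread across zones rather than crowd the single highest-demand one, which is the cooperative behavior the actor-critic MARL formulation is meant to induce at much larger scale.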