Resource Allocation for Joint Interference Management and Security Enhancement in Cellular-Connected Internet-of-Drones Networks

Internet-of-drones (IoD) systems require enhanced data transmission security and efficient interference management to accommodate the rapidly growing drone-based and rate-intensive applications. This paper develops a novel resource allocation scheme to jointly manage interference and enhance the physical layer security of cellular-connected IoD networks in the presence of a multi-band eavesdropping drone. Our envisioned cellular-connected IoD network has multiple full-duplex cellular base stations (CBSs), where each CBS reserves an orthogonal cellular radio resource block (RRB) for the aerial communication links. To efficiently utilize the cellular RRBs, each CBS is connected to a cluster of data transmitting drones using uplink non-orthogonal multiple access (NOMA) scheme. In addition, all the CBSs simultaneously transmit artificial noise signals to weaken the eavesdropper links. A joint optimization problem, considering the transmit power allocation and clustering of the legitimate drones, and the jamming power allocation of the CBSs, is formulated to maximize the worst-case average sum-secrecy-rate of the network. The joint optimization problem is decomposed into drone-clustering and power allocation sub-problems to obtain an efficient solution. A multi-agent reinforcement-learning framework is devised to solve the drone-clustering sub-problem. Meanwhile, the transmit and jamming power allocation sub-problem is solved by employing fractional programming, successive convex approximation, and alternating optimization techniques. 
By iteratively solving these two sub-problems, a convergent resource allocation algorithm, namely, $\underline{\text{s}}$ecurity and $\underline{\text{i}}$nterference management with $\underline{\text{re}}$inforcement-learning and $\underline{\text{N}}$OMA (SIREN), is proposed. The superiority of SIREN over several benchmark schemes is verified via extensive simulations.

complex encryption keys. As a result, PLS requires much less complexity than cryptographic methods, which makes PLS attractive for aerial communications [10]. Besides, a multi-layer security mechanism can be constructed by augmenting PLS with the existing security protocols of the upper layers. The key performance metric of PLS is the secrecy-rate (SR), which quantifies the effective transmission rate of the confidential message without any information leakage at the eavesdropping nodes. A positive SR is guaranteed as long as the legitimate link has a higher channel capacity than the eavesdropping link. Hence, the SR can be enhanced by appropriately designing the resource allocation. The existing literature has optimized several factors to enhance the SR of aerial communication systems, including the drones' transmit power, the trajectory followed by the drones, and the sub-channel assignment between the drones and the ground users [11]-[13].
The SR of aerial communications can be further enhanced using cooperative jamming, where artificial noise (AN) is simultaneously transmitted with the confidential messages. The purpose of AN is to artificially increase interference at the eavesdroppers without affecting the legitimate links. In [14], dual-drone based cooperative jamming was developed, where one drone transmits confidential messages and another drone simultaneously transmits AN signals to reduce the eavesdropping capability of the ground adversaries. In [15], a power-splitting method was developed to split the transmit power of the legitimate drone between the confidential messages and AN signals. The studies in [11], [12], [14], [15] assumed accurate estimation of the eavesdroppers' locations. In [16]-[18], resource allocation schemes were developed to maximize the worst-case SR while considering bounded location error models for the eavesdroppers. In [19], a probabilistic location error model for the eavesdroppers was utilized to maximize the outage-constrained average SR via robust optimization of the drones' trajectory, and the transmit and jamming powers. Nonetheless, most of these studies ignored the interference by scheduling the available drone-to-ground links over orthogonal channels. Therefore, despite improving the PLS of drone networks, these studies fail to efficiently utilize the radio resources.
3) Interference Management With PLS: Non-orthogonal multiple access (NOMA) is a powerful technique that can enhance the spectral efficiency of multi-user networks [20]. In NOMA, signals from multiple users are multiplexed over the same RRB using different transmit power levels at the transmitter and decoded at the receiving node using successive interference cancellation (SIC). It is noted that the data transmission process in a cellular-connected IoD network is an uplink multiple access scheme. Hence, NOMA is promising to concurrently schedule multiple drones to transmit data over the same cellular RRB. By efficiently exploiting interference in the network, NOMA improves the achievable SR as well [21]. Recently, NOMA-enabled secured aerial communications have received considerable attention. In [22], NOMA was exploited to transmit the secured and public users' messages from an aerial BS such that the secured message cannot be eavesdropped by public users. In [23], NOMA was studied to efficiently allocate the transmit power of an aerial BS between a pair of security-demanding and rate-demanding ground users without leakage of secured information. In [24], both NOMA and cooperative jamming were integrated to enhance the PLS of an aerial communication system. However, the studies in [22]-[24] considered single drone-based communications. We emphasize that the existing literature lacks works on the joint management of interference and PLS for multi-drone cellular-connected IoD networks that take both inter-drone and drone-to-ground interference into account.

B. Motivation, Applications, and Contributions
1) Motivation: The aim of this work is to develop a resource allocation scheme for cellular-connected IoD networks. In particular, we envision an IoD network in which multiple legitimate drones concurrently transmit confidential data to a set of full-duplex (FD) CBSs using cellular RRBs and power-domain NOMA in the presence of a multi-band eavesdropping drone. The motivation for using NOMA and FD technology is explained as follows. We specifically consider a dense IoD network in which there are more legitimate drones than available cellular RRBs. To efficiently utilize the cellular RRBs, a drone-clustering scheme is proposed so that the drones within a cluster can concurrently transmit over the same RRB. Nevertheless, drone-clustering inevitably introduces co-channel inter-drone interference that reduces the capacity of the legitimate data transmission links. To combat this interference, power-domain NOMA is employed in the drone-clusters. Meanwhile, cooperative jamming is used to make the drone-to-ground communications resilient to eavesdropping. We take inspiration from [16], [17] and implement the cooperative jamming by incorporating FD technology at the CBSs so that the CBSs can simultaneously receive data from legitimate drones and transmit AN signals to weaken eavesdropping links. Note that FD technology can be readily deployed at the CBSs by combining analog and digital self-interference (SI) cancellation techniques [25]. To the best of our knowledge, this is the first work that jointly employs drone-clustering, NOMA-enabled data transmission, and FD CBS-aided cooperative jamming to improve both the capacity and the security of multi-drone cellular-connected IoD networks.
2) Applications: IoD networks are promising for both civilian and military surveillance and monitoring applications [26]. In these types of applications, drones usually generate high-definition videos and images that are processed at the ground control station. As a result, these applications require high-capacity drone-to-ground communications and beyond-visual-line-of-sight (BVLOS) drone operations for extended coverage. Existing point-to-point drone-to-ground communications rely mainly on the unlicensed 2.4 GHz industrial, scientific, and medical (ISM) frequency band, which supports only visual-line-of-sight drone operations and low data rates. In contrast, cellular networks provide reliable command and control links for drone operations in the BVLOS range, a large bandwidth, and extended coverage [27]. Thus, unlike the ISM band, cellular networks are well positioned to support a variety of IoD network applications. However, cellular-connected IoD networks must be resilient to untrusted drones in the airspace, especially for security-critical applications. In this context, our proposed system model not only enhances the capacity of cellular-connected IoD networks by efficiently managing co-channel interference but also prevents unauthorized drones from eavesdropping on confidential information. Our proposed system model can therefore be used in various IoD network applications, namely, border surveillance, military search and rescue operations, and critical infrastructure monitoring, that require both high-capacity and secured drone-to-ground data transmissions [26].
3) Contributions: In this work, a novel resource allocation scheme is proposed to enhance the overall SR of the cellular-connected IoD networks and efficiently control both co-channel inter-drone interference and drone-to-CBS interference. The specific contributions of this work are summarized as follows.
1) The worst-case average sum-SR (WC-ASR) of the network is maximized subject to target SR constraints for the legitimate drones and maximum acceptable interference constraints at the CBSs. Towards this objective, an optimization problem is formulated by jointly considering the transmit power allocation and clustering of the legitimate drones, and the jamming power allocation of the CBSs. The presented joint optimization problem is NP-hard, making it inherently difficult to find a global optimal solution. A two-level optimization approach is proposed to design a low-complexity resource allocation scheme. More specifically, the joint optimization problem is decomposed into two sub-problems, i.e., drone-clustering and power allocation sub-problems. A sub-optimal yet efficient solution is obtained by iteratively solving these two sub-problems.
2) The drone-clustering sub-problem is formulated as a mixed-strategy repeated game and solved using a multi-agent reinforcement learning (RL) framework. Meanwhile, the transmit and jamming power allocation sub-problem is solved by employing fractional programming, successive convex approximation (SCA), and alternating optimization techniques. Near-optimality of the derived solutions to both sub-problems is proved.
3) A resource allocation algorithm, namely, security and interference management with reinforcement-learning and NOMA (SIREN), is proposed. SIREN converges to a local-optimal solution of the presented joint optimization problem with polynomial computational complexity. Through extensive simulations, the superiority of SIREN over several interference-aware power allocation, drone-clustering, and RL empowered methods is demonstrated.
The rest of this paper is organized as follows. In Section II, the system model and problem formulation are presented. The sub-problems' solutions are provided in Sections III and IV. Section V presents an overview of the properties of the SIREN algorithm. Finally, simulation results and concluding remarks are provided in Sections VI and VII, respectively.

II. SYSTEM MODEL AND PROBLEM FORMULATION
A. System Model
1) System Overview: We consider a cellular-connected IoD network, as shown in Fig. 1, with M legitimate drones, K FD CBSs, and a multi-band eavesdropping drone that passively overhears the data transmitted by the legitimate drones over different RRBs. The CBSs are connected to a centralized network controller that coordinates the overall resource allocation procedure. The geographical region is divided into M different sensing zones, each of which is assigned to a single drone. In each sensing zone, the assigned drone sequentially visits N stop-over points before returning to its initial location [5]. To facilitate this, the mission duration is divided into N equal timeslots (TSs). In the n-th TS, where n ∈ {1, 2, . . . , N}, the drones fly from the (n − 1)-th stop-over point to the n-th stop-over point in their assigned sensing zones, collect data (e.g., images and videos) while statically hovering at the n-th stop-over point, and send the collected data to the CBSs using cellular RRBs. We emphasize that the stop-over points and the transition routes between the consecutive stop-over points need to be carefully designed to avoid drones colliding. Note that the proposed framework jointly optimizes the drone-to-CBS associations as well as the drones' transmit and CBSs' jamming power allocations at each TS. As a result, it is computationally intractable to optimize the drones' stop-over points and transition routes in the existing framework. Meanwhile, our proposed optimization framework requires only the drones' stop-over points. Therefore, for analytical tractability, we consider that both the drones' stop-over points and their transition routes in the sensing zones are predefined to avoid nearby drones colliding.
Fig. 1. Cellular-connected IoD network with two FD CBSs, two drone-clusters, two RRBs, and one eavesdropping drone (drone-to-CBS interference and SI links are not shown for clarity).
Collision-free path optimization frameworks for a system of multiple drones are extensively investigated in the literature [28], [29] and can be leveraged to design the drones' transition routes of our system model. We assume that the position of the eavesdropping drone can also differ from one TS to another.
Let $\mathcal{M} = \{1, 2, \ldots, M\}$ be the set of legitimate drones; $\mathcal{K} = \{1, 2, \ldots, K\}$ be the set of CBSs; $\mathbf{q}_{m,n} = \{x_{m,n}, y_{m,n}\}$ be the 2D location of the n-th stop-over point of the m-th drone; $\boldsymbol{\nu}_e^{(n)}$ be the 2D location of the eavesdropping drone at the n-th TS; $\mathbf{x}_k = \{x_k, y_k\}$ be the 2D coordinate of the k-th CBS; $H_b$ be the height of each CBS; $H_d$ and $H_e$ be the fixed altitudes of the legitimate and eavesdropping drones, respectively. Both legitimate and eavesdropping drones are equipped with a single omnidirectional antenna, whereas each CBS is equipped with two antennas to support FD transmission. Each CBS employs an orthogonal cellular RRB for collecting data from the associated drones. Such a cellular RRB is also reused in a set of predefined CBSs for uplink cellular communications, leading to an enhanced spectrum resource utilization [8].
2) Interference Management: The system model considered experiences both co-channel inter-drone and drone-to-ground interference. A hybrid multiple access scheme is adopted to mitigate the co-channel inter-drone interference. In particular, the data-transmitting drones are partitioned into a maximum of K non-overlapping drone-clusters. Each drone-cluster is associated with only one CBS, and each CBS supports only one drone-cluster. Therefore, different drone-clusters utilize orthogonal RRBs for data transmission, and as a result, the inter-cluster interference is eliminated. To overcome the intra-cluster interference, uplink NOMA with suitable transmit power allocation is employed in each drone-cluster. To further alleviate the intra-cluster interference and utilize the A2G channel's diversity, dynamic drone-clustering is introduced by allowing a given drone to be associated with different CBSs at different stop-over points.
Because the same RRB is reused at multiple CBSs for uplink aerial and cellular communications, the drones associated with a particular CBS can generate severe co-channel interference at neighboring CBSs [8]. Uplink power control is an effective methodology to prevent drones from generating harmful interference [27]. Let us consider that the RRB reserved for aerial communications in the k-th CBS is reused for uplink cellular communications by a set of CBSs, denoted by $\mathcal{L}_k$. Essentially, the drones associated with the k-th CBS interfere with all the CBSs of the set $\mathcal{L}_k$. Hence, the transmit power of the drones associated with the k-th CBS is controlled so that the resultant co-channel interference at each CBS in the set $\mathcal{L}_k$ remains below an acceptable interference threshold.
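The interference-threshold requirement described above can be checked numerically. The following is a minimal sketch, assuming the aggregate interference at a neighboring CBS is the sum of transmit power times channel gain over the drones of one cluster. The function names, the `gains[l][m]` layout, and the uniform back-off rule are illustrative assumptions. The paper optimizes the powers jointly rather than scaling them uniformly.

```python
def interference_at_neighbors(powers, gains):
    """Aggregate received interference at each neighbouring CBS.

    powers[m]   : transmit power of drone m in the cluster (W).
    gains[l][m] : channel gain from drone m to neighbouring CBS l (hypothetical layout).
    """
    return [sum(p * g for p, g in zip(powers, row)) for row in gains]


def backoff_factor(powers, gains, i_th):
    """Largest common scaling (at most 1) that keeps every neighbour at or below i_th."""
    worst = max(interference_at_neighbors(powers, gains), default=0.0)
    return 1.0 if worst <= i_th else i_th / worst
```

Because the interference is linear in the transmit powers, multiplying all powers by the returned factor brings the most affected neighboring CBS exactly to the threshold.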
3) Cooperative Jamming: To enhance the PLS, cooperative jamming is employed. In particular, leveraging its FD communication capability, each CBS simultaneously receives data from the associated drones and transmits AN signals over the RRB reserved for aerial communications. The transmitted AN signals reduce the channel capacity of the eavesdropping links at the cost of SI at the CBSs. We consider the CBSs to be equipped with state-of-the-art SI cancellation schemes [25] and thus affected only by residual SI. In particular, both the transmitted AN signal and distortions in the RF chain contribute to SI. RF distortions cannot be canceled out due to their stochastic behavior and equate to the residual SI.

4) Assumptions:
A1: The eavesdropping drone's position is imperfectly known at the network controller [16]-[18]. Let $\hat{\boldsymbol{\nu}}_e^{(n)}$ denote the estimated 2D location of the eavesdropping drone at the n-th TS. The actual location lies in the uncertainty region $\{\boldsymbol{\nu}_e^{(n)} : \|\boldsymbol{\nu}_e^{(n)} - \hat{\boldsymbol{\nu}}_e^{(n)}\| \leq \chi\}$, where $\chi$ is the maximum estimation error and $\|\cdot\|$ is the Euclidean norm.
A2: The parameters of the A2G, ground-to-air (G2A), and air-to-air (A2A) channel models are accurately available at the network controller.
A3: Due to shadowing and path loss, the uplink cellular users create negligible interference at the neighboring CBSs, and hence interference from the cellular users to the aerial links is not considered.
Remark 1: In the system considered, the eavesdropping drone can position itself in the vicinity of a legitimate drone. In the event that this happens and the legitimate drone is far from its associated CBS, the jamming signal might be weak and the eavesdropping drone might have a higher probability of intercepting the signal transmitted by a legitimate drone. This situation can be effectively tackled by exploiting co-channel inter-drone interference so that the deleterious impact of co-channel interference is minimized at the CBS and maximized at the eavesdropping drone. Note that the co-channel inter-drone interference of the proposed system jointly depends on the drone-clustering and transmit power allocation variables. Therefore, by optimizing the drone-clusters and the transmit power of the clustered drones, the SR can be improved even when the eavesdropping drone is far from the CBS and receives a weak jamming signal from it. To further enhance the SR, the collaboration among the drones and CBSs can be leveraged. In particular, the transmit power of each drone in a cluster can be optimally split between the confidential messages and AN signals such that the received jamming power at the eavesdropping drone is higher even when the eavesdropping drone is far from the CBS. In addition, multiple CBSs in the network can collaboratively transmit the weighted AN signals such that these AN signals coherently combine at the eavesdropping drone and maximize the received jamming power. These collaboration-based strategies for counteracting eavesdropping attacks are beyond the scope of this work and will be investigated in a future work.

B. Channel Model and Secrecy-Rate Expression
1) Channel Model: The A2G, G2A, and A2A communication channels are usually dominated by the LOS links [7], [8]. The A2G channel gain between the m-th legitimate drone and the k-th CBS in the n-th TS follows the free space path loss model, given by
$$h_{m,k}^{(n)} = \frac{\beta_1}{\|\mathbf{q}_{m,n} - \mathbf{x}_k\|^2 + (H_d - H_b)^2},$$
where $\beta_1$ denotes the channel gain at the reference distance of 1 m. Similarly, the G2A channel gain between the k-th CBS and the eavesdropping drone in the n-th TS is expressed as
$$f_{k,e}^{(n)} = \frac{\beta_1}{\|\mathbf{x}_k - \boldsymbol{\nu}_e^{(n)}\|^2 + (H_e - H_b)^2}.$$
The A2A channel gain between the m-th legitimate and the eavesdropping drones in the n-th TS is expressed as
$$g_{m,e}^{(n)} = \frac{\beta_2}{\|\mathbf{q}_{m,n} - \boldsymbol{\nu}_e^{(n)}\|^2 + (H_d - H_e)^2},$$
where $\beta_2$ denotes the channel gain at the reference distance of 1 m.
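As an illustration of the LOS channel model, the sketch below evaluates a free-space gain of the form reference gain divided by squared 3D distance. The function name, argument layout, and any numerical values used with it are assumptions for illustration, not parameters taken from the paper.

```python
import math


def los_gain(beta, p_xy, q_xy, h_p, h_q):
    """Free-space LOS power gain beta / d^2 between two nodes.

    beta       : channel gain at the 1 m reference distance (linear scale).
    p_xy, q_xy : 2D ground-plane coordinates of the two nodes (m).
    h_p, h_q   : heights or altitudes of the two nodes (m).
    """
    dx = p_xy[0] - q_xy[0]
    dy = p_xy[1] - q_xy[1]
    d_sq = dx * dx + dy * dy + (h_p - h_q) ** 2  # squared 3D distance
    if d_sq <= 0.0:
        raise ValueError("the two nodes must not be co-located")
    return beta / d_sq
```

The same helper covers the A2G, G2A, and A2A links by passing the appropriate pair of positions and heights together with the matching reference gain.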

2) Secrecy-Rate Expression:
In what follows, we provide the SR expression of the m-th legitimate drone in the n-th TS. Without loss of generality, we assume that in the n-th TS, the m-th legitimate drone is associated with the k-th CBS. Moreover, the set of drones associated with the k-th CBS in the n-th TS is denoted as $\mathcal{S}_k^{(n)}$. In the signals received at the k-th CBS and at the eavesdropping drone, $Q_k$ is the jamming power of the k-th CBS in the n-th TS, the SI cancellation coefficient lies in (0, 1), $z_{\mathrm{AN}} \sim \mathcal{CN}(0, 1)$ is the normalized AN signal, and $n_a \sim \mathcal{CN}(0, \sigma^2)$ is the additive white Gaussian noise.
Without loss of generality, the channel gains of the drones associated with the k-th CBS are sorted in decreasing order. The CBS applies SIC to decode the received signals following the decreasing order of the drones' channel gains. However, our presented optimization framework can be applied following other decoding orders as well. Moreover, we consider a practical SIC where each drone's signal is subjected to residual interference from the previously decoded signals. Based on [30, eq. (3)], the channel capacity of the data transmission link between the m-th legitimate drone and the k-th CBS in the n-th TS is obtained, where W is the bandwidth of the RRB reserved for aerial communications, and the SIC error coefficient equals 0 for ideal SIC and 1 for a scenario without any SIC.
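The decoding procedure described above can be sketched as follows. This is a hedged illustration of a commonly used imperfect-SIC model, not a reproduction of the capacity expression from [30, eq. (3)]. The interference structure (undecoded signals in full, previously decoded signals weighted by the SIC error coefficient, plus residual SI and noise) and all names are assumptions.

```python
import math


def noma_uplink_capacities(powers, gains, bandwidth, noise_power,
                           sic_error=0.0, residual_si=0.0):
    """Per-drone capacity (bit/s) at one CBS under imperfect SIC.

    Drones are decoded in decreasing order of channel gain.
    sic_error   : 0 for ideal SIC, 1 for no cancellation at all.
    residual_si : residual self-interference power at the CBS (assumed given).
    """
    n = len(gains)
    order = sorted(range(n), key=lambda i: gains[i], reverse=True)
    rx = [powers[i] * gains[i] for i in range(n)]  # received signal powers
    caps = [0.0] * n
    for pos, i in enumerate(order):
        decoded = sum(rx[j] for j in order[:pos])      # already decoded signals
        pending = sum(rx[j] for j in order[pos + 1:])  # not yet decoded signals
        interference = pending + sic_error * decoded + residual_si + noise_power
        caps[i] = bandwidth * math.log2(1.0 + rx[i] / interference)
    return caps
```

With `sic_error=0` the last decoded drone sees only residual SI and noise, whereas with `sic_error=1` every drone sees the full co-channel interference.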
In practice, it is more challenging to implement SIC-based decoding in a drone compared to a CBS. Similar to [22], we consider that the eavesdropping drone attempts to decode the transmitted signals of each drone-cluster by treating the co-channel inter-drone interference as noise. The channel capacity of the eavesdropping link between the m-th legitimate and eavesdropping drones in the n-th TS is expressed as (4). The achievable SR of the m-th legitimate drone in the n-th TS is obtained in (5) by applying $[x]^+ = \max(x, 0)$ to the difference between the capacity of the legitimate link and the capacity of the eavesdropping link in (4). Due to the uncertainty about the eavesdropping drone's location, it is challenging to compute (5).
To address this issue, we derive a tractable lower bound for (5).
In particular, we consider the eavesdropping drone's location that results in the worst-case lower bound of the achievable SR. Leveraging [16, eq. (5)] and [18, eq. (20)], we obtain an upper bound of the eavesdropping link's channel capacity, given in (6), which is evaluated with the worst-case channel gains over the location uncertainty region. Therefore, the worst-case SR of the m-th legitimate drone in the n-th TS is expressed as (7), obtained by replacing the eavesdropping link's capacity in (5) with this upper bound. Since (7) is analytically tractable, the ensuing problem formulation and analysis are developed using (7) instead of (5).
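A worst-case SR of this kind can be sketched numerically under a bounded location error. The bounding rule below is an assumption in the spirit of [16], [18], not the paper's exact expression: the eavesdropper is placed as close as possible to the legitimate drone and as far as possible from the jamming CBS. These two extremes need not occur at the same point, which is why the result is only an upper bound on the eavesdropping capacity. All function and parameter names are illustrative.

```python
import math


def worst_case_gains(beta1, beta2, drone_xy, cbs_xy, eve_est_xy, chi,
                     h_d, h_e, h_b):
    """Largest A2A gain and smallest G2A gain over a disc of radius chi."""
    d_drone = math.dist(drone_xy, eve_est_xy)
    d_cbs = math.dist(cbs_xy, eve_est_xy)
    denom = max(d_drone - chi, 0.0) ** 2 + (h_d - h_e) ** 2
    if denom <= 0.0:
        raise ValueError("eavesdropper may coincide with the legitimate drone")
    g_max = beta2 / denom                                   # closest to the drone
    f_min = beta1 / ((d_cbs + chi) ** 2 + (h_e - h_b) ** 2)  # farthest from the CBS
    return g_max, f_min


def worst_case_secrecy_rate(legit_capacity, p_m, g_max, min_co_channel_rx,
                            jam_power, f_min, bandwidth, noise_power):
    """[C_legit - C_eve_upper]^+ with co-channel interference treated as noise."""
    eve_sinr = p_m * g_max / (min_co_channel_rx + jam_power * f_min + noise_power)
    eve_capacity = bandwidth * math.log2(1.0 + eve_sinr)
    return max(legit_capacity - eve_capacity, 0.0)
```

Here `min_co_channel_rx` stands for a lower bound on the co-channel interference received by the eavesdropper, which the caller is assumed to supply.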

C. Problem Formulation
A drone-cluster is a set of the drones associated with the same CBS in a given TS. The optimization problem for maximizing the WC-ASR of the network is formulated as P0 in (8). In P0, $P_{\max}$ denotes the maximum instantaneous transmit power of each legitimate drone; constraint C1 ensures that the average SR of each legitimate drone is greater than or equal to a target SR, $R_s$; constraint C2 provides a maximum average transmit power limit, $P_{\mathrm{avg}}$, for each legitimate drone; constraint C3 provides a maximum jamming power limit, $Q_{\max}$, for each CBS; constraint C4 mandates that the maximum interference caused by the drone-clusters to the neighboring CBS(s) is not more than an acceptable interference level, $I_{\mathrm{th}}$; constraint C5 ensures that the drone-clusters are non-overlapping.
P0 is a mixed-integer non-linear optimization problem. More specifically, the special instances of P0, which are obtained by fixing either the power allocation or drone-clustering variables, are NP-complete. Consequently, P0 is NP-hard and a global optimal solution to P0 is computationally intractable. To address this issue, we propose a two-level iterative optimization approach. In particular, we decompose P0 into the upper-level and lower-level sub-problems as follows.
Upper-level sub-problem: The upper-level sub-problem optimizes the drone-clusters in each TS. The essence of drone-clustering is to associate each data transmitting drone with a suitable CBS so that the sum-SR of the network is maximized. Accordingly, the upper-level sub-problem for the n-th TS, ∀n = 1, 2, . . . , N, is formulated as P1.
Lower-level sub-problem: For a given set of drone-clusters, the lower-level sub-problem jointly optimizes the transmit power of the drones and the jamming power of the CBSs, and is formulated as P2.
A converged sub-optimal solution to P0 is obtained by iteratively solving P1 and P2. We first solve P2 in Section III for any given set of drone-clusters. Thereafter, by leveraging the solution of P2, P1 is solved in Section IV. Finally, the overall solution to P0 is presented in Section V.

A. Proposed Solution Approach
Although P2 is non-convex, it can be solved in the dual domain without a notable loss of optimality, especially for large numbers of TSs. We therefore develop a dual-domain solution to P2. We first express the (partial) Lagrangian function of P2 as (11).
In (11), λ, ρ, and μ are the non-negative Lagrangian multipliers for the constraints C1, C2, and C4, respectively. The dual problem of P2 minimizes the Lagrangian dual function G(λ, ρ, μ) over the non-negative multipliers, where G(λ, ρ, μ) is defined in (13) as the maximum of the Lagrangian over the power allocation variables. An optimal solution to (13) is required to solve the dual problem of P2. Evidently, (13) is separable per TS and decomposes into N optimization problems. The optimization problem for the n-th TS, where n ∈ {1, 2, . . . , N}, is denoted as P4. The optimal solution to P4 only depends on the drone-clusters in the current TS. Therefore, P2 can be solved by repeating the following two steps in each TS: (i) solving P4 optimally for the given Lagrangian multipliers and (ii) updating the Lagrangian multipliers by minimizing the function G(λ, ρ, μ) for the updated solution to P4.

B. Solution to P4 for the Given Lagrangian Multipliers
Using [12, Lemma 1], we can justify that the operator $[\cdot]^+$ can be omitted from the objective function of P4 without loss of optimality. Hence, P4 can be equivalently expressed as P5, in which the drone-clusters can be any feasible set of drone-clusters in the n-th TS. Since the CBSs are associated with non-overlapping drone-clusters, P5 is separable per CBS. The optimization problem for the k-th CBS, ∀k ∈ K, is written as (16). Since (16) is a non-convex optimization problem, solving it is non-trivial. To this end, we develop a low-complexity algorithm to near-optimally solve (16) by alternately updating the transmit and jamming power allocations. The detailed solution is provided as follows.
1) Jamming Power Allocation: Let us assume that the transmit power allocation of the legitimate drones is known. Using a change of variable for the jamming power, we first express the objective function of (16) as a difference of concave (DC) functions in (17). Therefore, for a given p, we can equivalently express (16) as (18), which is a standard DC programming problem. We employ the SCA technique to solve (18). In particular, the first-order Taylor approximation is a global over-estimator of a concave function. Using this fact, a lower bound to the objective function of (18) is obtained in (19) around a given feasible solution to (18). A solution to (18) is then obtained by iteratively solving the sequence of convex optimization problems in (20), where $u^{(t)}$ denotes the optimal solution for the t-th instance of (20), ∀t ≥ 1. The optimal solution to (20) is obtained in the following proposition.
Proposition 1: An optimal solution to (20) is obtained at the limit point of a sequence that starts from a given initial point and whose points are iteratively computed according to (21). Proof: Due to space limitations, the proof is provided in the extended version of the paper [36, Appendix A].
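The SCA principle used for the jamming power update can be illustrated on a scalar toy problem. The sketch below is an assumption-laden example, not the paper's problem (18): it maximizes f(q) = ln(1 + aq) − cq − ln(1 + bq) on [0, q_max], where the subtracted concave term ln(1 + bq) is replaced by its first-order Taylor expansion at the current iterate. This yields a concave lower bound whose maximizer is available in closed form. The constants a, b, c and the function name are hypothetical.

```python
import math


def sca_maximize(a, b, c, q_max, q0=0.0, iters=50):
    """Maximise ln(1 + a q) - c q - ln(1 + b q) on [0, q_max] via SCA.

    Requires a > 0, b >= 0, c > 0. Returns the final iterate and the
    objective value after every iteration.
    """
    def f(q):
        return math.log1p(a * q) - c * q - math.log1p(b * q)

    q = q0
    history = [f(q)]
    for _ in range(iters):
        # Slope of the subtracted concave term at the current point.
        slope = b / (1.0 + b * q)
        # Surrogate: ln(1 + a x) - (c + slope) x + const, concave in x.
        # Its stationary point solves a / (1 + a x) = c + slope.
        x = 1.0 / (c + slope) - 1.0 / a
        # Clipping gives the maximiser of a 1D concave function on the interval.
        q = min(max(x, 0.0), q_max)
        history.append(f(q))
    return q, history
```

Since the surrogate touches the objective at the current iterate and lies below it elsewhere, the recorded objective values are non-decreasing, which mirrors the convergence argument usually given for SCA.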
The jamming power allocation of the CBSs is optimized using the following iterative procedure. First, the jamming power of the k-th CBS is initialized to some feasible value. Then, the variable $u_k$ is repeatedly updated according to the limit point of the sequence presented in (21) until convergence, and the converged jamming power allocation of the k-th CBS is recovered from it.
2) Transmit Power Allocation: For a known jamming power allocation, using [31, Proposition 2], we can express (16) as (22), where a is a vector of auxiliary variables, and $W_1$, $W_2$, and $\psi(\mathbf{p})$ are auxiliary functions of the transmit power. Problem (22) is decomposed into outer and inner optimization problems, where $W_3$ is defined in (29). In (29), $\boldsymbol{\eta} = [\eta_m]_{m \in \mathcal{S}_k^{(n)}}$ is a vector of auxiliary variables for transforming the fractions into quadratic functions. For a fixed η, (29) is a non-convex function of the transmit power p. The SCA technique is utilized to solve (29) for a fixed η. A concave lower bound of $W_3$, with respect to p, is presented in (30), which is constructed around a given feasible transmit power allocation vector. For a fixed η, an approximate solution to (28) is obtained by iteratively solving the optimization problem in (31) and updating p according to the solution obtained from (31).
It is noted that (26), (28), and (31) are (unconstrained) convex optimization problems with respect to the variables a, η, and p, respectively. Therefore, the optimal solutions to these optimization problems can be directly obtained at the stationary points, given in (32), (33), and (34), respectively. For a known jamming power at the CBSs, a converged transmit power allocation is obtained by alternately updating the variables a, η, and p using (32), (33), and (34), respectively. This convergence is further confirmed by Proposition 2 of Section V.
3) Algorithm Development: The overall steps to solve P4 are summarized in Algorithm 1. Next, we analyze the computational complexity of Algorithm 1. In particular, Algorithm 1 iteratively executes two inner loops, where inner loop-I and inner loop-II update the jamming and transmit power allocations, respectively. The total number of iterations of both inner loops is $T_{\max} J_{\max}$, and the total number of iterations of the outer loop is $I_{\max}$. The computational complexity associated with inner loop-I is $O(T_{\max} J_{\max} K)$. Meanwhile, the computational complexity associated with inner loop-II is $O(T_{\max} J_{\max} M)$. Therefore, the required computational complexity for executing a single iteration of the outer loop is $O(T_{\max} J_{\max} (K + M))$. The overall computational complexity of Algorithm 1 is obtained as $O(\Delta_{\max} (K + M))$, where $\Delta_{\max} = T_{\max} J_{\max} I_{\max}$.

C. Optimal Lagrangian Multipliers
We employ the well-known sub-gradient method to minimize G(λ, ρ, μ) and find the optimal Lagrangian multipliers. In the update equations of λ, ρ, and μ, $\xi^{(1)}$, $\xi^{(2)}$, and $\xi^{(3)}$ are the positive and square-summable step-sizes. It is noted that C1 and C2 are the constraints on the average SR and average transmit power over the TSs, respectively, and C4 is an instantaneous interference threshold constraint. Hence, both λ and ρ are updated only at the end of each TS based on the average SR and average transmit power, respectively. Conversely, μ is iteratively updated along with the instantaneous transmit power of the drones.
Algorithm 1 (excerpt). Initialize u and the iteration index t = 0. (Start of inner loop-I) 6: while $t < T_{\max}$ do; 7: update $u_{k,j+1}^{(t+1)}$ by iteratively executing (21).
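A projected sub-gradient update of this kind can be sketched as follows. This is a minimal scalar illustration under assumptions: each multiplier moves against the slack of its constraint and is projected onto the non-negative reals, and the step-size rule ξ0/(t + 1) is one common square-summable choice. The function names, the scalar (rather than per-drone or per-CBS) multipliers, and the step-size rule are not taken from the paper.

```python
def step_size(t, xi0=1.0):
    """Diminishing, square-summable step size (assumed rule)."""
    return xi0 / (t + 1)


def update_multipliers(lam, rho, mu, avg_sr, target_sr, avg_power, p_avg,
                       interference, i_th, step):
    """One projected sub-gradient step on the dual variables.

    lam : multiplier of the average-SR constraint (avg_sr >= target_sr).
    rho : multiplier of the average-power constraint (avg_power <= p_avg).
    mu  : multiplier of the interference constraint (interference <= i_th).
    """
    lam_new = max(lam - step * (avg_sr - target_sr), 0.0)
    rho_new = max(rho - step * (p_avg - avg_power), 0.0)
    mu_new = max(mu - step * (i_th - interference), 0.0)
    return lam_new, rho_new, mu_new
```

A violated constraint increases its multiplier, which penalizes the violation more heavily in the next solution of P4, whereas a constraint with slack drives its multiplier toward zero.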

IV. DRONE-CLUSTERING SUB-PROBLEM'S SOLUTION
P1 is a combinatorial optimization problem of exponential complexity $O(K^M)$. Thus, finding an optimal solution to P1 is prohibitive for large-scale systems. We apply the mixed-strategy repeated game framework [32] to solve P1, and the motivation is explained as follows. To solve P1, prior knowledge of the achievable secrecy-rate (SR) of different drone-to-CBS associations is required. However, the achievable SR with different CBSs is influenced by the drone-to-CBS associations and the resultant inter-drone interference. Essentially, both drone-to-CBS association decisions and the resultant SR from the CBSs depend on each other. To solve this dilemma, it is imperative to better understand the achievable SR of different drone-to-CBS associations by exploring different drone-clusters. A mixed-strategy repeated game is used to fulfill this requirement. In a mixed-strategy repeated game, the players randomize their actions for each round of the game using certain probability mass functions (PMFs) and learn the achievable utility of different actions via exploration. Unlike in a one-shot game, the players in a mixed-strategy repeated game use some of the initial iterations to learn the behavior of other players and the utility, and make more judicious decisions in later rounds [33]. Eventually, each player can select the most suitable strategy without having any prior knowledge of the other players' strategies and the utility of different actions. By exploiting a mixed-strategy repeated game with drone-clustering as the action, the network controller of our system model can learn the achievable SR of each drone with different CBSs and select the most suitable CBS for each drone. Thus, a mixed-strategy repeated game is well-suited to solve the drone-clustering problem considered in a computationally efficient manner. The players' utility is the sum-SR of the network, which is defined in the objective function of P1.
At each game round, based on the joint action of the players, a set of non-overlapping drone-clusters is formed. Thereafter, Algorithm 1 is executed for these drone-clusters, and near-optimal transmit and jamming power allocations are derived. By plugging these power allocations into the objective function of P1, the utility function of the players is determined.

A. Mixed-Strategy Repeated Game Formulation
For each player, there is a PMF over its action space, known as the mixed-strategy profile. Let π  j,k ∈ a, otherwise ς j,k (a) = 0. From (38), the expected utility of a given player simultaneously depends on its own strategy and the strategies of the opponent players. Essentially, each player's optimal strategy is a function of the opponent players' strategies. We consider that the players are rational and choose the best-response strategies. The best-response strategy of the m-th player is defined as BR m (π In the NE, all the players jointly maximize the network sum-SR. Therefore, the NE solution of ψ (n) is sought to solve P1. To identify the NE solution of ψ (n) , we first define the smooth best response (SBR) strategy of the m-th player, ∀m ∈ M, as follows.
Definition 2: For the opponents' fixed joint strategy, π (n) −m , the SBR strategy of the m-th player is defined as β where r the ε-equilibrium can be made sufficiently close to the NE when ε is small. Consequently,π (n) can approach the NE solution of the game ψ (n) .

B. Multi-Agent RL Framework to Learn the NE Solution
To compute the NE solution of the game $\psi^{(n)}$ using (40), the network controller requires prior information about the utility and strategy models. In this case, the utility model refers to the expected utility from different joint actions of the players, and the strategy model refers to the distribution of the joint actions of the players. Since prior knowledge about both the utility and strategy models is unavailable, it is non-trivial to compute the NE solution of the game. To circumvent this issue, we exploit a multi-agent RL framework that can learn the strategy and utility models from the agents' interactions with the environment. Essentially, the multi-agent RL framework is adopted to learn the NE solution of $\psi^{(n)}$ without any prior statistical information. $\psi^{(n)}$ is modeled as a stateless multi-agent RL problem consisting of $M$ virtual agents, each of which represents a drone. The action profile of the virtual agents is given by $\mathcal{A} = \{\mathcal{A}_m\}$, $\forall m \in \mathcal{M}$, as defined at the beginning of Section IV-A. The RL framework's environment is modeled by the proposed cellular-connected IoD system. In each iterative game round (i.e., each episode of the RL problem), the virtual agents receive rewards for interacting with the environment and select new actions based on the rewards they receive. An interaction between the virtual agents and the environment is illustrated in Fig. 2. To facilitate the virtual agents' interaction with the environment, we introduce the actor and critic RL processes as follows.
Critic RL process: Let V , (41) where ν (1) [l] is the critic's learning step-size at the l-th round; π where ν (2) [l] is the actor's learning step-size at the l-th round.
For the convergence of (41) and (42), the learning step-sizes should satisfy the conditions given in [35]. The proposed multi-agent RL framework consists of the following three steps: (i) the action selection step, in which the $m$-th virtual agent, $\forall m \in \mathcal{M}$, samples an action from the set $\mathcal{A}_m^{(n)}$ using the current mixed-strategy profiles; (ii) the reward calculation step, in which the drone-clusters are determined based on the actions selected by the virtual agents, Algorithm 1 is executed for these drone-clusters, and the virtual agents' rewards are calculated using the derived power allocations; and (iii) the actor-critic update step, in which the virtual agents' value functions and mixed-strategy profiles are updated by plugging the calculated rewards into (41) and (42). The network controller learns the NE solution of $\psi^{(n)}$ by iteratively repeating these three steps at each round of the game. The best action for each agent is selected in a deterministic manner when $\{\kappa_m\}$ is asymptotically large. To learn the NE mixed-strategy profiles, it is imperative to make the agents explore the available actions during the initial search, and exploit the best possible actions during the later search [35]. In this context, the value of $\kappa_m$, $\forall m \in \mathcal{M}$, is gradually increased at each round of the game.
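The three steps above can be sketched for a single virtual agent as follows. This is only an illustrative sketch under stated assumptions: the critic and actor updates use a commonly used stateless form with a Boltzmann-Gibbs (softmax) smooth best response, which may differ from the exact expressions in (41) and (42); `reward_of` is a hypothetical stand-in for the utility obtained after running Algorithm 1; and the schedules (step-size exponents 0.7 and 0.6, $a = 1.1$, 50 rounds) follow the simulation settings reported in this paper.

```python
import math
import random


def smooth_best_response(values, kappa):
    """Boltzmann-Gibbs (softmax) distribution over actions with parameter kappa."""
    peak = max(values)
    weights = [math.exp(kappa * (v - peak)) for v in values]  # shifted for stability
    total = sum(weights)
    return [w / total for w in weights]


def actor_critic_round(values, strategy, action, reward, nu_critic, nu_actor, kappa):
    """One stateless actor-critic round for a single agent (returns new copies)."""
    # Critic: move the value estimate of the sampled action toward the reward.
    new_values = list(values)
    new_values[action] += nu_critic * (reward - new_values[action])
    # Actor: move the mixed strategy toward the smooth best response.
    target = smooth_best_response(new_values, kappa)
    new_strategy = [p + nu_actor * (b - p) for p, b in zip(strategy, target)]
    return new_values, new_strategy


def learn_association(reward_of, num_cbs, rounds=50, a=1.1, seed=0):
    """Repeat the three steps for one agent; reward_of(k) is a hypothetical stand-in."""
    rng = random.Random(seed)
    values = [0.0] * num_cbs
    strategy = [1.0 / num_cbs] * num_cbs
    for l in range(rounds):
        action = rng.choices(range(num_cbs), weights=strategy)[0]  # (i) action selection
        reward = reward_of(action)                                  # (ii) reward calculation
        values, strategy = actor_critic_round(                      # (iii) actor-critic update
            values, strategy, action, reward,
            nu_critic=1.0 / (l + 1) ** 0.7,
            nu_actor=1.0 / (l + 1) ** 0.6,
            kappa=a ** l - 1.0,
        )
    return values, strategy
```

With `kappa = 0` the smooth best response is uniform, which corresponds to pure exploration; as `kappa` grows, it concentrates on the action with the largest value estimate, which corresponds to exploitation.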

A. Overview of the Proposed SIREN Algorithm
In Algorithm 2, the overall steps of SIREN are provided to solve the joint optimization problem P0. SIREN is implemented in a centralized manner. In each TS, SIREN provides a set of non-overlapping drone-clusters, the transmit power allocation of the drones, and the jamming power allocation of the CBSs using the following procedure. Each TS is divided into three intervals, namely, the transition interval, the scheduling interval, and the data transmission interval. In the transition interval, the drones move from their current stop-over point to the next stop-over point in their respective sensing zones and collect data. In the scheduling interval, Steps 4-12 of SIREN are iteratively executed in the centralized network controller until the maximum number of iterations is reached. Following this, the network controller determines a set of non-overlapping drone-clusters at Step 13 of SIREN, and broadcasts the drone-to-CBS association matrix to all the CBSs in the network. Subsequently, each CBS determines the jamming power and the associated drones' transmit power by executing Algorithm 1. Finally, the CBSs inform the associated drones about their transmit power allocation over reliable control channels. The network controller also updates the Lagrangian multipliers, $\lambda$ and $\rho$, as per Step 14 of SIREN. In the data transmission interval, the drones transmit their collected data to the associated CBSs using the scheduled transmit power, and the CBSs simultaneously transmit AN signals using the scheduled jamming power. The aforementioned resource allocation procedure is repeated in each TS.

Algorithm 2: SIREN (Steps 7-16)
7: Update the transmit and jamming power allocation by plugging the updated drone-clusters into Algorithm 1.
8: Determine the utility function (i.e., network sum-rate) for the updated drone-clusters and power allocations.
9: Update the value functions and mixed-strategy profiles using (41) and (42), respectively.
10: Update the parameter $\kappa_m = a^l - 1$, where $a > 1$.
11: Update the Lagrangian multiplier $\mu$ using (37); $l = l + 1$.
12: until $l > L_{\max}$
13: Obtain the converged drone-clusters by associating each drone with the CBS having the maximum value function. Obtain the final transmit and jamming power allocation.
14: Update the Lagrangian multipliers $\lambda$ and $\rho$ using (35) and (36), respectively.
15: end for
16: Output: Drone-clusters, transmit power allocation of the drones, and jamming power allocation of the CBSs for $n = 1, 2, \ldots, N$ TSs.

Remark 3: From an implementation point of view, the computational delay required to execute SIREN needs to be orders of magnitude smaller than the duration of each TS. Note that SIREN requires a finite number of iterations for convergence, as confirmed by our simulation results (Fig. 8(a) and (b)). Furthermore, the computationally intensive part of SIREN, i.e., Steps 4-12, is implemented in a centralized network controller, which is usually hosted on an edge cloud platform and hence has powerful hardware for fast computation. For instance, SIREN's execution can be accelerated by implementing Steps 4-12 using multiple commercial off-the-shelf (COTS) GPUs running in parallel. SIREN can therefore be rapidly executed by leveraging the centralized network controller's fast computation capability.
B. Properties of the Proposed SIREN Algorithm
1) Convergence and Optimality: The following proposition capitalizes on the development of Algorithm 1, and confirms SIREN's effectiveness at solving the transmit and jamming power allocation sub-problem.
Proposition 2: For a given set of drone-clusters, SIREN converges to a near-optimal solution to P2.
Proof: Due to space limitations, the proof has been moved to the extended version of the paper [36, Appendix B].
The following proposition capitalizes on the actor-critic multi-agent RL framework developed in Section IV-B, and confirms SIREN's effectiveness at solving the drone-clustering sub-problem.
Proof: Due to space limitations, the proof has been moved to the extended version of this paper [36, Appendix C].
We leverage these two propositions to confirm the local optimality of SIREN as follows.
Proposition 4: For a sufficiently large number of iterations, i.e., L max → ∞, SIREN converges to a local optimal solution of the joint optimization problem P0.
Proof: Due to space limitations, the proof has been moved to the extended version of the paper [36, Appendix D].
2) Computational Complexity: We first determine the required computational complexity for SIREN in each TS, which is dominated by Steps 7, 9, 10, 11, 13, and 14. In particular, the required computational complexity for executing Step 7 is $O(\Delta_{\max}(K + M))$. Meanwhile, for each drone, a total of $(K + 1)$ computations are required to update the value functions and mixed-strategy profiles. Therefore, the required computational complexity of Step 9 is $O(MK)$. Finally, Steps 10 and 11 of SIREN require $O(M)$ and $O(K)$ computations, respectively. Recall that the aforementioned steps are repeated a total of $L_{\max}$ times in each TS, where $L_{\max} \gg 1$. Conversely, Steps 13 and 14, whose required computational complexities are $O(\Delta_{\max}(K + M))$ and $O(M)$, respectively, are executed only once in each TS. Therefore, the required computational complexity for SIREN in each TS is approximated as $O(L_{\max}(MK + \Delta_{\max}(K + M)))$. As a result, the overall computational complexity of SIREN is obtained as $O(N L_{\max}(MK + \Delta_{\max}(K + M)))$. Evidently, SIREN requires a polynomial computational complexity to obtain a converged solution to P0.
3) Signaling Overhead: In each TS, SIREN requires the following information exchanges. First, at the end of the scheduling interval of a TS, the network controller broadcasts the drone-to-CBS association matrix to all the CBSs in the network. This requires a total of $MK$ information exchanges between the network controller and the CBSs. Second, at the end of a TS, each CBS forwards the transmit power allocation information to its associated drones over reliable control channels. This requires a total of $M$ information exchanges between the CBSs and the drones. Consequently, over the $N$ TSs, a total of $NM(K + 1)$ information exchanges are required by the SIREN algorithm.
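As a quick numerical illustration of the two expressions above, the following sketch evaluates the dominant operation count and the number of information exchanges. The function names are hypothetical, and the value used for $\Delta_{\max}$ in the usage note is an arbitrary assumption, since it is not specified in this section; constant factors hidden by the $O(\cdot)$ notation are ignored.

```python
def siren_complexity(n_ts, l_max, m, k, delta_max):
    """Dominant operation count N * L_max * (M*K + Delta_max * (K + M))."""
    return n_ts * l_max * (m * k + delta_max * (k + m))


def signaling_exchanges(n_ts, m, k):
    """Information exchanges over N time slots: N * M * (K + 1)."""
    association_matrix = m * k  # controller -> CBSs, per TS
    power_reports = m           # CBSs -> drones, per TS
    return n_ts * (association_matrix + power_reports)
```

For example, with $N = 30$, $M = 25$, and $K = 6$, `signaling_exchanges(30, 25, 6)` gives 5250 exchanges; with $L_{\max} = 50$ and an assumed $\Delta_{\max} = 100$, `siren_complexity(30, 50, 25, 6, 100)` gives 4,875,000 dominant operations.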

A. Simulation Setting
We consider a 1500 m × 1500 m geographical area, where six CBSs are placed at the vertices of a 600 m × 600 m hexagonal region. We simulate five different system configurations consisting of $M = 5, 10, 15, 20$, and $25$ data transmitting drones. Leveraging the circle packing tool [37], the entire geographical area of each system configuration is divided into $M$ non-overlapping sensing zones. Each sensing zone consists of $N = 30$ uniformly distributed stop-over points. For each system configuration, we repeat 25 different simulation trials with independently generated stop-over points at the sensing zones, and present the average WC-ASR over all the independent simulation trials. A circular flight trajectory is considered for the eavesdropping drone. The altitudes of the legitimate and eavesdropping drones are $H_d = 80$ m and $H_e = 50$ m, respectively. The height of each CBS is $H_b = 10$ m. The estimation error of the eavesdropping drone's 2D positions is $\chi = 10$ m, and the A2G/G2A and A2A channels' gains at the reference distance are $\beta_1 = 10^{-5}$ and $\beta_2 = 10^{-4}$, respectively [17]. The SI channel gain, $f_d$, follows a Rayleigh distribution. Each CBS employs an orthogonal RRB of bandwidth $W = 10$ MHz for aerial communications. Each CBS also shares its aerial RRB with all the other CBSs in the network for uplink cellular communications. The remaining simulation parameters, summarized in Table I, are selected based on [6], [7], [17]. Finally, to execute the proposed algorithms, we set the parameters $\nu^{(1)}(l) = \frac{1}{(l+1)^{0.7}}$, $\nu^{(2)}(l) = \frac{1}{(l+1)^{0.6}}$, $a = 1.1$, $J_{\max} = 15$, $T_{\max} = 100$, $I_{\max} = 30$, and $L_{\max} = 50$. Note that to avoid an infeasible solution to P0 for all the simulation settings considered, relatively large average and peak transmit power limits are considered for the legitimate drones. This is consistent with the existing literature [5], [6]. Nevertheless, by appropriately adjusting the Lagrangian multiplier, $\{\mu_{n,k}\}$, our proposed optimization framework confines the drones' converged transmit power to a value that satisfies the interference constraint C4 and prevents the drones from creating harmful interference at the terrestrial CBSs.

B. Performance Comparison With the Interference-Aware Benchmark Power Allocation Schemes
In this sub-section, we demonstrate the efficacy of SIREN in maximizing the achievable WC-ASR of the network compared to the following three interference-aware power allocation schemes.
1) Iterative Function Evaluation (IFE) Based Power Control: In the IFE scheme, the jamming and transmit power allocations are alternately updated, where the CBSs' jamming power allocation is determined using Steps 4-9 of
In Fig. 3, we compare the achievable WC-ASR of SIREN and the aforementioned three interference-aware power allocation schemes by varying the number of drones in the network. To make a fair comparison, the entire framework of SIREN is used for each benchmark power allocation scheme, while modifying only Step 7 of Algorithm 2. Fig. 3 shows that the IFE scheme provides a higher WC-ASR for small numbers of drones. Conversely, SIREN outperforms the IFE scheme for large numbers of drones. In particular, the inter-drone interference at the drone-clusters becomes substantial as the number of drones increases. Meanwhile, SIREN optimizes the drones' transmit power by leveraging the quadratic transformation of fractional programming problems, which is highly effective for alleviating interference in the network [31]. Thanks to the efficient management of inter-drone interference, SIREN achieves an improved WC-ASR for large numbers of drones. It is observed from Fig. 3 that SIREN achieves a 42.98% higher WC-ASR than the IFE scheme for a network of 25 drones. Fig. 3 shows that SIREN also outperforms both the GBPC and FPC-FJ schemes for both small and large numbers of drones. In particular, since GBPC switches between the given maximum and minimum powers, the resultant jamming and transmit powers can be far from the optimal values. Meanwhile, FPC-FJ only takes the interference links into account, and ignores both the eavesdropping and jamming links. Intuitively, both GBPC and FPC-FJ are sub-optimal, and thereby SIREN achieves substantial performance gains over both GBPC and FPC-FJ, especially for large numbers of drones. Fig. 3(b) depicts that for a network of 25 drones, SIREN achieves 1.99 times and 3.96 times higher WC-ASR than the GBPC and FPC-FJ methods, respectively. We conclude that SIREN is interference-resilient and particularly advantageous for dense IoD networks.

C. Performance Comparison With the Quasi-Exhaustive Search and Matching Empowered Drone-Clustering Methods
In Fig. 4, we compare the achievable WC-ASR of SIREN and a quasi-exhaustive search empowered drone-clustering method for various numbers of drones. In the quasi-exhaustive search method, the network controller determines (almost) all the possible combinations of drone-clusters, calculates the transmit and jamming power allocations for each combination of drone-clusters, and selects the drone-cluster combination that provides the largest WC-ASR for the network. By repeating these three steps $\Theta$ times at each TS, where $\Theta > 1$ is a predefined number, the network controller obtains near-optimal drone-clusters and the corresponding power allocation. Intuitively, the quasi-exhaustive search method can obtain the optimal WC-ASR, and thus outperforms SIREN. The overall computational complexity of the quasi-exhaustive search method is given by $O(N \Theta \Delta_{\max}(M^2 K + K^2 M))$, which grows quadratically with the number of drones and thus becomes much greater when there is a large number of drones in the network. Conversely, SIREN's required computational complexity increases linearly as the number of drones increases, and consequently, SIREN is scalable. Despite a significant reduction in the computational complexity, SIREN experiences only a small performance loss compared to the quasi-exhaustive search method. Fig. 4(a) shows that SIREN exhibits a 6.9% smaller WC-ASR than the quasi-exhaustive search method when there are 25 drones in the network. Evidently, SIREN strikes a suitable balance between optimal performance and required computational complexity.
In Fig. 4, we also plot the achievable WC-ASR of a greedy matching empowered drone-clustering method for various numbers of drones. Here, we consider drone-clustering as a two-sided matching problem where each drone aims to be associated with the nearest CBS, and each CBS can be associated with a maximum of N d nearest drones. In particular, we set N d = 5. This matching problem is near-optimally solved using a greedy algorithm. Note that the greedy matching method avoids iterating between the drone-clustering and power allocation phases, and thus requires a low computational complexity. However, due to the distance based drone-to-CBS associations, the drone-clusters generated by the greedy matching method can exhibit large inter-drone interference. As a result, the achievable WC-ASR of the greedy matching method is significantly reduced as the number of drones increases. Fig. 4 depicts that SIREN achieves 9.18 times more WC-ASR than the greedy matching empowered drone-clustering method when there are 25 drones in the network. Accordingly, SIREN is significantly more efficient than the greedy matching empowered drone-clustering method.
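The distance-based association described above can be sketched with a simple greedy rule. This is one plausible reading of the greedy matching benchmark, not necessarily the exact procedure used in the simulations: drone-CBS pairs are scanned from the shortest link to the longest, and a drone is assigned to a CBS if the drone is still unassigned and the CBS serves fewer than $N_d$ drones. The function name and the 2D position inputs are assumptions made for illustration.

```python
import math


def greedy_nearest_matching(drones, cbss, n_d=5):
    """Assign each drone to the nearest CBS that still has room (at most n_d drones).

    drones, cbss: lists of (x, y) positions. Returns a list holding the CBS
    index of each drone (None if every CBS is already full).
    """
    pairs = sorted(
        (math.dist(d, c), m, k)
        for m, d in enumerate(drones)
        for k, c in enumerate(cbss)
    )
    load = [0] * len(cbss)
    assignment = [None] * len(drones)
    for _, m, k in pairs:  # shortest drone-CBS links first
        if assignment[m] is None and load[k] < n_d:
            assignment[m] = k
            load[k] += 1
    return assignment
```

Because the rule looks only at distances, it never revisits an association after the power allocation is computed, which is why such a method is cheap but can group mutually interfering drones in the same cluster.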

D. Performance Comparison With Benchmark RL Empowered Drone-Clustering Schemes
In this sub-section, the superiority of SIREN over two different RL empowered drone-clustering schemes is revealed. In the first benchmark scheme, the drone-clustering sub-problem P1 is solved by leveraging the stochastic learning automata (SLA) framework [40]. In the SLA empowered scheme, the network controller determines the resource allocation for each TS by iteratively carrying out the following three steps: (i) Update the mixed-strategy profiles of the drones by applying [40, eqs. (4), (5)]; (ii) determine a set of drone-clusters using the updated mixed-strategy profiles; (iii) calculate the transmit and jamming power allocations by executing Algorithm 1 for the updated drone-clusters. Fig. 5(a) compares the achievable WC-ASR of SIREN and SLA empowered scheme for various numbers of drones and six CBSs in the network. In the SLA empowered scheme, at each TS, the mixed-strategy profiles of the drones converge to stable PMFs. However, unlike SIREN, SLA does not provide any guarantee on the convergence of the mixed-strategy profiles to the SBR strategies. We emphasize that the SBR strategies obtain a pure-strategy NE solution, and thus near-optimally solve the sub-problem P1. Intuitively, compared to SIREN, the drone-clusters obtained by the SLA empowered method are sub-optimal. Accordingly, SIREN offers notable performance improvements over the SLA empowered method for both small and large numbers of drones. For instance, Fig. 5(a) illustrates that for a network of 25 drones, SIREN obtains 20% higher WC-ASR than the SLA empowered method.
In the second benchmark scheme, the drone-clustering sub-problem is solved by applying multi-armed bandit (MAB) learning with the upper confidence bound (UCB) policy [29]. Particularly, the MAB learning problem has a set of bandits, where each bandit has a set of arms to play at each game round. Here, the goal is to learn suitable strategies for the bandits to play the arms so that the expected reward is maximized. To address the exploration-exploitation dilemma of the bandits, the UCB policy is utilized to select the arms. Such an MAB-UCB method can be utilized to solve the sub-problem P1 by considering the drones as bandits and the CBSs as arms. More specifically, to obtain resource allocations using the MAB-UCB method, the network controller iteratively repeats the following two steps in each TS: (i) select the CBSs for the drones by applying the UCB method [29, eq. (24)] and update the drone-clusters, and (ii) calculate the transmit and jamming power allocations by executing Algorithm 1 for the updated drone-clusters. In Fig. 5(a), we compare the achievable WC-ASR of SIREN and the MAB-UCB method for different numbers of drones. We can readily demonstrate that the MAB-UCB method generates stable drone-clusters. However, such drone-clusters are not necessarily the NE solution of sub-problem P1. Accordingly, the MAB-UCB method exhibits a noticeable performance loss compared to the proposed SIREN algorithm. For instance, Fig. 5(a) illustrates that for a network of 25 drones, SIREN obtains a 34.71% higher WC-ASR than the MAB-UCB method. Fig. 5(b) compares the achievable WC-ASR of the SIREN, SLA, and MAB-UCB methods for three CBSs and different numbers of drones in the network. Due to the lower diversity offered by the A2G links, the achievable WC-ASR of all three schemes in Fig. 5(b) is reduced compared to Fig. 5(a). Nevertheless, SIREN outperforms both the SLA and MAB-UCB methods for a small number of CBSs as well. For instance, Fig. 5(b) shows that SIREN obtains 8.25% and 38.09% higher WC-ASR than the SLA and MAB-UCB methods, respectively, when there are 20 drones in the network. Overall, SIREN is superior to both the SLA and MAB-UCB empowered drone-clustering schemes.
Fig. 6 presents the achievable WC-ASR of SIREN for different values of the SI cancellation coefficient in a network of 20 drones and six CBSs. As the SI cancellation coefficient increases, the residual SI at the CBSs caused by the cooperative jamming increases and the channel capacity of the legitimate links strictly decreases. Hence, as observed from Fig. 6, the achievable WC-ASR of SIREN also decreases with this increase. Recall that SIREN adapts the jamming power of the CBSs while taking the residual SI into account. To show the efficacy of SIREN over the fixed jamming power allocation strategy, the achievable WC-ASR of two SI agnostic schemes is plotted in Fig. 6. In the first SI agnostic scheme, denoted as "Max-JAM," the CBSs always transmit the AN signals using the maximum jamming power. In the second SI agnostic scheme, denoted as "Low-JAM," the CBSs always transmit the AN signals using a small jamming power. The fixed jamming power of the CBSs in the Low-JAM scheme is set to 0.1 W. As the SI cancellation coefficient decreases, the residual SI at the CBSs is reduced and SIREN enables the CBSs to transmit at the maximum jamming power. Fig. 6 thus shows that for sufficiently small values of the coefficient, both SIREN and the Max-JAM scheme achieve almost the same WC-ASR. However, due to the severe residual SI at the CBSs, the achievable WC-ASR of the Max-JAM scheme is significantly reduced in the large-coefficient regime. Fig. 6 depicts that SIREN achieves 94.92 times higher WC-ASR than the Max-JAM scheme when the coefficient equals −50 dBm. Meanwhile, thanks to the small jamming power, the Low-JAM scheme exhibits small residual SI at the CBSs. In consequence, the Low-JAM scheme obtains a higher WC-ASR than the Max-JAM scheme when the SI cancellation coefficient is large.
Conversely, the Max-JAM scheme obtains a higher WC-ASR than the Low-JAM scheme when the SI cancellation coefficient is small. The proposed SIREN, however, outperforms the Low-JAM scheme for both small and large values of the coefficient. For instance, Fig. 6 shows that SIREN achieves 4.48 times higher WC-ASR than the Low-JAM scheme when the coefficient equals −50 dBm. In essence, thanks to the SI awareness, SIREN is more efficient than the fixed jamming power allocation schemes.
Fig. 7 illustrates SIREN's achievable WC-ASR by varying the number of drones and the SIC error coefficients in a network of six CBSs. As expected, for a given number of drones, SIREN achieves the largest WC-ASR for ideal SIC at the CBSs, whereas an increase in the SIC error coefficients reduces the achievable WC-ASR. Fig. 7 also shows that the multi-user diversity gain is reduced for large SIC error coefficients. This observation is intuitive since the resultant co-channel inter-drone interference at the CBS is considerably greater for large drone-clusters and large SIC error coefficients. As a result, increasing the number of drones provides only a small increment in the achievable WC-ASR. Fig. 7 shows that as the number of drones increases from 5 to 25, the achievable multi-user diversity gain of practical SIC is reduced by 20.27%, 64.81%, and 74.06% for SIC error coefficients of $10^{-5}$, $10^{-3}$, and $10^{-2.5}$, respectively, with respect to ideal SIC. We conclude that the capability of performing accurate SIC at the CBSs notably impacts the proposed algorithm's PLS performance.
In Fig. 8(a), at the beginning, the CBS selection probability of the drone in question is $\{\frac{1}{6}, \frac{1}{6}, \frac{1}{6}, \frac{1}{6}, \frac{1}{6}, \frac{1}{6}\}$, i.e., the probability of associating this drone with a certain CBS is the same for all the CBSs in the network. As $L_{\max}$ increases, the network controller's knowledge of the drone's achievable SR with different CBSs improves, and it is then able to select the most suitable CBS for the drone. Consequently, the PMF used to select the CBS for the drone becomes deterministic. Fig. 8(a) shows that the PMF used to select a CBS for the drone in question almost converges to $\{0, 1, 0, 0, 0, 0\}$ for large values of $L_{\max}$, i.e., the drone is associated with the second CBS in the network. Likewise, we can show that the CBS selection PMFs used for the other drones also become deterministic. This implies that the drone-clusters become stable for large values of $L_{\max}$. Moreover, when the drone-clusters become stable, consistent with Proposition 2, the transmit and jamming power allocations also become invariant. This fact leads to the convergence of the overall resource allocation. To confirm this, we plot the network's achievable WC-ASR in Fig. 8(b) for various iterations of SIREN. We observe that the WC-ASR increases with $L_{\max}$. This is because, as $L_{\max}$ increases, the network controller can make more informed decisions about achieving a high network WC-ASR. Fig. 8(b) shows that the WC-ASR becomes almost stable after 35 iterations of the SIREN algorithm for both small and large numbers of drones. We therefore conclude that the convergence of SIREN to a stable resource allocation is guaranteed.

VII. CONCLUSION
We investigated a resource optimization scheme to manage interference and enhance the PLS of multi-drone cellular-connected IoD networks. A joint optimization problem was formulated to maximize the network's WC-ASR while considering clustering of the legitimate drones, NOMA-enabled data transmission, and cooperative jamming from the FD CBSs. A two-level iterative optimization approach was devised to address the computational intractability of the joint optimization problem. A multi-agent RL framework was employed to optimize the drone-clusters, and the transmit power of the legitimate drones and jamming power of the CBSs were optimized by leveraging fractional programming, SCA, and alternating optimization techniques. A centralized and convergent resource allocation algorithm, entitled SIREN, was proposed to solve the joint optimization problem. Simulation results revealed that the proposed SIREN algorithm attains a small performance loss compared to the quasi-exhaustive search empowered method with a significantly reduced computational complexity. Simulation results also confirmed that in dense IoD networks, the proposed SIREN algorithm manages inter-drone interference more efficiently than the existing interference-aware power allocation schemes, and achieves a higher WC-ASR than the benchmark RL empowered drone-clustering schemes.