Reconfigurable Intelligent Surface-Enabled Federated Learning for Power-Constrained Devices

Federated learning (FL) has recently emerged as a novel technique for training shared machine learning models in a distributed fashion while preserving data privacy. However, the application of FL in wireless networks poses a unique challenge to the battery lifetime of mobile users (MUs). In this letter, we apply reconfigurable intelligent surface (RIS)-aided wireless power transfer to facilitate sustainable FL-based wireless networks. Our objective is to minimize the total transmit power of participating MUs by jointly optimizing the transmission time, power control, and the RIS's phase shifts. Numerical results demonstrate that the total transmit power is minimized while satisfying the requirements on both minimum harvested energy and transmission data rate.

handling complicated tasks pertaining to channel estimation, spectrum sensing, resource allocation, etc. Nevertheless, given that the underlying principle of ML is the collection of raw data generated at end-devices and stored at a centralized server for model training purposes, a number of concerning issues have been flagged. First, in cloud-based ML algorithms, users' privacy is compromised due to the exchange of their local datasets, exposing participating mobile users (MUs) to potential security attacks. Second, the long propagation delay in centralized ML algorithms limits their applicability in real-time scenarios. Finally, the centralized ML paradigm suffers from increased network overhead, rendering it unsuitable for power-constrained MUs [2].
Recently, federated learning (FL) has been identified as an efficient decentralized learning mechanism that offers improved data privacy, reduced latency, and lower network overhead [3], [4]. FL is a collaborative learning mechanism, in which the on-board computing and storage capabilities, in addition to the local datasets of participating MUs (clients), are leveraged to perform local model training. The locally trained model updates are then shared with a cloud-based server for aggregation and global model evaluation. These steps are repeated until a desired level of accuracy is attained. Given that less information needs to be shared with the central server, compared to classical cloud-centric ML algorithms, FL features enhanced user privacy and offers better utilization of the network resources. Inspired by its promising potential, extensive research efforts have been initiated to investigate the advantages of the interplay of FL and other enabling technologies, including reconfigurable intelligent surface (RIS), blockchain, aerial/satellite communications, and mobile edge computing [3], [4], [5], [6], [7], [8].
Despite the several advantages of FL, only a few attempts have explored the problem of executing energy-consuming computing and communication tasks at MUs with a limited energy budget. In [9], the problem of joint delay and energy minimization in an IoT network was investigated by using a three-tier offloading scheme. The authors in [10] minimized the overall energy consumption at participating MUs using the non-orthogonal multiple access (NOMA) scheme for the uplink transmission. This was accomplished by optimizing the number of iterations a client is requested to update its local model, under a particular global model accuracy threshold.
On the other hand, the works in [11], [12], [13], and [14] considered the adoption of the wireless power transfer (WPT) concept as a means to supply power-limited MUs and enable them to participate in the training process. In particular, in [11], a hybrid radio frequency (RF)/visible light communication (VLC) scenario was considered, in which the VLC link over the downlink is leveraged for energy harvesting (EH) purposes, while the RF link is used for the transmission of local model updates. From a similar perspective, the authors in [12] applied the time-switching paradigm of WPT to an FL system, in which they studied the trade-off between learning and WPT, and further optimized the MU clock frequency for improved utilization of the harvested energy for model evaluation. Similarly, the authors in [14] minimized the mean squared error (MSE) by optimizing the aggregation beamforming and consumed energy in a vehicular energy-limited network. The authors in [13] leveraged RIS for WPT and developed an improved scheme for the transmission of local model updates. In particular, they formulated an optimization problem to obtain the RIS phase shift (PS) vector that minimizes the MSE.
Unlike the works in [10], [11], [12], and [14], which considered the application of WPT in FL, and motivated by the intertwined benefits of the RIS and FL paradigms, our work leverages RIS for WPT purposes, to enable power-constrained MUs to meet the computing and communication requirements imposed by the FL process. Instead of focusing on minimizing the MSE, as in [13], our proposed framework aims to minimize the total transmit power of the participating MUs, while satisfying particular computing and communication requirements. This is achieved by jointly optimizing the operational parameters of the RIS and the MUs.
Notation: $\mathbf{X}^T$ denotes the transpose of a matrix $\mathbf{X}$, $\mathbb{E}[\cdot]$ represents the expectation operation, $|\cdot|$ represents the absolute value of a complex scalar, and $\Re\{\cdot\}$ returns the real part of an argument.

II. SYSTEM MODEL
We consider an FL model, in which the downlink transmission is utilized to recharge $K$ EH-enabled MUs, while the uplink is dedicated to communicating the FL local model parameters to a base station (BS), denoted BS$_2$, as depicted in Fig. 1. During the downlink transmission, another BS, denoted BS$_1$, communicates with the $K$ MUs through the assistance of an RIS comprising $N$ reflecting elements (REs). This is motivated by the assumption that the direct links between BS$_1$ and the $K$ devices are unavailable. Therefore, the RIS is exploited to extend the signal coverage and to enhance the received signal strength, and thus, the harvested energy [15]. Without loss of generality, we assume that each node in the network is equipped with a single antenna.

A. Wireless Power Transfer (WPT) Model
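In the underlying system model, BS$_1$ sends the RF signal, and the average transmit power at BS$_1$ is represented by $P_{\mathrm{BS}_1}$. The baseband signal received at the $k$-th MU is governed by the cascaded BS$_1$-RIS-MU link, where $\mathbf{g} \in \mathbb{C}^{N \times 1}$ and $\mathbf{h}_k^H \in \mathbb{C}^{1 \times N}$ denote the channels from BS$_1$ to the RIS, and from the RIS to the $k$-th MU, respectively. $\boldsymbol{\Phi} \in \mathbb{C}^{N \times N}$ represents the PS matrix of the RIS, which can be written as $\boldsymbol{\Phi} \triangleq \mathrm{diag}(e^{j\theta_1}, e^{j\theta_2}, \dots, e^{j\theta_N})$, where $\theta_n \in [0, 2\pi)$ denotes the PS of the $n$-th RE on the RIS. Further, $\boldsymbol{\Phi}$ can be rewritten in terms of a PS vector $\boldsymbol{\psi}$. The downlink transmission is corrupted by additive white Gaussian noise (AWGN).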
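For concreteness, one form of the downlink signal model that is consistent with the definitions above can be sketched as follows. The unit-power symbol $s$, the noise term $n_k$, and the element-wise form of the PS vector $\boldsymbol{\psi}$ are assumptions made for illustration rather than expressions taken from the text.

```latex
% Assumed received signal at the k-th MU (s: hypothetical unit-power symbol, n_k: AWGN)
y_k = \sqrt{P_{\mathrm{BS}_1}}\; \mathbf{h}_k^H \boldsymbol{\Phi}\, \mathbf{g}\, s + n_k,
\qquad \mathbb{E}\big[|s|^2\big] = 1.

% Assumed PS vector; the identity holds because \boldsymbol{\Phi} is diagonal
\boldsymbol{\psi} = \big[e^{j\theta_1}, e^{j\theta_2}, \dots, e^{j\theta_N}\big]^T,
\qquad
\mathbf{h}_k^H \boldsymbol{\Phi}\, \mathbf{g}
  = \sum_{n=1}^{N} h_{k,n}^{*}\, e^{j\theta_n}\, g_n
  = \mathbf{h}_k^H \mathrm{diag}(\mathbf{g})\, \boldsymbol{\psi}.
```

Writing the cascaded channel as a linear function of $\boldsymbol{\psi}$ is a common way to expose the unit-modulus structure $|\psi_n| = 1$ of the RIS phase shifts.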
Assuming all $K$ MUs are equipped with EH devices, the harvested power at the $k$-th MU, $P_{\mathrm{EH}}$, is determined by the received RF power and the energy conversion efficiency $\eta$. In this work, we consider that the harvested power is the only source of power at the MUs. In particular, the harvested power is divided into two parts, namely $\mu P_{\mathrm{EH}}$ and $(1-\mu) P_{\mathrm{EH}}$, which are dedicated to local model transmission and computation, respectively, with $\mu$ denoting the power splitting factor.
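The harvesting and power-splitting steps can be illustrated with a minimal NumPy sketch. It assumes a linear EH model, $P_{\mathrm{EH}} = \eta P_{\mathrm{BS}_1} |\mathbf{h}_k^H \boldsymbol{\Phi} \mathbf{g}|^2$, with the noise contribution neglected. The function names, the unit-variance channels without path loss, and the value $\mu = 0.4$ are hypothetical, so the printed magnitudes are not physically meaningful.

```python
import numpy as np

def harvested_power(h_k, g, theta, p_bs, eta):
    """Assumed linear EH model: eta * p_bs * |h_k^H Phi g|^2 (noise neglected)."""
    phi = np.diag(np.exp(1j * theta))   # PS matrix Phi = diag(e^{j*theta_n})
    cascaded = np.vdot(h_k, phi @ g)    # h_k^H Phi g; vdot conjugates its first argument
    return eta * p_bs * np.abs(cascaded) ** 2

def split_power(p_eh, mu):
    """Return (transmission share, computation share) = (mu*P_EH, (1-mu)*P_EH)."""
    return mu * p_eh, (1.0 - mu) * p_eh

rng = np.random.default_rng(0)
N, P_BS1, ETA = 64, 10.0, 0.9           # N, P_BS1 and eta as in the numerical section

# Hypothetical unit-variance Rayleigh channels (no path loss), for illustration only
h_k = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
g = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)

theta_random = rng.uniform(0.0, 2.0 * np.pi, N)
# Single-user co-phasing: cancel the phase of each term conj(h_n) * g_n
theta_aligned = np.mod(np.angle(h_k) - np.angle(g), 2.0 * np.pi)

p_random = harvested_power(h_k, g, theta_random, P_BS1, ETA)
p_aligned = harvested_power(h_k, g, theta_aligned, P_BS1, ETA)
p_tx, p_cp = split_power(p_aligned, mu=0.4)

print(f"random PS: {p_random:.3e}, aligned PS: {p_aligned:.3e}")
print(f"transmission share: {p_tx:.3e}, computation share: {p_cp:.3e}")
```

Co-phasing all $N$ reflected paths is only optimal for a single MU. With $K$ MUs sharing one RIS, the phase shifts must trade off the users' channels, which is why they appear as optimization variables.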

B. Distributed FL Model
Consider that all $K$ MUs are selected by BS$_2$ to perform a particular on-device distributed FL task, which aims at finding the model parameter $z$ that minimizes the local loss function $f_k(z) = \frac{1}{S_k} \sum_{i \in \mathcal{D}_k} f_i(z)$, where $S_k$ denotes the size of the local data set $\mathcal{D}_k$ of device $k$ and $f_i(z)$ represents the loss function associated with the data pair $i$. Without loss of generality, we assume that all local data sets have uniform size, i.e., $S_k = S$. To maintain higher spectral efficiency and reduce the total number of communication rounds between BS$_2$ and the devices, a model averaging scheme [16] is utilized. In particular, BS$_2$ broadcasts the global model update $z$ to the selected $K$ MUs. At each MU, a local update algorithm is executed to generate the local updated model $z_k$, relying on the local data set and the received global model. Then, the selected MUs send $K$ weighted local models, which are aggregated at BS$_2$ to compute the updated global model as $\hat{z} = \xi\left(\sum_{k \in \mathcal{K}} \varphi_k(z_k)\right)$, where $\xi$ denotes the post-processing function at BS$_2$, while $\varphi_k$ represents the pre-processing function at the $k$-th MU.

Since the harvested energy from the RF signal constitutes the only source of energy at the $K$ MUs, we assume that all local model parameters at each MU are transmitted over a single transmission period. Therefore, by setting $x_k = \varphi_k(z_k)$ to represent the transmitted signal at the $k$-th MU, the target function to be estimated at BS$_2$ can be written as $\chi = \sum_{k \in \mathcal{K}} x_k$.

Note that the central processing unit (CPU) energy consumed to process all data at the $k$-th MU over a single local iteration follows the model in [17], in which $\nu_k$ denotes the effective capacitance coefficient of the computing chipset at the $k$-th MU, and $c_k$ represents the number of CPU cycles required to process one data sample at the $k$-th MU. Also, $\omega_k$ accounts for the CPU cycle frequency of the $k$-th MU. The computation time for one local iteration is $T_{cp,k} = c_k S / \omega_k$.
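A minimal sketch of the aggregation and local computation cost is given below. It makes two assumptions that should be checked against [16] and [17]: the pre-processing $\varphi_k$ is taken as the identity and the post-processing $\xi$ as division by $K$ (plain model averaging for $S_k = S$), and the CPU energy per local iteration is taken as $\nu_k c_k S \omega_k^2$, a widely used model that matches the definitions above. The function names are hypothetical.

```python
import numpy as np

def aggregate(local_models):
    """Uniform model averaging: with S_k = S, every MU receives weight 1/K."""
    return np.mean(np.stack(local_models), axis=0)

def cpu_energy(nu, c, s, omega):
    """Assumed CPU energy per local iteration: nu * c * S * omega^2 (joules)."""
    return nu * c * s * omega ** 2

def cpu_time(c, s, omega):
    """Computation time per local iteration, T_cp = c * S / omega (seconds)."""
    return c * s / omega

# Two toy local models, standing in for z_1 and z_2
z_hat = aggregate([np.array([1.0, 2.0]), np.array([3.0, 4.0])])

# Parameter values quoted in the numerical section
NU, C, S = 2e-28, 1e4, 1000
for omega in (0.3e9, 1.5e9):
    print(f"omega = {omega:.1e} Hz: "
          f"E_cp = {cpu_energy(NU, C, S, omega):.2e} J, "
          f"T_cp = {cpu_time(C, S, omega):.2e} s")
```

Under this model, raising the CPU frequency shortens the computation time linearly but increases the energy quadratically, which is the trade-off that makes $\omega_k$ worth optimizing.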
It is worth highlighting that the energy required for model transmission at the $k$-th MU is the product of the transmission power $P_{T,k}$ and the transmission time $T_{cm,k}$ at the $k$-th MU [18]. The uplink channel between the $k$-th MU and BS$_2$ is denoted by $\Upsilon_k$ and follows the Rayleigh distribution. Based on time division multiple access (TDMA), the signal received at BS$_2$ from the $k$-th MU is corrupted by $n_{\mathrm{BS}_2} \sim \mathcal{CN}(0, \sigma^2)$, which represents the AWGN over the uplink transmission. At BS$_2$, the target function is estimated by utilizing aggregation beamforming, $\upsilon$, which in turn determines the achievable rate of the $k$-th MU at BS$_2$.

III. PROBLEM FORMULATION
Let us define $\mathbf{P}_T \triangleq [P_{T,1}, P_{T,2}, \dots, P_{T,K}]^T$, $\mathbf{T}_{cm} \triangleq [T_{cm,1}, T_{cm,2}, \dots, T_{cm,K}]^T$, and $\boldsymbol{\omega} \triangleq [\omega_1, \omega_2, \dots, \omega_K]^T$.
With $\delta = 1 - \mu$, the optimization problem (8) is formulated as $\min_{\mathbf{P}_T, \boldsymbol{\omega}, \delta, \boldsymbol{\psi}, \mathbf{T}_{cm}, \mu, \upsilon} \sum_{k \in \mathcal{K}} P_{T,k}$, subject to constraints (8b)-(8i). In the above problem, (8b) and (8c) imply that the energy consumed by local model computation and communication at the $k$-th MU should not exceed its corresponding harvested energy. Eq. (8d) ensures that the transmission rate of the $k$-th MU is greater than or equal to the required data size $\Theta_k$. Constraint (8e) ensures that each global round is finished within the time frame $\tau$, where $M_k$ denotes the required number of local iterations at the $k$-th MU. The CPU frequency of the $k$-th MU is bounded in (8f), where $\omega_{\min}$ and $\omega_{\max}$ denote the minimum and maximum CPU frequency of the $k$-th MU, respectively. Finally, (8h) and (8i) impose unit power on the aggregation beamforming and a unit-modulus constraint on the RIS PSs, respectively. It is worth noting that problem (8) is non-convex and challenging to solve, due to the non-convex nature of constraints (8c)-(8e), (8h), and (8i) and the coupling of multiple optimization variables.
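The interplay between the rate requirement and the round-time budget can be sketched numerically. The snippet below rests on several assumptions that are not confirmed by the text: a Shannon-type rate $B \log_2(1 + P_{T,k} G_k / \sigma^2)$ with an effective gain $G_k = |\upsilon \Upsilon_k|^2$, a data requirement of the form $T_{cm,k} R_k \geq \Theta_k$, and a per-MU time budget $M_k T_{cp,k} + T_{cm,k} \leq \tau$. The function names and the gain and noise values are hypothetical.

```python
import math

def achievable_rate(p_t, gain, bandwidth, noise_power):
    """Assumed Shannon rate in bit/s: B * log2(1 + p_t * gain / sigma^2)."""
    return bandwidth * math.log2(1.0 + p_t * gain / noise_power)

def min_transmit_power(theta_bits, t_cm, gain, bandwidth, noise_power):
    """Smallest p_t such that t_cm * rate >= theta_bits (inverse of the rate)."""
    return (2.0 ** (theta_bits / (bandwidth * t_cm)) - 1.0) * noise_power / gain

def max_transmission_time(tau, m_iters, c, s, omega):
    """Time left for transmission after m_iters local iterations (assumed budget)."""
    return tau - m_iters * c * s / omega

# Values quoted in the numerical section
B, THETA, TAU, M, C, S = 20e6, 30e3, 8.0, 4, 1e4, 1000
# Hypothetical effective channel gain and noise power
GAIN, NOISE = 1e-9, 1e-12

results = {}
for omega in (0.3e9, 1.5e9):
    t_cm = max_transmission_time(TAU, M, C, S, omega)
    p_t = min_transmit_power(THETA, t_cm, GAIN, B, NOISE)
    results[omega] = (t_cm, p_t)
    print(f"omega = {omega:.1e} Hz: T_cm <= {t_cm:.4f} s, P_T >= {p_t:.3e} W")
```

Because the required power grows exponentially in $\Theta_k / (B\, T_{cm,k})$, any time freed up on the computation side translates into a lower transmit power, which is one reason the variables in problem (8) are coupled.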

V. NUMERICAL RESULTS
We consider a 2-D scenario, in which two BSs and one RIS are located at (0, 20 m), (50 m, 20 m), and (20 m, 0), respectively. The $K = 8$ MUs are randomly distributed within a circular disc centered at (30 m, 10 m) with a radius of 1 m. Without loss of generality, we assume that the system bandwidth $B$ is 20 MHz, the transmit power of BS$_1$ is 10 W, the energy harvesting efficiency $\eta$ is 0.9, and the number of RIS REs $N$ is 64. For local computation, the maximum and minimum CPU frequencies $\omega_{\max}$ and $\omega_{\min}$ are 1.5 GHz and 0.3 GHz, respectively, for all users [11]. The required data size $\Theta_k$ is set to 30 kbits, the effective capacitance coefficient of the computing chipset $\nu_k$ is $2 \times 10^{-28}$, and the local data size $S$ and the number of CPU cycles $c_k$ are assumed to be 1000 bits and $10^4$, respectively. The maximum number of local iterations at the $k$-th MU, $M_k$, is 4, and the time frame $\tau$ is set to 8 s. All simulation results are generated by averaging over 1000 different random channel realizations.

Fig. 2(a) shows the convergence behavior of Algorithm 1. It is clear that Algorithm 1 needs around 5 iterations for the total transmit power of the participating MUs to converge. Besides, increasing $\Theta_k$ requires more total transmit power from the participating MUs.
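The simulation geometry can be reproduced with a short NumPy sketch. It assumes that the two listed BS coordinates correspond to BS$_1$ and BS$_2$ in that order, and that "randomly distributed" means uniform over the disc area. The helper `sample_disc` and the seed are hypothetical.

```python
import numpy as np

def sample_disc(center, radius, k, rng):
    """Draw k points uniformly over a disc (assumed MU placement)."""
    r = radius * np.sqrt(rng.uniform(size=k))   # sqrt gives uniform density over area
    ang = rng.uniform(0.0, 2.0 * np.pi, size=k)
    offsets = np.column_stack((r * np.cos(ang), r * np.sin(ang)))
    return np.asarray(center, dtype=float) + offsets

# Node positions in metres (BS ordering is an assumption)
BS1 = np.array([0.0, 20.0])
BS2 = np.array([50.0, 20.0])
RIS = np.array([20.0, 0.0])

rng = np.random.default_rng(1)
mus = sample_disc(center=(30.0, 10.0), radius=1.0, k=8, rng=rng)

d_bs1_ris = np.linalg.norm(RIS - BS1)           # downlink hop BS1 -> RIS
d_ris_mu = np.linalg.norm(mus - RIS, axis=1)    # downlink hop RIS -> each MU
d_mu_bs2 = np.linalg.norm(mus - BS2, axis=1)    # uplink distance MU -> BS2

print(f"BS1-RIS distance: {d_bs1_ris:.2f} m")
print("RIS-MU distances:", np.round(d_ris_mu, 2))
print("MU-BS2 distances:", np.round(d_mu_bs2, 2))
```

These distances would feed a path-loss model to scale the channels; the text does not specify that model, so it is left out here.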
In Fig. 2(b), the total transmit power is illustrated versus the number of MUs for different phase selection schemes, i.e., optimized, discrete, and random PS. Not surprisingly, the total transmit power of the participating MUs increases with the number of MUs. Additionally, the optimized continuous PS achieves better performance than the discrete and random PS cases.
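The ordering of the three phase selection schemes can be illustrated with a simplified single-user sketch. Here the closed-form co-phasing solution stands in for the optimized PS, which is a swapped-in technique and not the algorithm evaluated in Fig. 2(b). The 2-bit resolution, the unit-variance channels, and the function names are hypothetical.

```python
import numpy as np

def quantize_phase(theta, bits):
    """Map each phase in [0, 2*pi) to the nearest of 2**bits uniform levels."""
    levels = 2 ** bits
    step = 2.0 * np.pi / levels
    return (np.round(theta / step) % levels) * step

def cascaded_gain(h, g, theta):
    """|h^H Phi g|^2 for Phi = diag(e^{j*theta}); vdot conjugates h."""
    return np.abs(np.vdot(h, np.exp(1j * theta) * g)) ** 2

rng = np.random.default_rng(2)
N = 64
# Hypothetical unit-variance Rayleigh channels, for illustration only
h = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
g = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)

theta_opt = np.mod(np.angle(h) - np.angle(g), 2.0 * np.pi)   # continuous co-phasing
theta_disc = quantize_phase(theta_opt, bits=2)               # 2-bit discrete PS
theta_rand = rng.uniform(0.0, 2.0 * np.pi, N)                # random PS

gain_opt = cascaded_gain(h, g, theta_opt)
gain_disc = cascaded_gain(h, g, theta_disc)
gain_rand = cascaded_gain(h, g, theta_rand)

print(f"continuous: {gain_opt:.1f}, 2-bit: {gain_disc:.1f}, random: {gain_rand:.1f}")
```

With $b$-bit quantization the per-element phase error is at most $\pi / 2^b$, so the discrete gain retains at least a fraction $\cos^2(\pi / 2^b)$ of the continuous one in this single-user setting. A higher cascaded gain means more harvested power for the same BS$_1$ transmit power.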
Figs. 3(a) and 3(b) plot the total transmit power versus the number of RIS elements. As expected, increasing $N$ results in a lower total transmit power across all devices. This is because more power can be harvested by all devices, which increases the power available for model computation and transmission. The increase in the harvested power reduces the time required for model computation at all devices, which leaves the devices more time for model transmission. Further, a higher $\Theta_k$ requires more total transmit power from the participating MUs.

VI. CONCLUSION
In this letter, we have considered the total transmit power minimization problem of FL-based wireless networks with the assistance of the RIS. It involves a joint optimization of the transmission time, power control, and the RIS's phase shifts, and is formulated as a non-convex problem. To solve this problem, we have developed an alternating descent algorithm based on the IA framework, which converges at least to a locally optimal solution. Numerical results have verified the quick convergence of the proposed algorithm and the benefit of using the RIS.