Differentially Private Federated Multi-Task Learning Framework for Enhancing Human-to-Virtual Connectivity in Human Digital Twin

Ensuring reliable update and evolution of a virtual twin in human digital twin (HDT) systems depends on the connectivity scheme implemented between such a virtual twin and its physical counterpart. The adopted connectivity scheme must consider HDT-specific requirements, including privacy, security, accuracy and the overall connectivity cost. This paper presents a new secure, privacy-preserving and efficient human-to-virtual twin connectivity scheme for HDT by integrating three key techniques: differential privacy, federated multi-task learning and blockchain. Specifically, we adopt federated multi-task learning, a personalized learning method capable of providing higher accuracy, to capture the impact of heterogeneous environments. Next, we propose a new validation process based on the quality of models trained during the federated multi-task learning process to guarantee accurate and authorized model evolution in the virtual environment. The proposed framework accelerates the learning process without sacrificing accuracy, privacy or communication efficiency, which, we believe, are non-negotiable requirements of HDT networks. Finally, we compare the proposed connectivity scheme with related solutions and show that the proposed scheme can enhance security, privacy and accuracy while reducing the overall connectivity cost.


I. INTRODUCTION
Digital twin (DT) continues to attract wide attention, especially in communications networks [1], healthcare [2], [3], and manufacturing [4], [5], because of its ability to improve existing systems by leveraging newly emerged algorithms, including machine learning, optimization and artificial intelligence, as well as communication technologies such as edge intelligence, security and privacy preservation [6], [7], [8]. When adopted, DT allows a digital representation of real-world equipment, processes, objects or environments by creating corresponding virtual twins (VTs) in the virtual space [9]. Specifically, in human-centric systems such as medical cyber-physical systems, the human digital twin (HDT) facilitates the co-evolution of both humans and VTs [2], and can thus transform current healthcare systems, environmental monitoring systems, and other applications by integrating human behaviour and activities.
Generally, an HDT encompasses a human being, otherwise called a physical twin (PT), located in the physical environment; its digital replica, i.e., its corresponding VT, located in the virtual environment; and a mapping between these two environments through reliable data links [2]. This mapping is expected to ensure continuous and reliable interactions between human-virtual twin pairs considering the dynamic nature of the physical environment. While HDT can significantly improve the quality of services and experiences in the physical environment, the diverse requirements [1], [2], [3], [9] in terms of latency, privacy, security, reliability, data rate, and other user-defined performance metrics make it very complicated and challenging to achieve a reliable mapping or connectivity between the physical and virtual environments [10]. Furthermore, there are currently insufficient means for physical-virtual environment synchronization to establish closed loops, a lack of high-fidelity and quantification models, as well as difficulties in obtaining accurate predictions of complex physical systems [4]. All these make HDT suffer in many aspects, including accuracy, security, privacy, synchronization and connectivity.
To address security and privacy issues, blockchain and federated learning (FL) techniques have been widely adopted in recent DT solutions [3], [11], [12] owing to their ability to support the training of machine learning models in a decentralized manner. However, the adoption of blockchain often relies on high-latency and energy-intensive consensus algorithms [13], [14], which cannot meet the specific requirements of HDT. In addition, although FL continues to receive wide consideration as a solution to privacy concerns, privacy leakage remains a potential issue. For instance, when clients synchronize their learned model parameters with the global server, an attacker may infer some data properties or recover the raw data based on this shared information [15], [16]. Moreover, conventional FL suffers from many other challenges such as statistical heterogeneity, high computation and communication costs, and limited fault tolerance. Thus, an effective solution for HDT integrating blockchain and FL has to be proposed, one that can accelerate the learning process without sacrificing accuracy, privacy or communication efficiency.

0733-8716 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

A. Contributions
Although connectivity problems in HDT [3], DT-empowered 6G networks [11] and DT edge networks [12] have been studied earlier, these solutions may suffer from the high latency that characterizes traditional blockchain-enabled systems, as well as the data leakage and statistical heterogeneity issues that characterize conventional FL schemes. In addition, synchronization accuracy is an important metric in HDT for sufficient performance evaluation. However, such a metric is difficult to obtain and has been neglected in previous works. Moreover, the effects of scheduled offloading rates with queueing constraints for a status update and of the privacy budget on both the synchronization cost and the overall HDT system performance have never been studied.
To address all these issues, in this paper, we propose a new HDT framework, which integrates differential privacy, federated multi-task learning (FML) [15] and a quality-of-training-based validation process (a newly proposed lightweight blockchain-enabled consensus method) in the presence of heterogeneous environments. The proposed framework finds the balance between synchronization accuracy and connectivity cost without compromising security and privacy, which, we believe, are non-negotiable requirements of HDT networks. To the best of our knowledge, such a framework, which captures statistical heterogeneity, synchronization accuracy, synchronization cost and other related performance metrics, has never been presented. The contributions of this paper are thus summarized as follows:
• We propose a secure differentially private federated multi-task learning (DPFML) framework for HDT by integrating DPFML and a computationally efficient blockchain-enabled validation process to provide a secure, privacy-preserving and more accurate human-to-virtual twin connectivity solution.
• We analytically study the connectivity cost of the proposed DPFML-enabled connectivity scheme to investigate the influence of some important system parameters, including the privacy budget, on the synchronization cost as well as the long-term average connectivity cost, overall time cost and energy cost. To capture the validation cost inherent in blockchain, we propose a new consensus mechanism based on the quality of models trained during FL, called proof of model quality (PoMQ).
• Following this, we formulate the connectivity problem in the proposed DPFML-based framework as a Markov decision process (MDP) to minimize the connectivity cost. To solve the MDP, we propose a deep reinforcement learning (DRL) algorithm using the deep deterministic policy gradient (DDPG) approach.
• Finally, we compare the proposed framework with existing frameworks through simulation and demonstrate the ability of the proposed solution to offer improved synchronization accuracy and reduced connectivity cost without compromising security and privacy.

B. Organization
The remainder of this paper is structured as follows. Section II reviews related works, while Section III introduces the details of the proposed system model. In Section IV, we present the analysis of the connectivity cost in terms of time, privacy and energy. The formulated MDP-based optimization problems and DRL-based solutions are presented in Section V. Section VI discusses the simulation results and the performance of the proposed scheme, while Section VII concludes this paper.

II. RELATED WORK
In this section, we discuss some of the related frameworks and solutions presented earlier. For clarity, we categorize these existing works into three groups: human digital twin, FL in DT applications, and differentially private solutions for FL.

A. Human Digital Twin
HDT is an emerging technology that has recently been attracting increasing consideration in many domains, including medicine, sports and manufacturing [17]. It relies on the concept of DT to create a virtual replica of a human, body organs or habits in the virtual environment [2]. Note that DT provides enhanced system performance by combining both system models and analyses with real-time measurements for any individual system. It facilitates model evolution over the lifecycle of any physical system while supporting the derivation of solutions with the ability to aid real-time optimization of such a physical system.
Similar to DT, HDT possesses the potential to revolutionize the practice of human system integration by adopting real-time sensing and feedback to tightly couple measurements of human performance and behaviour, as well as the influence of the environment throughout a product's life cycle, with human modelling to improve system design and performance [17]. Unlike the VTs in conventional DT, however, human VTs often possess distinct underlying variability among each other as well as dependence between humans and products in the physical environment. Since each human VT (referred to as VT henceforth for simplicity) evolves with data from its counterpart PT, located in the physical environment, its design and implementation are known to be very difficult.
A DT solution was presented in [18] for elderly healthcare services, while [19] discussed a deep neural network-based model for capturing bi-directional context relationships when predicting lung cancer. A software-based HDT was similarly presented in [20] for tracking fitness-related measurements describing an athlete's behaviour on consecutive days, while the work in [21] presented a cardio twin architecture for the detection of ischemic heart disease. In [22], a DT ecosystem for health and well-being was presented. It is worth mentioning that, although the majority of [18], [19], [20], [21], and [22] discussed the importance of connectivity in HDT, none of these works delved into investigating and modelling this connectivity scheme. In [3], an edge-assisted connectivity framework for HDT was presented. While the presented framework is interesting, statistical heterogeneity, synchronization accuracy and other related performance issues, such as data leakage, were not considered. This paper addresses these limitations by proposing a connectivity scheme that considers all important HDT-specific requirements, including statistical heterogeneity, synchronization accuracy and data leakage.

B. FL in DT Applications
One of the underlying limitations of DT and HDT applications is privacy. Since every physical object must continuously share its data with its corresponding VT, privacy becomes an important concern. To address this, the work in [3] adopted FL to preserve data privacy in HDT networks. Similarly, FL was adopted in [10] and [11] to learn a behavioural model from user data towards achieving low latency in DT-empowered 6G networks, and in [12] to achieve efficient communication in DT edge networks, since offloading all running data to the VT can consume a large amount of communication resources, cost, and time while leading to privacy issues. In a similar work, the authors in [23] carried out an optimization of FL using the DRL method to construct a DT-empowered industrial IoT model. The work proposed an asynchronous FL scheme capable of addressing the discrete effects caused by heterogeneous industrial IoT devices. A cooperative FL was developed in [24] to facilitate DT construction in resource-limited smart devices, where an iterative double auction-based joint cooperative FL with an update verification scheme was designed.
In [25], an FL-based anomaly detection model was proposed using DT by utilizing edge cloudlets to run anomaly detection models locally, while the work in [26] presented a blockchain-enabled adaptive asynchronous FL paradigm for privacy-preserving and decentralized DT networks. However, these works do not consider possible data leakage in conventional FL algorithms, which is important in HDT. Similarly, connectivity problems were only studied in HDT [3] and DT-enabled wireless networks [10], [11], [12], where the influence of statistical heterogeneity and synchronization accuracy on the connectivity cost was not considered. Sharing gradients, as in federated averaging, can lead to data leakage. As a result, cryptographic approaches have also been explored in some FL-based research [27], although such approaches are computationally inefficient for large-scale machine-learning models. Recently, differential privacy (DP) has been explored in FL to reduce the possibility of information leakage by hiding the contribution of each client during training, thereby ensuring privacy guarantees.

C. DP Solutions for FL
DP solutions are efficient techniques that can provide privacy guarantees in machine learning [28]. It is therefore unsurprising that such approaches have recently been attracting a lot of interest in FL-based research. By adding artificial noise to the learned model parameters or datasets, DP can protect nodes' privacy with limited computation. It, however, tends to reduce the overall accuracy as the privacy protection level increases. Thus, a trade-off exists between accuracy and privacy. In [16], a differentially private FL framework was adopted to prevent privacy leakage during data sharing and to model the contribution, computation, communication, and privacy costs of each participant. Also, since security and privacy concerns in standard FL continue to hinder its wide adoption in urban applications, a differentially private asynchronous FL scheme was proposed in [29] for resource sharing in vehicular networks by integrating DP and FL techniques.
While a differentially private FL framework can ensure privacy guarantees when adopted, the inherent issue of statistical heterogeneity remains the main concern. Hence, any differentially private FL-based framework can suffer from accuracy degradation when used in a network with non-independent and identically distributed (non-iid) data. These issues are especially pronounced in HDT, where data from PTs are arbitrarily heterogeneous with fundamental statistical heterogeneity issues, and where many external conditions, such as the environment and genetic information, can influence the behaviour of each PT. To address them, we propose the DPFML framework following the privacy-aware multi-task learning approach first discussed in [15]. This ensures a federated optimization of heterogeneous tasks while protecting the local model gradient information using DP.
A DPFML-enhanced framework can enable federated optimization of heterogeneous client tasks while protecting the local model gradient information through DP. Such a technique can prevent privacy leakage and ensure accuracy, privacy and reduced communication costs when properly adopted. Since, in HDT networks, many external factors may influence the performance of the entire system, while some unique structures may exist among different people [30], it is desirable to simultaneously learn models for multiple related tasks through an efficient multi-task learning framework. The proposed DPFML-enabled HDT connectivity solution can learn customized context-aware policies from multiple users and environments in a privacy-preserving manner. The definitions of some common notations used throughout this paper are summarized in Table I.

III. SYSTEM MODEL
TABLE I
COMMON NOTATIONS USED

We consider a DPFML-enabled HDT system, where the physical and virtual environments are connected through a blockchain- and FML-enabled connectivity scheme, as shown in Fig. 1. Data are generated by each PT (based on its update scheduling rate), located in the physical environment, to maintain reasonable synchronization with its counterpart VT, located in the virtual environment. The physical environment is equipped with multiple local aggregators (LAs) corresponding to different entities of the physical environment, such as the typical PT, genetic information, environmental factors, etc. Each LA is connected with various sensing devices for data capturing and produces both shared and task-specific parameters from its locally trained model, based on its local data. The shared parameter is then offloaded to the global aggregator (GA) for aggregation following the standard FL during the first phase. At the beginning of the second phase, each LA forwards its locally trained task-specific model to the validators, located in the virtual environment, to evaluate its quality before the VT model updating and evolution in the final phase. The main components of the system model are summarized as follows.
• Environment: The physical environment depicts the real-world space, where every PT interacts with other entities within its environment and maintains a dependent relationship with such related entities. A virtual replica of each PT is maintained in the virtual environment.
• Local Aggregator: Also called a client, each LA represents a distinct entity in the physical environment. Each of these clients consists of several sensing devices that regularly collect data from the physical environment. The data collected by each LA are aggregated, subject to its update scheduling rate $o_i$, and are used to support learning during FML. In the proposed HDT framework, each LA is responsible for updating the corresponding aspect of its associated VT, using the task-specific models, after the training quality requirements have been satisfied.

• Global Aggregator: The GA is a central server that facilitates the aggregation of shared parameters during FML and also provides the training requirement thresholds to validators during validation. At every communication round, the global model (which contains the aggregated shared parameters) is used to locally train each task-specific model at each LA.
• Validators: Validators are essential components of the blockchain system that ensure the reliability of every update from each client before triggering the model evolution process in the virtual environment. Without this, the system cannot guarantee accurate and authorized model evolution of any corresponding VT. The blockchain also keeps records of previous model evolution activities to ensure traceability. Since each LA is responsible for updating its associated VT, it becomes imperative to have an independent validation process.
• Model Evolution: Model evolution is an essential process in HDT. It involves updating any typical VT based on the current state of its counterpart PT and relies on timely, reliable, secure and privacy-preserving PT-VT connectivity. At any time, the VT in the proposed DPFML-enabled HDT system is updated in the virtual environment using the task-specific parameters received from its counterpart physical pair. Since this model evolution process ensures that each VT is a true replica of its paired PT, it is an important part of any HDT framework; its construction is, however, beyond the scope of this paper.
Assume that there are $M$ LAs, each with a learning task $L_i$, $\forall i \in \{1, 2, \ldots, M\}$, as in Fig. 1. Each LA holds a training dataset $\mathcal{D}_i = \cup_{j=1}^{D_i} \{(x_{i,j}, y_{i,j})\}$, generated from different sensing devices as in [3], where $D_i$ is the data size, $x_{i,j}$ is the $j$-th data sample collected by client $i$ and $y_{i,j}$ is the label of $x_{i,j}$. In practice, the general aim of the HDT framework is to maintain the VT of each physical entity (e.g., body organ, habit, eating pattern, etc.) in the virtual environment while capturing the dependence of such a physical entity on other related components observed in the physical environment. As a result, the LAs in the HDT framework are related to each other (e.g., a dependence exists between any PT and its environment and genetic information) such that locally trained models from different LAs share some common underlying representation. To capture this in the system modelling, the hard parameter-sharing technique [31], earlier adopted in neural networks, can be incorporated into standard FL to obtain FML. With this, the shared feature representation can be learned through joint optimization of different tasks via parameter sharing in the proposed DPFML framework.
While model sharing among various LAs can reduce the effect of insufficient data and improve the overall system accuracy, it also comes with a risk of privacy leakage and statistical heterogeneity. The adoption of the DP technique ensures that privacy leakage is prevented through the application of Gaussian noise at each LA. This, however, comes at the expense of accuracy. Thus, we incorporate a double-layer multi-task learning technique [15], where shared parameters are used to improve the training performance at each LA, while task-specific parameters are used to achieve personalization.

A. Federated Multi-Task Learning Model
To properly capture statistical heterogeneity, each LA learns a domain classifier to capture transferable feature representations (i.e., shared parameters) across tasks through the hard parameter-sharing technique. The transferable feature representations are offloaded to the GA every communication round for aggregation. After aggregation, the GA forwards the global model to all related LAs. Each LA uses this global model to improve the training of its task-specific models. The aim is to reinforce each task by taking advantage of the interconnections among related tasks while considering both the inter-task relevance and the inter-task difference. Each LA, through its domain classifier, classifies every feature as either a sharable or a task-specific feature by minimizing the distribution difference [32] between its shared and global parameters. This distribution difference can be obtained following the maximum mean discrepancy [33] as

$$\mathrm{MMD}(i, GA) = \left\| \frac{1}{n_i} \sum_{k=1}^{n_i} \Phi(x_i^k) - \frac{1}{n_{GA}} \sum_{l=1}^{n_{GA}} \Phi(x_{M+1}^l) \right\|_{\mathcal{H}}^2,$$

where $n_i$ and $n_{GA}$ are the numbers of samples drawn from any LA $i$ and the GA, respectively, and $\Phi(\cdot)$ is the nonlinear mapping into the reproducing kernel Hilbert space $\mathcal{H}$. Also, $x_i^k$, $\forall k \in \{1, \ldots, n_i\}$, and $x_{M+1}^l$, $\forall l \in \{1, \ldots, n_{GA}\}$, are the feature vectors of LA $i$ and the GA, respectively, while $\|\cdot\|$ denotes the norm. For each feature $x_i^k$ in $\mathcal{D}_i$, the chance that $x_i^k$ is a sharable feature can be obtained through its instance weight, i.e., the probability that $x_i^k$ belongs to $\mathcal{D}_{shared}$, where $\mathcal{D}_{shared}$ is a vector containing the sharable parameters.
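The distribution-difference test above can be illustrated numerically. The following is a minimal sketch, not the paper's implementation: it estimates the empirical squared MMD between two small feature sets, using an RBF kernel to evaluate the inner products of $\Phi(\cdot)$ in $\mathcal{H}$. The kernel choice, the `gamma` bandwidth and the toy feature vectors are illustrative assumptions.

```python
import math

def rbf(x, y, gamma=1.0):
    # RBF kernel: evaluates the inner product <Phi(x), Phi(y)> in the RKHS H
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd_sq(feats_i, feats_ga, gamma=1.0):
    """Empirical squared MMD between LA i's feature samples and the GA's."""
    n_i, n_ga = len(feats_i), len(feats_ga)
    k_xx = sum(rbf(a, b, gamma) for a in feats_i for b in feats_i) / n_i ** 2
    k_yy = sum(rbf(a, b, gamma) for a in feats_ga for b in feats_ga) / n_ga ** 2
    k_xy = sum(rbf(a, b, gamma) for a in feats_i for b in feats_ga) / (n_i * n_ga)
    return k_xx + k_yy - 2 * k_xy

# Features close to the global distribution get a small MMD (sharable);
# features far from it get a large MMD (task-specific).
same = [[0.0, 1.0], [1.0, 0.0]]
far = [[5.0, 5.0], [6.0, 6.0]]
print(mmd_sq(same, same) < mmd_sq(same, far))  # True
```

In this spirit, a feature whose MMD to the global parameters falls below a threshold would be routed to the shared layers, and otherwise to the task-specific layers.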
If we assume a logistic regression-based domain classifier, then the mapping $\Phi(i, GA)$ maps the global feature vector to the local feature vector of LA $i$, while $\Phi(GA, i)$ maps the local feature vector of LA $i$ to the global feature vector; the classifier is trained by solving a regularized minimization problem, where $\mathrm{diag}(\cdot)$ transforms an input vector into a diagonal matrix, $\|\cdot\|_F$ is the Frobenius norm and $\lambda > 0$ is the regularization parameter. At any round, features with less difference from the global parameter are classified as shared parameters, and features with more difference are classified as task-specific parameters. Let the top layers of the double-layer multi-task learning framework, as presented in Fig. 2, capture the task-specific features $T_f$, while the lower ones capture the shared features $S_f$. Then we can define $T_f = \{\omega_i\}_{i=1}^{M}$ and $S_f = \{\omega_{M+1,i}\}_{i=1}^{M}$, where $\omega_i$ and $\omega_{M+1,i}$ represent the task-specific parameters and the shared parameters of any client $i$, respectively. As shown in Fig. 2, the local optimization is carried out by each client subject to its local objective function $f_i$, given as

$$f_i(\omega_i, \omega_{M+1}) = \frac{1}{D_i} \sum_{j=1}^{D_i} \ell(\omega_i, \omega_{M+1}; x_{i,j}, y_{i,j}),$$

where $\ell$ is the loss function and $\omega_{M+1}$ is the global model. During the optimization process, $\omega_{M+1}$ is shared among all clients, while the GA aggregates the local models at each communication round and distributes the updated global model back to all clients. With this, each task aims to learn a function $f_i$, while the global objective is obtained as

$$\min_{\omega} \sum_{i=1}^{M} \tilde{D}_i\, f_i(\omega_i, \omega_{M+1}),$$

where $\tilde{D}_i = D_i / \sum_{j=1}^{M} D_j$ captures the weight of the local model of each LA $i \in \{1, 2, \ldots, M\}$.
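The GA-side step implied by the global objective above, i.e., weighted averaging of the shared parameters by data size, can be sketched as follows. The flat-list parameter representation and the toy values are assumptions for illustration.

```python
def aggregate_shared(shared_models, data_sizes):
    """GA aggregation: weighted average of shared parameters,
    with weight D_i / sum_j D_j for each LA's model."""
    total = sum(data_sizes)
    dim = len(shared_models[0])
    return [sum((d / total) * m[k] for m, d in zip(shared_models, data_sizes))
            for k in range(dim)]

# Two LAs with data sizes 10 and 30: the second LA's shared
# parameters dominate the aggregated global model.
w = aggregate_shared([[1.0, 2.0], [3.0, 4.0]], [10, 30])
print(w)  # [2.5, 3.5]
```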
Let all participants (i.e., the LAs and the GA) be honest but curious. As a result, any participant may maliciously attempt to infer some vital information when interacting with other participants. To prevent such a potential privacy violation, randomized noise, such as Gaussian noise, is introduced at the gradient level following the DP approach. With that, the gradient contribution of each LA during communication and aggregation can be protected.
The proposed DPFML framework is summarized in Algorithm 1, where each of the $M$ LAs begins the FL process when at least one LA $i \in \{1, 2, \ldots, M\}$ has a status to update following its update scheduling rate $o_i$, subject to the approval of the GA. As soon as the GA approves the commencement of any status update, each related LA receives the global model $\omega_{M+1}^t$ from the GA and updates its locally trained model by computing the gradients for both the shared and task-specific layers. Note that the Gaussian noise $n_{M+1,i} \sim \mathcal{N}(0, \sigma^2 G_S^2)$ is only added to the shared models, since only the lower layers are shared to capture the transferable feature representation. Thus, we reduce the effect of noise on the overall accuracy of the model. The convergence of Algorithm 1 is demonstrated through the simulation presented in Section VI.
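The noisy local update of the shared layers can be sketched as below. This is a hedged illustration, not the authors' code: the shared-layer gradient is clipped to a bound $G_S$ and perturbed with Gaussian noise $n \sim \mathcal{N}(0, \sigma^2 G_S^2)$ before the descent step, while task-specific layers (not shown) would be updated with the raw gradient. The clipping rule, learning rate and toy values are assumptions.

```python
import math
import random

def dp_shared_update(omega_shared, grad_shared, eta=0.1, clip=1.0, sigma=0.5):
    """One DP local step for the shared layers: clip the gradient to norm
    G_S = clip, add Gaussian noise with std sigma * clip, then descend."""
    norm = math.sqrt(sum(g * g for g in grad_shared))
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    noisy = [g * scale + random.gauss(0.0, sigma * clip) for g in grad_shared]
    return [w - eta * g for w, g in zip(omega_shared, noisy)]

random.seed(0)
print(dp_shared_update([0.5, -0.5], [2.0, 0.0]))
```

With `sigma=0` the update reduces to plain clipped gradient descent, which makes the perturbation's role in the privacy/accuracy trade-off easy to see.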
The DPFML framework ensures privacy guarantees without sacrificing much of the trained model accuracy. Such a framework aims to prevent any attacker or eavesdropper [34] from extracting sensitive information during model exchanges among the LAs and the GA. It relies on a common standard method for measuring privacy risk, called $(\epsilon, \delta)$-DP, where $\epsilon > 0$ is the privacy budget and $\delta \in (0, 1)$ is the additive term. Hence, the possibility that $\epsilon$-differential privacy is violated is captured by the probability $\delta$. A lower $\epsilon$ suggests that the clients have a lower risk of privacy leakage. With DPFML, each client adds artificial Gaussian noise during local training at every round such that $(\epsilon, \delta)$-DP of its local datasets is always guaranteed. To ensure accuracy and convergence, the GA determines the privacy budget $\epsilon_{min} \leq \epsilon \leq \epsilon_{max}$. When $\epsilon < \epsilon_{min}$, the added noise is too large and the training cannot converge. Similarly, when $\epsilon > \epsilon_{max}$, the added noise is too small and privacy cannot be protected. The GA may also specify the training data size $N_i$ and the corresponding reward $R_i$ to facilitate fairness during the validation process.
Note that any global model at each round includes the aggregated uploaded noisy local models. At every communication round $t$, each LA updates its shared layers as $\omega_{M+1,i}^{t+1} = \omega_{M+1,i}^{t} - \eta(g_{M+1,i}^{t} + n_{M+1,i})$, computes the gradients for the task-specific layers as $g_i^t = \partial_{\omega_i} f_i(\omega^t)$ and updates the task-specific layers as $\omega_i^{t+1} = \omega_i^t - \eta\, g_i^t$. Each LA then offloads its model weight $\omega_{M+1,i}^{t+1}$ to the GA, which aggregates the received weights as $\omega_{M+1}^{t+1} = \sum_{i=1}^{M} \tilde{D}_i\, \omega_{M+1,i}^{t+1}$. A randomized mechanism $\mathcal{M}$ satisfies $(\epsilon, \delta)$-DP if $\Pr(\mathcal{M}(\omega_{M+1,i}) \in S) \leq e^{\epsilon} \Pr(\mathcal{M}(\omega'_{M+1,i}) \in S) + \delta$ for any pair $\omega_{M+1,i}$ and $\omega'_{M+1,i}$, generally called the neighbouring model parameters [28], such that the sensitivity function can be defined as $\Delta f = \max_{\omega_{M+1,i},\, \omega'_{M+1,i}} \| f(\omega_{M+1,i}) - f(\omega'_{M+1,i}) \|$. This $\Delta f$ depicts the maximum value by which any local model function $f$ changes if noise is added to $\omega_{M+1,i}$, and it captures the similarity between any neighbouring model parameters $\omega_{M+1,i}$ and $\omega'_{M+1,i}$.

B. Blockchain-Enabled Validation Model
HDT relies on accurate modelling of the VT to guarantee performance. However, some LAs may attempt to manipulate the system either by providing an untrained model or by providing misleading data for model updating in the virtual environment. Similarly, a multidimensional information asymmetry [16] may also exist among participants, where selfish participants manipulate their costs to receive more rewards from the system. To ensure that the final model from each LA is accurate and has not been modified through malicious activities,
we propose a PoMQ consensus mechanism, which carries out the validation process in terms of the quality of the model trained during FML rather than by solving computationally inefficient hashing puzzles, as in proof of work. With PoMQ, the validation process can be carried out before model updating and evolution in the virtual environment. The PoMQ protocol comprises $V$ validators located in the virtual environment. Multiple validators are necessary to eliminate the possibility of malicious validation. These validators are responsible for validating the training quality of each model. After validation, each validator broadcasts its validation decision to the other validators to reach a consensus. A virtual model is updated using any learned model only if the majority of the $V$ validators consent.
The proposed PoMQ evaluates each learned model based on computation cost, communication cost and privacy cost.With this, validators can investigate whether the expenditure by each LA corresponds to the expected costs.The VT is, therefore, only updated when the consensus decision signifies conformity to the requirements.More details on the analysis of the validation process are provided in the next section.
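The PoMQ decision logic described above can be sketched as follows, under the assumption that each validator simply checks the reported offloading, computation and privacy costs against the GA-supplied thresholds and that the VT is updated on a strict majority of consenting validators; the dictionary keys and numeric values are illustrative.

```python
def rate_model(costs, thresholds):
    """A validator consents (1) only if every reported cost - offloading,
    computation and privacy - is within its threshold; otherwise 0."""
    return int(all(costs[k] <= thresholds[k] for k in ("off", "cmp", "pvy")))

def pomq_consensus(votes):
    """The VT is updated only if a strict majority of the V validators consent."""
    return sum(votes) > len(votes) / 2

# A model whose reported expenditure matches the expected costs passes
# validation at all V = 5 validators, so the consensus approves the update.
costs = {"off": 3.0, "cmp": 8.0, "pvy": 0.5}
thresholds = {"off": 5.0, "cmp": 10.0, "pvy": 1.0}
votes = [rate_model(costs, thresholds) for _ in range(5)]
print(pomq_consensus(votes))  # True
```

Because the check is a threshold comparison rather than a hashing puzzle, each validation costs only a constant number of comparisons per validator, which is the computational advantage PoMQ claims over proof of work.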

IV. PERFORMANCE ANALYSIS
In this section, we first analyze the proposed framework by investigating the physical-virtual environment connectivity cost from the time, privacy and energy perspectives. We then analyze the synchronization accuracy before presenting the resulting optimization problem in Section V.

A. DPFML Model
To ensure accurate VT model updating and evolution, data captured from the PT's environment and from other individuals with similar behaviour are also used, through multi-task learning, to improve performance while boosting the effective sample size of each LA. At every round, each LA carries out local training following its local objective function $f_i$ and subsequently performs shared and task-specific feature classification. The shared model is then offloaded to the GA for aggregation. It is worth noting that the cost of achieving a secure and privacy-preserving connectivity scheme may undermine its benefits. Thus, we aim to minimize the synchronization cost while ensuring accuracy and reliability. Compared to the local training time, the feature extraction time at each LA is negligible. Thus, we focus on the local training time to estimate the time, privacy and energy costs required when updating any typical VT.
Let $c_r$ and $c_i$ represent the number of CPU cycles required to train one sample of training data and the CPU cycle frequency of any LA $i$, respectively. The time cost for local model training over $N_R$ rounds can be derived as

$$T_i^{cmp} = \frac{N_R\, c_r\, D_i}{c_i},$$

and the corresponding energy cost can be derived as

$$E_i^{cmp} = N_R\, \kappa_i\, c_0\, D_i\, c_i^2,$$

where $\kappa_i$ is the capacitance coefficient depending on the chip architecture and $c_0$ captures the number of floating-point operations required to train or compute each sample over $N_E$ local epochs. Similarly, for global aggregation, the total time and energy costs can be respectively approximated as

$$T^{agg} = \frac{c_{agg} \sum_{i=1}^{M} |\omega_{M+1,i}|}{c_{GA}}, \qquad E^{agg} = \kappa_{GA}\, c_{agg}\, c_{GA}^2 \sum_{i=1}^{M} |\omega_{M+1,i}|,$$

where $c_{agg}$ is the number of CPU cycles required to aggregate one unit of data, $c_{GA}$ is the CPU cycle frequency of the GA and $\kappa_{GA}$ is the capacitance coefficient of the GA.
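The local-training cost model can be sketched as below, assuming the usual cycles-over-frequency time model and the $\kappa \cdot \text{cycles} \cdot f^2$ dynamic-power energy model; the numeric parameter values are illustrative, not taken from the paper.

```python
def local_training_cost(n_rounds, d_i, c_r, c_i, kappa_i, c_0):
    """Per-LA cost of N_R rounds of local training:
    time   = N_R * c_r * D_i / c_i        (cycles needed / cycle frequency)
    energy = N_R * kappa_i * c_0 * D_i * c_i^2  (kappa * cycles * f^2)."""
    t = n_rounds * c_r * d_i / c_i
    e = n_rounds * kappa_i * c_0 * d_i * c_i ** 2
    return t, e

# Illustrative values: 10 rounds, 1000 samples, 20 cycles/sample,
# a 1 GHz LA CPU, kappa = 1e-28, c_0 = 20 FLOPs/sample.
t, e = local_training_cost(n_rounds=10, d_i=1000, c_r=20, c_i=1e9,
                           kappa_i=1e-28, c_0=20)
print(t, e)
```

The model makes the frequency trade-off explicit: raising $c_i$ shortens the training time linearly but raises the energy cost quadratically, which is why both terms appear in the connectivity cost being minimized.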
During each round, each LA perturbs its shared model parameters through the DP technique. If we assume Renyi DP [16], then the injected noise $\sigma_i$ required to achieve the $(\epsilon, \delta)$-DP guarantee for each LA after the $N_R$-round training can be computed as

$$\sigma_i = \frac{2}{|L_B|} \sqrt{\frac{\alpha\, N_R}{\epsilon}},$$

where $|L_B|$ is the size of the local mini-batch and $\alpha = \frac{2 \log(1/\delta)}{\epsilon} + 1$. From this expression, it is clear that $\sigma_i$ depends on $\epsilon$, while $\epsilon$ is inversely proportional to the privacy protection level. The strictest privacy is achieved when $\epsilon = 0$; under this, it is impossible to differentiate any two locally trained models. Let the privacy cost be defined as the economical loss due to the potential privacy leakage, which can be formulated as

$$C_i^{pvy} = v_i\, \epsilon\, N_R,$$

where $v_i$ represents the economical loss per unit shared model from privacy leakage.
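The noise calibration can be illustrated as follows. Caution: this is a plausible Gaussian-mechanism calibration under Renyi-DP composition (sensitivity scaling as $2G_S/|L_B|$, $\sigma$ growing with $\sqrt{N_R}$ and shrinking as $\epsilon$ grows), using the $\alpha$ given above; the exact constants in [16] may differ, so treat the formula as an assumption rather than the paper's expression.

```python
import math

def rdp_sigma(eps, delta, n_rounds, batch=32):
    """Hedged sketch of Gaussian noise calibration under Renyi DP:
    alpha = 2*log(1/delta)/eps + 1, and the multiplier sigma grows with
    sqrt(N_R) (composition over rounds), shrinks as eps grows (weaker
    privacy), and shrinks as the mini-batch |L_B| grows (lower sensitivity)."""
    alpha = 2 * math.log(1 / delta) / eps + 1
    return (2 / batch) * math.sqrt(alpha * n_rounds / eps)

print(rdp_sigma(eps=1.0, delta=1e-5, n_rounds=10))
```

Whatever the exact constant, the monotone behaviour is the point the section relies on: a tighter budget (smaller $\epsilon$) or more rounds forces more noise, which is precisely the accuracy/privacy trade-off the GA's bounds $\epsilon_{min} \leq \epsilon \leq \epsilon_{max}$ manage.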

B. Communication and Validation Model
During FML, each LA offloads its shared model to the GA at the end of every round for aggregation. The total offloading time cost can then be calculated from r_i, the data rate between any LA i and the GA during round t, where N is the thermal noise power, B_0 is the bandwidth, P_i denotes the offloading power of any LA i and h_i,GA is the channel gain between LA i and the GA.
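From the variables listed, the rate is the Shannon capacity of the LA-GA link; the offloading-time expression is a plausible reconstruction in which |ω_{s,i}|, the shared-model size, is an assumed symbol:

```latex
T^{\mathrm{off}}_{\mathrm{tot}} = \sum_{i=1}^{M} N_R \frac{|\omega_{s,i}|}{r_i},
\qquad
r_i = B_0 \log_2\!\left( 1 + \frac{P_i\, h_{i,GA}}{N} \right)
```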
From the GA to all LAs, the transmission time is assumed to be negligible owing to the ample resources available at the GA [12]; hence, the focus is only on offloading. Similarly, the total energy cost incurred during the offloading of the shared model parameters follows, where t_i is the transmission time allocated to each LA.
After N_R training rounds, any learned model f_i(ω_i) is further validated by the group of validators following the PoMQ consensus protocol to determine whether the received model satisfies the pre-defined requirements. Let the total communication, training and privacy costs incurred during the training of such a model, as received from the GA, be denoted C^off_tot, C^cmp_tot and C^pvy_tot, respectively. Each validator m_j ∈ [V] then rates the learned model f_i(ω_i) against the offloading, computation and privacy cost thresholds θ^off, θ^cmp and θ^pvy, and the model is ultimately applied to the corresponding VT if the validators' decisions satisfy the acceptance condition. To estimate the validation cost, we consider the transmission of the final learned model to the virtual environment, the computation performed at each validator and the decision exchange among validators after validation. The total validation time cost of any model f_i(ω_i) depends on |ω_f,i|, the size of the final model after N_R communication rounds; c_v, the number of CPU cycles required to validate one sample of the final learned model; c_mj, the validation capacity; |R_mj(i)|, the size of the decision message; and r_v, the data rate among validators, assumed constant owing to the pre-defined communication subchannels among them. The energy cost can be obtained similarly.
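The PoMQ rating and acceptance logic above can be sketched as follows; the binary per-validator rating and the majority quorum are assumptions, since the exact rating rule and acceptance condition are given by equations not reproduced in this excerpt:

```python
def rate_model(c_off, c_cmp, c_pvy, th_off, th_cmp, th_pvy):
    """One validator's rating of a learned model: 1 if every cost meets
    its threshold, else 0 (assumed binary rating)."""
    return int(c_off <= th_off and c_cmp <= th_cmp and c_pvy <= th_pvy)

def accept(ratings, quorum=0.5):
    """Apply the model to the VT if more than a quorum of validators
    approve (assumed majority rule)."""
    return sum(ratings) / len(ratings) > quorum
```

For example, a model whose offloading cost exceeds θ^off is rated 0 by that validator, and a model approved by two of three validators is accepted under the assumed majority quorum.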

C. Connectivity Cost
The connectivity cost captures the cost of updating the VT model every time an update is scheduled, i.e., the cost of maintaining secure and reliable synchronization between any PT and its counterpart VT. Since M LAs participate in each model update, the connectivity cost comprises the cost of FML at each participating node, the validation cost, the privacy cost and the communication cost. For any typical physical-virtual twin pair, the overall time cost to complete a single status update can be obtained, and likewise the overall energy cost of synchronizing any single model update. In general, the connectivity cost depends on the scheduling rate o_i. Given that U is the number of status updates scheduled over a known time interval, its probability mass function can be expressed, and the long-term average connectivity cost per arbitrary VT model update follows. Note that updating any arbitrary VT involves the privacy, time and energy costs; hence, (26) is obtained by averaging (15), (23) and (24) over U.
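Since the scheduling is rate-o_i Poisson, the PMF of U and the long-term average take a standard form; this is a hedged reconstruction in which the interval length T and the per-update cost notation are assumptions:

```latex
% Number of updates scheduled over an interval of length T (assumed symbol)
\Pr\{U = u\} = \frac{(o_i T)^u\, e^{-o_i T}}{u!}, \qquad u = 0, 1, 2, \ldots

% Long-term average connectivity cost: (15), (23) and (24) averaged over U
\bar{C} = \frac{1}{U} \sum_{u=1}^{U}
  \Big( C^{(u)}_{\mathrm{pvy}} + C^{(u)}_{\mathrm{time}} + C^{(u)}_{\mathrm{energy}} \Big)
```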

D. Synchronization Accuracy
It is essential to investigate the synchronization accuracy. Unfortunately, such a metric is difficult to define, since it captures the degree of similarity between any PT and its counterpart VT at any given time. If we assume that any final model surviving the validation process captures the true corresponding update in the physical environment, then we may evaluate synchronization accuracy as a function of the synchronization time [35] and the FML loss; that is, a lower synchronization time and a lower FML loss indicate higher synchronization accuracy. For this purpose, we define a new term called the synchronization gap, which is the time elapsed since the last status update was generated in the physical environment. At any given FML loss, the synchronization gap is thus inversely proportional to accuracy.
To obtain the synchronization gap, we first let each generated status update u pass through the FML learning and validation process in a first-come-first-serve (FCFS) manner. We further assume that status update generation coincides with status update arrival, while the arrival and the service time (i.e., the time from arrival until model evolution or updating) follow random processes. The service time of any status update is the time at which the VT is successfully updated using that status. Let the inter-arrival time X^(u) of status updates from any tagged LA be an independent and identically distributed (i.i.d.) exponential random variable with E[X] = 1/o_i [36], [37]. Let ϖ_k represent the times at which status updates are received at the tagged VT; then, at any time t, the index and the timestamp of the most recently received status follow directly. The synchronization gap at time t can then be expressed accordingly. In the absence of newly received model updates, the synchronization gap grows linearly with time and drops to a smaller value when a new model update is received. For any update u, the processing time of a generated status update is the sum of W^(u)_time, the waiting time of status update u, and C^(u)_time, the service time of u following (23). Clearly, W^(u)_time = 0 if status update u is generated when the VT has already been updated with the previously generated status u − 1. However, if u is generated while u − 1 is still in the system (i.e., u − 1 has not yet triggered an update of the VT), then W^(u)_time = (P^(u−1)_time − X^(u))^+ captures the waiting time. For explanation purposes, we assume that o_i follows a Poisson point process, while the service times of rate ϱ_i are i.i.d. exponentials with mean C^(u)_time; the average synchronization gap is then given in Proposition 1.
Proposition 1: The average synchronization gap when the status arrival process is Poisson and the service times are i.i.d. exponentials can be calculated as stated. Proof: The proof follows from the probability density function of a system with Poisson arrivals and exponential service times. Note that (30) corresponds to a single-server FCFS queue with infinite buffer size. Such a scheme, though simple, may not suit the HDT system, in which any VT is expected to reflect the latest status of its counterpart PT. Instead of processing an old status, we can discard it and simply process the latest one. Accordingly, to obtain a low synchronization gap, we introduce a non-preemptive single-server last-come-first-serve (LCFS) queue with a buffer of size 2 and a queue displacement policy, where the system at any time holds at most two status updates: one currently under processing and the other in the queue. On the arrival of another status update, the newly arrived status displaces the one waiting in the queue, as shown in Fig. 3.
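Proposition 1's expression plausibly matches the classical average age-of-information of an M/M/1 FCFS queue; this is a hedged reconstruction from the stated arrival rate o_i and service rate ϱ_i, not the paper's verified formula:

```latex
\bar{S}_{\mathrm{gap}}
  = \frac{1}{\varrho_i} \left( 1 + \frac{1}{\rho} + \frac{\rho^2}{1 - \rho} \right),
\qquad \rho = \frac{o_i}{\varrho_i} < 1
```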
Let the arrivals and service follow Poisson and general distributions, respectively. We apply the classical embedding technique under the assumption that the queue system is stationary and is sampled at epochs at which service completions (i.e., successful updates of a VT) have the Markov property. The synchronization gap then has the same distribution for all t, and the distribution of any C^(u)_time of rate ϱ_i is given as g. Proposition 2: If C_time is i.i.d. exponential in steady state, the density of S_gap at any time t under a non-preemptive single-server LCFS queue with buffer size 2 and queue displacement can be obtained as stated, where ρ = o_i/ϱ_i denotes the traffic intensity. Proof: The proof follows by inverting the Laplace transform of S_gap(t) at t = 0, given in [38] and [39] as E[e^{−s S_gap(0)}], with lim_{o_i→∞} D^lcfs_{S_gap}(t) = t exp(−t). The synchronization gap in this case can be expressed accordingly.
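The queueing discipline above is easy to check by Monte Carlo; the following sketch simulates the non-preemptive single-server LCFS queue with one waiting slot and displacement, and reports the time-average synchronization gap (the initial status at time 0 is an assumption):

```python
import random

def avg_sync_gap(lam, mu, horizon=20000.0, seed=0):
    """Time-average synchronization gap under a non-preemptive single-server
    LCFS queue with buffer size 2 and queue displacement (Monte Carlo sketch)."""
    rng = random.Random(seed)
    t, last_t = 0.0, 0.0
    next_arr = rng.expovariate(lam)
    in_service = None       # (generation time, completion time) of update in service
    waiting = None          # generation time of the single queued update
    last_update_gen = 0.0   # generation time of the status the VT currently reflects
    area = 0.0              # integral of the gap over time
    while t < horizon:
        next_dep = in_service[1] if in_service else float("inf")
        t = min(next_arr, next_dep, horizon)
        # gap grows linearly between events: integrate a trapezoid
        dt = t - last_t
        area += (last_t - last_update_gen) * dt + 0.5 * dt * dt
        last_t = t
        if t >= horizon:
            break
        if next_dep <= next_arr:       # service completion: VT is updated
            last_update_gen = in_service[0]
            in_service = None
            if waiting is not None:    # promote the queued update (non-preemptive)
                in_service = (waiting, t + rng.expovariate(mu))
                waiting = None
        else:                          # new status arrival
            if in_service is None:
                in_service = (t, t + rng.expovariate(mu))
            else:
                waiting = t            # displacement: overwrite any queued update
            next_arr = t + rng.expovariate(lam)
    return area / horizon
```

Raising the scheduling rate o_i (here `lam`) shrinks the average gap toward the service-limited regime, consistent with the limit stated in the proof.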
Fig. 3. A non-preemptive single-server LCFS queue with buffer size 2 and a queue displacement policy.

V. PROBLEM FORMULATION AND OPTIMIZATION
Aside from the synchronization gap, it is also important to reduce the loss incurred during FML. This is expected to improve the synchronization accuracy at any time by simultaneously minimizing the two functions. The parameters ω_1 and ω_2 are weight factors that effectively combine the synchronization time and the FML loss. The constraint in (37a) bounds the values of o_i, while (37b) ensures that the two weights sum to 1.
In addition, (37c) and (37d) ensure that the numbers of participating LAs and validators do not exceed M and V, respectively. Simultaneously minimizing the synchronization gap and the loss in (37) is not straightforward, since the synchronization gap depends not only on the overall time cost but also on the FML itself. We can thus leverage the overall time cost to further reduce the synchronization gap in (36). From (37), we know that increasing the number of rounds N_R can improve the training accuracy at the expense of connectivity cost, while increasing the privacy protection level (i.e., lowering ϵ) improves privacy but decreases accuracy. Moreover, by increasing the computation overhead (through added noise), we can obtain higher privacy at a lower connectivity cost. A trade-off therefore exists between accuracy and connectivity cost, between accuracy and privacy, and between connectivity cost and privacy. In this section, we attempt to minimize the connectivity cost of the proposed DPFML-enabled HDT framework without compromising accuracy or privacy.

A. Problem Formulation
We aim to balance the synchronization accuracy, the privacy cost and the connectivity cost. The objective function is formulated with Θ_1 (0 < Θ_1 < 1) as the weight factor combining the two objective functions and Θ_2 as the mapping factor that places the two objective functions on the same scale. Note that to maximize the synchronization accuracy, we simply minimize the FML loss in (38) and take (36) as the baseline synchronization gap, which can be further reduced through the time cost C_time; that is, given (36), the gap can be further reduced by minimizing C_time. The optimization problem is thus obtained, where c^min_i and c^max_i represent the minimum and maximum computation capacities of each LA, respectively, while c^min_mj and c^max_mj are the minimum and maximum validation capacities of each validator, respectively. Constraint (39d) keeps the privacy budget within the acceptable range, (39e) keeps the CPU frequency of any LA i within the acceptable range, and (39f) does the same for the validation capacity of any validator m_j. Problem (39) is clearly a nonconvex optimization problem, and its solution is thus difficult to obtain in closed form. In what follows, we transform the original problem into an MDP and solve it using a DRL algorithm.
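A plausible compact form of objective (38) and the constraints of (39), reconstructed from the surrounding description; the exact set of decision variables is an assumption:

```latex
\min_{o_i,\, c_i,\, c_{m_j},\, \epsilon,\, N_R,\, M,\, V}\;
  \Theta_1\, f(\omega_{M+1}) + (1 - \Theta_1)\, \Theta_2\, \bar{C}
\quad \text{s.t.}\quad
  \epsilon_{\min} \le \epsilon \le \epsilon_{\max},\;
  c_i^{\min} \le c_i \le c_i^{\max},\;
  c_{m_j}^{\min} \le c_{m_j} \le c_{m_j}^{\max}
```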

B. MDP Problem and Solution
We define the tuple (S^(t), A^(t), R^(t)), where S^(t), A^(t) and R^(t) are the state space, action space and reward, respectively, at each round t. In the proposed framework, the agent is the typical LA that aims to update its counterpart VT by collaborating with related LAs during FML. The state space comprises the achievable data rate r, the computation capacity c, the validation capacity c_m, the learned parameter ω(t) and the global loss value function f(ω_{M+1}). The action space comprises the scheduling rate o, the number of validators V, the number of LAs M, the privacy budget ϵ and the number of rounds N_R, while the reward function is obtained from (38) if all constraints in (39) are satisfied and is zero otherwise.
Given that γ ∈ [0, 1] is the discount factor, each agent aims to maximize the cumulative reward
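The cumulative discounted reward the agent maximizes can be sketched as follows; the constraint-gated reward mirrors the rule above (the negation of the minimization objective is an assumption about sign convention):

```python
def reward(objective_value, constraints_ok):
    """Per-round reward: negated objective (38) when all constraints in (39)
    hold, zero otherwise (assumed sign convention for a minimization problem)."""
    return -objective_value if constraints_ok else 0.0

def discounted_return(rewards, gamma):
    """Cumulative reward sum_t gamma^t * r_t, computed backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, three unit rewards discounted at γ = 0.5 yield 1 + 0.5 + 0.25 = 1.75.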

C. DRL Solution Using DDPG
To solve the MDP problem, we adopt the DDPG algorithm [40], owing to its improved performance over other algorithms in continuous action spaces. While Q-learning algorithms and their variants suit low-dimensional, discrete state and action spaces, most of them do not easily converge to optimal behaviour and are susceptible to the curse of dimensionality when many decision variables are involved [41]. The deep Q-network algorithm can handle high-dimensional continuous state spaces, albeit only with a discrete action space. The DDPG algorithm instead relies on deep neural networks to create the two approximation functions of the actor-critic architecture: the actor network, described by a policy function µ(S|θ^µ) with parameters θ^µ, and the critic network, described by an action-value function O(S, A|θ^O) with parameters θ^O.
Define the Bellman equation, from which the loss of O(S, A|θ^O) can be obtained, with µ′ representing the target actor network. To update the policy function µ(S|θ^µ), the chain rule is applied to the Bellman equation from the start distribution J [40] with respect to the actor parameters. To minimize the loss, the gradient ∇_{θ^O}L is estimated using the algorithmic differentiation technique [42], such that the parameters of the action-value function O(S, A|θ^O) are updated via gradient descent with η^critic_rate, the learning rate of the critic network. The algorithmic differentiation technique is also adopted to obtain the actor gradients, where N_batch is the mini-batch size selected by the agent during learning and η^actor_rate is the learning rate of the actor network. With these, the target critic and actor networks are obtained via soft updates with rate τ_rate ≪ 1. This work uses multiple DDPG agents to simulate the proposed framework; its general structure is shown in Fig. 4, where D1, D2, A1, S1, S2 and C1 denote the hidden layers of the framework. The details of the simulation and results are provided in Section VI.
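Two of the ingredients above, the Bellman target used to train the critic and the soft (Polyak) update of the target networks, can be sketched independently of any particular network library; the dictionary-of-arrays parameter representation is an illustrative assumption:

```python
import numpy as np

def td_target(reward, next_q, gamma):
    """Bellman target y = r + gamma * O'(s', mu'(s')) for the critic loss."""
    return reward + gamma * next_q

def soft_update(target_params, online_params, tau):
    """Polyak averaging theta' <- tau * theta + (1 - tau) * theta',
    applied with tau_rate << 1 to both target networks."""
    return {k: tau * online_params[k] + (1.0 - tau) * target_params[k]
            for k in target_params}
```

With τ = 0.1, a target weight of 0 tracking an online weight of 1 moves to 0.1 after one update, illustrating why the target networks change slowly and stabilize training.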

VI. NUMERICAL RESULTS
We first implemented the proposed DPFML framework using TensorFlow and the LEAF library [43], an open-source library providing a modular benchmarking framework for federated settings, with applications including federated learning, multi-task learning, meta-learning and on-device learning. Since HDT data are expected to be non-iid, we used the CelebA dataset available in the LEAF library. To minimize the required connectivity cost, we incorporated the DDPG algorithm into the DPFML framework. We compared the proposed framework with three baselines: conventional federated averaging (FedAvg), FedAvg with DP (DPFedAvg) and DPFML with a standard validation method (vDPFML). The vDPFML is simply an implementation of the proposed framework with a standard blockchain consensus algorithm in place of the proposed PoMQ consensus mechanism.
We carried out several experiments and simulations to demonstrate the performance of the proposed framework. The simulations used a computer with 10 CPU cores (Intel(R) Core(TM) i9-10900X at 3.70 GHz). To complete any arbitrary VT model update, the local training, local parameter synchronization and communication rounds run over the regions [0, N_E], [1, M] and [1, N_R], respectively; the complexity of Algorithm 1 is therefore approximately O(N_E M N_R). Unless otherwise stated, the simulation parameters are those presented in Table II, selected based on similar works [3], [11], [12]. The sizes of the hidden layers in Fig. 4 were set as follows: D1 = 128, D2 = 32, S1 = 64, S2 = 32, A1 = 32 and C1 = 16.
Fig. 5 demonstrates that the proposed DPFML scheme reaches convergence faster than the other schemes. The highest loss is observed in the DPFedAvg scheme owing to the incorporation of the privacy budget (as in the proposed DPFML and vDPFML schemes) through the addition of Gaussian noise. The DPFML scheme achieves the lowest loss as N_R increases, which confirms its suitability for efficient operation in the presence of non-iid data. Although FedAvg has been shown to perform well on iid datasets, data in HDT systems are expected to be non-iid; hence, such a framework may not be suitable.
A similar result is observed in Fig. 6, where the loss is plotted against ϵ. At lower ϵ, the loss is higher since a large amount of noise is added. As ϵ increases, the loss decreases, although it remains constant for DPFedAvg once ϵ exceeds 35. The standard FedAvg remains fixed since privacy is not considered; its privacy budget was therefore set to the maximum, i.e., ϵ = ϵ_max. The DPFML scheme shows improved performance compared with the other schemes, ensuring better performance without compromising privacy.
Next, we investigate the performance of the DPFML framework using accuracy as the metric. While the accuracy of FedAvg increases with N_R, as shown in Fig. 7, DPFedAvg continues to produce an accuracy close to zero even as N_R increases. Conversely, DPFML achieves the best accuracy within the first 100 rounds and remains almost constant afterwards, confirming its ability to reach convergence within a limited number of communication rounds. Likewise, the proposed DPFML scheme outperforms FedAvg as ϵ increases, as depicted in Fig. 8. Although the standard FedAvg produces better accuracy when ϵ is close to zero, this is unsurprising since that approach provides no privacy. With DPFedAvg, the accuracy is very low, further confirming that FedAvg is unsuitable for HDT frameworks, where privacy is an important constraint that such an approach underestimates.
To compare performance in terms of the average connectivity cost, we investigate the long-term average connectivity cost (U = 100) as a function of c_i. As shown in Fig. 9, the long-term average connectivity cost decreases as c_i increases in all cases, since an improved computation capacity significantly reduces latency and thereby the time cost. The DPFML scheme nevertheless requires the lowest cost, while FedAvg and DPFedAvg require more to ensure timely synchronization of any PT-VT pair. A similar result is obtained when the long-term average connectivity cost is examined as c_mj increases in Fig. 10, confirming that the standard blockchain validation process may be unsuitable in HDT systems because it requires more time and energy to validate every transaction. Fig. 11 further confirms that the connectivity cost indeed increases with ϵ. Interestingly, the cost is highest when no or little privacy constraint is imposed, reflecting the level of potential threats.
We note that the time cost of DPFedAvg is slightly higher than that of conventional FedAvg, since the introduced noise means that more communication rounds are required to reach convergence than in the standard version. As shown in Fig. 12, the time cost increases with V, since more cost is incurred to reach consensus as V grows. With a minimal time cost for DPFML, the synchronization gap between any PT-VT pair is further reduced. Interestingly, the energy cost in Fig. 13 increases as c_i increases: the agents have already learnt the optimal parameters ensuring reduced cost, so an increase in capacity only raises the energy cost without significantly improving the overall performance.

VII. CONCLUSION
HDT is a new technology that can transform many aspects of our current environment. Realizing any HDT system requires reliable connectivity between each PT-VT pair to ensure timely synchronization between them. In this paper, we investigated the connectivity problem in the HDT framework and proposed the DPFML technique to achieve connectivity between any PT-VT pair. This is necessary because connectivity costs must be reduced to ensure timely synchronization without compromising privacy. To further reduce cost, we proposed a new consensus protocol, called PoMQ, and formulated the connectivity problem as an MDP to allow optimization via the DDPG algorithm. We compared the proposed solution with existing ones and conclude that the proposed scheme suits HDT systems, where datasets are expected to be non-iid while privacy and security are also essential.

Fig. 5. Performance in terms of the learning loss with respect to N_R.

Fig. 6. Performance in terms of the learning loss with respect to ϵ.

Fig. 7. Performance in terms of the learning accuracy with respect to N_R.
Fig. 8. Performance in terms of the learning accuracy with respect to ϵ.

Fig. 9. Performance in terms of the average connectivity cost.

Fig. 10. Impact of validation capacity on the average connectivity cost.

Fig. 12. Performance in terms of the time cost.

Fig. 13. Performance in terms of the energy cost.