The Interplay of AI and Digital Twin: Bridging the Gap between Data-Driven and Model-Driven Approaches

The evolution of network virtualization and native artificial intelligence (AI) paradigms has shaped the vision of future wireless networks as a comprehensive entity operating as a whole over a digital platform, with smart interaction with the physical domain, paving the way for the rise of the Digital Twin (DT) concept. The recent interest in DT networks is fueled by the emergence of novel wireless technologies and use cases that increase the complexity of orchestrating the network and managing its resources. Driven by AI, the key principle of the DT is to create a virtual twin of the physical entities and network dynamics, where the virtual twin is leveraged to generate synthetic data and to offer an on-demand platform for AI model training. Despite the common understanding that AI is the seed for DT, we anticipate that DT and AI will be enablers for each other, in a way that overcomes their limitations and complements each other's benefits. In this article, we dig into the fundamentals of DT, where we reveal the role of DT in unifying model- and data-driven approaches, and explore the opportunities offered by DT in order to achieve the optimistic vision of 6G networks. We further unfold the essential role of the theoretical underpinnings in unlocking further opportunities by AI, and hence, we unveil their pivotal impact on the realization of reliable, efficient, and low-latency DT.


I. INTRODUCTION
Over the last couple of decades, the paradigm of virtualization has evolved from virtualized local area networks (LANs) and private networks to the consolidation of network function virtualization (NFV) and network slicing principles. This advancement is driven by the edge computing and cloudification capabilities of current wireless network generations. With the growing demands on wireless networks in terms of latency, reliability, and energy and spectral efficiency, and with the emergence of sophisticated services with heavy distributed computing requirements, it is envisaged that network virtualization will scale up from the node and link levels to the network-wide level, setting the scene for holistic network virtualization, from the core to the edge. Coupled with the pervasive utilization of artificial intelligence (AI) at all network levels, the Digital Twin (DT) paradigm has recently been deemed a promising tool for network design, optimization, management, and recovery, in which the DT can be leveraged to realize the vision of zero-touch 6G networks [1]. The key principle of DT networks relies on developing an accurate digital replica of a wireless network, taking into account the environmental and physical elements, the network parameters, and the dynamics and interactions happening at the node level. The DT paradigm aims at facilitating the optimization and control of wireless networks when implemented at a large scale. This is motivated by the exacerbated complexity and coordination difficulty of future sixth generation (6G) networks, which are characterized by the emergence of a swarm of novel applications and technologies with extreme requirements. Within this context, DT is anticipated to offer a digital platform for network configuration and optimization purposes, with AI being the main orchestrator [2].
Generally speaking, assuming perfect network and environment virtualization, DT can be exploited as a tool for engineered data generation, where data can be collected from common and rare network scenarios that are artificially created at the cyber twin (CT). This data will then be leveraged by various AI algorithms in order to perform model training, and then to achieve efficient inference and decision-making. According to its role, a DT can be classified as a planning, training, or operational twin. The planning twin is utilized at the initial stages to ensure an optimal design of the network assets and components. This means that, in this scenario, the CT is created before the physical twin (PT). Meanwhile, the training twin offers a platform for AI models to be trained at the CT before implementing them at the PT. In this scenario, the computational overhead is moved from the physical environment to the CT, and hence, the resources of the physical devices are saved; accordingly, the PT is only responsible for updating the models in the case of any network variations. Lastly, the operational twin constitutes a network brain, whose purpose is to perform data generation, model training based on the data generated at the CT and the real data collected from the PT, and, more importantly, on-demand decision-making and inference, as well as AI model retraining when needed.
Despite the prevailing belief that AI will be the enabler of the DT paradigm [3], it is worth asking whether the converse is also true. Whilst DT requires the employment of AI algorithms in order to extract insights from the available data, and hence perform intelligent inference, we envision that the DT can potentially contribute to the enhancement of several AI algorithms from different perspectives, including the availability of highly reliable data with conditioned distributions, in addition to offering a virtualized digital platform that reduces complexity at the physical environment. While the former helps improve the accuracy of the trained models, the latter can further speed up the training process. Within the same context, recall that in current conventional frameworks, due to the lack of high-quality data, model-driven approaches are exploited to compensate for the shortage of reliable data and to assist in model training [4]. Although such an approach can temporarily provide reasonable accuracy, it does not scale well to large networks, and it lacks adaptivity to network dynamics. Therefore, we foresee that the DT paradigm will be the link that enables the synergy of model-driven and data-driven approaches as a unified tool, bringing in their advantages and overcoming their limitations.
In this article, for the first time in the literature, we explore the interplay of AI and DT, and delve into the interrelated effect of AI and DT on each other and how each contributes to the realization of the other paradigm (Fig. 1). Furthermore, we reveal the integral role of model-driven approaches in enabling robust DT networks, and the intertwined benefits that can be reaped when integrating model-driven and data-driven tools into unified approaches. Also, we shed light on how the theoretical foundations pave the way for optimized DT and for the comprehension of the hidden logic behind most AI algorithms.

A. Twin-Twin coordination
Despite the bright vision of realizing a holistic representation of the physical environment and network elements over a unified digital platform, the real implementation of DT can be achieved through multiple interconnected twins. While latency and complexity will be reduced within each twin, such a distributed DT implementation introduces a new level of complexity pertinent to inter-twin coordination, where models trained over the multiple twins should be aligned to ensure accurate global inference, i.e., operations over multiple twins should be synchronized to achieve joint tasks. In this process, reliability and latency should be maintained within the required thresholds. In this context, AI algorithms can be leveraged for improved twin-twin coordination. From one perspective, AI can be leveraged in order to minimize the end-to-end (E2E) latency. Specifically, the graph neural network (GNN) constitutes a potential candidate for facilitating inter-twin communication and coordination. This is motivated by the fact that wireless networks modeled through multiple DTs can be represented as graphs, incorporating the global context of twins. GNNs exploit the graph structure of wireless-enabled DTs in order to capture the node dependencies in inter- and intra-twin communications [5]. Through the virtual twin, GNNs can develop comprehensive insights into twins and their interactions by leveraging the dynamic nature of networks as state features, and then aggregating these states to achieve a comprehensive network understanding. Recent results demonstrated the superiority of GNNs in predicting the network E2E delay [6]. This metric can be important for accurately measuring the delay encountered in CT-CT and CT-PT communications, and for compensating for that delay. From another perspective, GNNs can be exploited for improved data communication and model synchronization among multiple CTs, where the inter-twin links can be optimized to ensure synchronized twin-twin operations, and ultimately, to obtain a homogeneous global DT. Additionally, goal-oriented semantic communication has two advantages in twin-twin coordination, namely, i) the proper design of the network goals to achieve the required latency, reliability, and synchronization, and ii) the cooperative operation of multiple twins to fulfill joint network goals, yielding coordinated twins in the virtual realm [7].
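The graph view of interconnected twins described above can be illustrated with a minimal sketch of a single message-passing round, the basic aggregation step underlying GNNs. Everything here is an assumption made for illustration: the three-twin topology, the feature layout (load, local delay), and the unweighted mean aggregation. A real GNN would add learned weights and nonlinearities.

```python
# Hypothetical topology: three cyber twins connected in a line, CT0 - CT1 - CT2.
# Each twin's state is [load, local delay in ms]; all values are illustrative.

def message_passing_round(features, adjacency):
    """One round of mean aggregation: every node averages its own state
    with the states of its neighbours (no learned weights, for clarity)."""
    updated = {}
    for node, own in features.items():
        group = [own] + [features[nbr] for nbr in adjacency[node]]
        dim = len(own)
        updated[node] = [
            sum(vec[k] for vec in group) / len(group) for k in range(dim)
        ]
    return updated

adjacency = {"CT0": ["CT1"], "CT1": ["CT0", "CT2"], "CT2": ["CT1"]}
features = {"CT0": [0.2, 4.0], "CT1": [0.8, 10.0], "CT2": [0.5, 1.0]}

# After one round, each twin's state also reflects its direct neighbours.
hidden = message_passing_round(features, adjacency)
```

Stacking several such rounds lets information from distant twins reach each node, which is what allows a graph-level readout (for example, a delay estimate) to depend on the whole inter-twin topology.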

B. Synthetic data generation
As discussed, one of the attractive benefits of the DT is that it constitutes a source of close-to-real data generation, which compensates for the weakly measured datasets at edge devices that are diverse in terms of quality and quantity. Nevertheless, generating an all-inclusive dataset that accounts for the status of all nodes, various environmental events, and network scenarios requires significant time, and is therefore generally performed in a multi-step process. Here, generative adversarial networks (GANs) play an important role, by employing a generator and a discriminator in order to generate accurate synthetic data. As demonstrated in Fig. 2, a generator is utilized at the CT for the purpose of generating synthetic data. Then, a discriminator is used in order to train the generator to produce higher-quality datasets, i.e., datasets close to the data sensed from the PT. Accordingly, the real data acquired from the PT is considered as a benchmark to quantify the accuracy of the synthetic data produced by the generator.
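For reference, the generator/discriminator interplay described above is commonly written as the following minimax objective. This is the standard GAN formulation rather than anything specific to this article, and the symbols G, D, p_PT, and p_z are notation introduced here for illustration.

```latex
\min_{G}\,\max_{D}\;
\mathbb{E}_{x \sim p_{\mathrm{PT}}}\big[\log D(x)\big]
+ \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

Here p_PT stands for the distribution of the real data sensed from the PT, z is the noise input to the generator G at the CT, and D(x) is the discriminator's estimate of the probability that a sample x is real.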
Several variants of the GAN introduce different data generation options, and hence offer an engineered data generation campaign that is aligned with the data generation process at the DT. For example, the conditional GAN, which imposes particular constraints on the data generated by the GAN, can be leveraged in order to generate datasets that are conditioned on a particular data distribution or a different modality [8]. Furthermore, the time-series GAN can model time-series data [9], and can therefore assist with understanding the network dynamics over the CT. Note that GANs, and their variations, have a number of promising applications in wireless networks, e.g., channel estimation/modeling, modulation classification and recognition, and spectrum management. Accordingly, GANs represent an essential part of DT-enabled wireless networks, where synthetic data generated by GANs is expected to substantially enhance the performance of the corresponding wireless networks. In this regard, GANs at the cyber twin are anticipated to generate data with different modalities, according to the needs of the considered scenario, and hence, images, RF data, etc., are potential outputs of GANs at the DT. It is worth noting that data generated using GANs can be used to validate models trained over the DT data, and vice versa.

C. Twin generalization
As discussed earlier, due to the increased complexity associated with future wireless networks, distributed DTs might be the solution for enhanced reliability and reduced latency in the DT paradigm. While on-demand data sensing is one of the operational DT pillars, it might be challenging and time-consuming to perform model retraining in response to every environmental or network variation. This is particularly pronounced for the sudden and fast variations that are generally expected in vehicular networks. Accordingly, transfer learning

[Figure: labels Digital Twin, Artificial Intelligence, Twin-Twin Coordination]

Note that transfer learning can be integrated with reinforcement learning (RL) for improved CT design. RL can be exploited for improved knowledge transfer between multiple CTs, where the model transferred from a well-established CT can be exploited by policy-agnostic agents in another CT that suffers from a low-quality dataset. Hence, rewards and actions will rely on the transferred model, and the weights of the transferred model can then be fine-tuned to optimally fit the target twin. On the other hand, transfer learning can be used for policy transfer among multiple cyber twins, for fast convergence and agent training at the target CT.
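The idea of fine-tuning transferred weights at a target twin can be sketched with a deliberately small example. The following is a minimal illustration under stated assumptions, not the authors' method: the model is a one-dimensional linear regressor, the "source CT" and "target CT" datasets are synthetic, and all names and hyperparameters are hypothetical.

```python
def mse(w, b, data):
    """Mean squared error of the linear model y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def gradient_descent(w, b, data, lr, steps):
    """Plain batch gradient descent on the mean squared error."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Source twin (hypothetical): plentiful data following y = 2x + 1.
source = [(k / 10, 2 * (k / 10) + 1) for k in range(50)]
# Target twin (hypothetical): only three samples, shifted to y = 2x + 1.5.
target = [(0.5, 2.5), (1.0, 3.5), (2.0, 5.5)]

# 1) Train at the well-established source twin.
w_src, b_src = gradient_descent(0.0, 0.0, source, lr=0.05, steps=2000)
loss_before = mse(w_src, b_src, target)

# 2) Transfer the weights and fine-tune briefly on the scarce target data.
w_ft, b_ft = gradient_descent(w_src, b_src, target, lr=0.1, steps=50)
loss_after = mse(w_ft, b_ft, target)
```

In this toy setting the transferred model already captures the shared structure (the slope), so a short fine-tuning run on three samples is enough to absorb most of the offset between the two twins.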

III. DIGITAL TWIN FOR AI
While the successful operation of DT heavily relies on various AI algorithms, it is worth asking whether the opposite is true. Will the development of a holistic DT be the seed for more innovative AI architectures that will serve the interests of future wireless networks? Furthermore, can DT be leveraged to enhance existing approaches, both model-driven and data-driven? If so, how can DT be efficiently implemented in order to fully grasp the benefits of both approaches? Recalling that future wireless networks are characterized by their high level of complexity, it is indisputable that at some point current schemes will fail to deliver the needed performance and accuracy. This is due to the fact that existing network optimization, configuration, and design schemes are either developed from a theoretical point of view or built based on data collected from the network. While the former is efficient in initial network planning, it lacks the scalability and adaptivity offered by the latter. On the other hand, data-based models are insufficient to provide a full-scale representation of wireless networks. In the following, we explore the opportunities offered by the DT in order to provide a unified platform for model-based and data-based schemes.

A. Experience-driven learning
As RL was initially developed as a step toward realizing autonomous systems, it constitutes a natural choice for DT applications. The merit of RL is manifested in training DTs, where a training DT can be employed for risk-controlled agent training, i.e., RL agents can freely interact with the CT, experiencing a wide range of common and rare scenarios. While keeping the real environment unharmed is a big advantage, RL agents can further benefit from the DT for fast training, where devices with supercomputing capabilities can ensure accurately trained agents at the CT within a short period of time. These benefits are further demonstrated through a case study by Ericsson, in which a DT framework was developed to minimize the transmission power while maintaining a particular quality-of-service (QoS) requirement and a monitored level of radio frequency (RF) radiation [11]. Allowing the RL agents to learn through direct interaction with the physical environment is considerably risky, particularly in areas with strict regulations on transmission power, and therefore, the DT offers safe, yet efficient, virtual agent training. The ultimate goal in such scenarios is to design multiple agents that are trained well enough to perform efficiently with few or no further interactions with the physical environment.
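To make the idea of risk-free agent training at the CT concrete, the sketch below trains an epsilon-greedy agent against a simulated environment that penalizes QoS violations. It is a one-step (bandit-style) simplification of RL, and it is not the Ericsson framework cited above: the power levels, channel gains, QoS threshold, and penalty value are all hypothetical.

```python
import random

POWER_LEVELS = [1.0, 2.0, 4.0, 8.0]         # hypothetical transmit powers
CHANNEL_GAIN = {"good": 1.0, "poor": 0.3}   # hypothetical channel states
QOS_THRESHOLD = 1.0                         # hypothetical required received power

def cyber_twin_step(state, action):
    """Simulated environment at the CT: the reward is the negative transmit
    power if the QoS target is met, and a large penalty otherwise."""
    power = POWER_LEVELS[action]
    meets_qos = power * CHANNEL_GAIN[state] >= QOS_THRESHOLD
    return -power if meets_qos else -100.0

def train_in_twin(episodes=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    actions = range(len(POWER_LEVELS))
    q = {s: [0.0] * len(POWER_LEVELS) for s in CHANNEL_GAIN}
    visits = {s: [0] * len(POWER_LEVELS) for s in CHANNEL_GAIN}
    states = list(CHANNEL_GAIN)
    for episode in range(episodes):
        state = states[episode % len(states)]   # alternate channel states
        if rng.random() < epsilon:
            action = rng.randrange(len(POWER_LEVELS))        # explore
        else:
            action = max(actions, key=lambda a: q[state][a])  # exploit
        reward = cyber_twin_step(state, action)
        visits[state][action] += 1
        # Sample-average update of the action-value estimate.
        q[state][action] += (reward - q[state][action]) / visits[state][action]
    return q

q_table = train_in_twin()
policy = {
    s: POWER_LEVELS[max(range(len(POWER_LEVELS)), key=lambda a: q_table[s][a])]
    for s in q_table
}
```

All QoS violations incurred during exploration happen inside the simulated twin; only the resulting policy (the lowest power level that satisfies the QoS target in each channel state) would be deployed at the physical nodes.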

B. Data availability and Storage
Although the recent advancement in sensing services has facilitated data measurement campaigns for improved AI model training, the envisioned native AI networks will necessitate on-demand node-level data collection. This is primarily aimed at enabling pervasive intelligence and accurate inference regardless of the network status. Such a bright vision can only be achieved if the collected data represents all possible network scenarios, taking into account the status of the physical environment. Therefore, the DT can be a game-changer in this situation, where rarely experienced network scenarios can be artificially engineered at the CT in order to study the network behaviour under such circumstances, and hence, perform a comprehensive data collection process that is capable of representing the activities of all nodes under a wide range of network scenarios. While this data is artificially generated, it is considered close-to-real data, given that it is generated under realistic virtual environments that accurately imitate the dynamics of real networks. Within the same context, recalling that future wireless networks are characterized by their high level of heterogeneity, it is highly probable that local datasets at edge devices are non-identically distributed and differ in quantity and quality. This, in consequence, results in model uncertainties, and hence severely impacts the network performance. DT in this context ensures that models are trained over data that is highly reliable in terms of quality and quantity. Note that, in order to guarantee sufficiently general and accurate models when machine learning (ML) algorithms are executed in a supervised fashion, the data used for testing should be different from the data used for training; however, both should be drawn from similar distributions [12]. This further corroborates the role of DT in empowering highly reliable and efficient ML algorithms, where not only can large datasets be generated, but their distributions can also be controlled to ensure the required QoS.

C. Virtual implementation of AI: Unifying distributed and centralized approaches
Several research activities were initiated to explore the merit of AI when implemented in a distributed fashion. These activities were fueled by the increased communication overhead and latency, and the compromised privacy, which are generally experienced in centralized methods. Although research on distributed AI has picked up pace in recent years, it is still uncertain whether edge nodes are capable of delivering the required QoS, particularly with the emergence of the native AI concept, where each network node is anticipated to perform sensing, training, and inference at some level. In this regard, the DT offers a robust platform to balance the merits of distributed and centralized algorithms, and to overcome their limitations. On the one hand, the overhead resulting from dataset exchange between the edge devices and the centralized aggregator will be lifted from the edge devices, users' privacy will be maintained, and the high latency encountered in centralized algorithms will be reduced. On the other hand, the low training accuracy of distributed schemes, due to the limitations of local datasets, can be significantly improved through the implementation of the DT paradigm, where all-inclusive datasets are available for enhanced model training.
Distributed models trained at the CT might require updates once implemented at the physical environment. These updates can be performed at the edge devices at the PT, and hence, the role of edge devices will be limited to updating the local models according to the new circumstances at the physical environment. Meanwhile, in the case of an operational interactive twin, model updates can be performed at the DT as well, yielding a latency-accuracy-energy trade-off problem.
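As one concrete instance of combining locally trained models through an aggregator, the sketch below applies federated averaging, in which each local model is weighted by the size of its dataset. The article does not name a specific distributed scheme, so federated averaging is an assumption here, and the weight vectors and sample counts are made-up values.

```python
def federated_average(local_weights, sample_counts):
    """Weighted average of local model weights, where each model's
    contribution is proportional to its number of training samples."""
    total = sum(sample_counts)
    dim = len(local_weights[0])
    return [
        sum(w[k] * n for w, n in zip(local_weights, sample_counts)) / total
        for k in range(dim)
    ]

# Hypothetical local models (two parameters each) trained on datasets
# of very different sizes, as expected for heterogeneous edge devices.
local_weights = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
sample_counts = [100, 300, 600]

global_weights = federated_average(local_weights, sample_counts)
```

Only model parameters are exchanged in this scheme, not raw datasets, which is the source of the overhead and privacy benefits mentioned above; running the local training itself at the CT would additionally relieve the edge devices of the computation.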

IV. MODEL-BASED DIGITAL TWIN
With the advent of AI and its promising advantages, it is now unusual to envision wireless networks without an AI element, and the employment of AI algorithms has become the default solution for any problem in wireless networks. While we fully agree that AI is an indispensable tool in future wireless generations, we cannot overlook the integral role of model-driven approaches and the advantages that they can bring to the network design and optimization process. Although the common understanding of the DT paradigm is confined to creating a virtual replica of an existing physical domain, it is not necessary for the PT to be available in order for the CT to be created [13]. In the latter scenario, model-based approaches represent the key to the preliminary design of network assets and to supporting the decision-making process at the initial stages. The focus of this type of DT is to mitigate technical risks by exploring the network behaviour at the CT in a what-if analysis mode.
On the other hand, mathematical models not only facilitate the understanding of the logic behind AI algorithms, but also provide a resilient abstraction of the DT assets and dynamics. Therefore, it is insufficient to focus exclusively on AI-driven approaches to realize efficient DT networks. Rather, efforts should be devoted to developing solid mathematical underpinnings in order to identify the theoretical limitations of DT, lay the foundations for a comprehensive mathematical interpretation of AI algorithms, and accordingly unleash the full potential of DT [14]. Among several mathematical theories, we believe that optimization theory, random matrix theory, graph theory, optimal transport theory, stochastic geometry, and game theory are essential tools that are required for the efficient construction and operation of digital twins. Such tools constitute the basis for i) modeling the randomness of physical environments and electromagnetic signals, taking into consideration the large amount of data to be sensed and communicated between the cyber and physical twins, ii) synchronizing and optimizing the operations among multiple cyber twins, particularly in a massive twinning scenario, to ensure a harmonized global twin, iii) balancing and coordinating the association and decoupling of multiple digital twins, and iv) robust 3D modeling of wireless networks, taking into consideration the spatial components.

A. The fusion of model-driven and data-driven approaches over the twin
The role of mathematical frameworks in the design and optimization of wireless networks cannot be ignored, and a full reliance on ML tools will leave a noticeable gap. It is worth noting that mathematical frameworks constitute solid pillars that pave the way for ML algorithms to achieve enhanced performance and scalability [15]. Therefore, despite the rapid progress in AI models, the absence of model-driven approaches in the design and optimization process represents a bottleneck in future wireless networks. Model-driven approaches face two major limitations: i) in some complex network scenarios, e.g., ultra-dense heterogeneous networks, the resulting expressions representing the system are mathematically intractable, and the associated optimization problems do not lend themselves to closed-form optimum or sub-optimum solutions, and ii) the accuracy is reduced when the available mathematical framework cannot be readily scaled up to represent another network scenario. Accordingly, the DT offers a way out by integrating the benefits of model-driven and data-driven approaches in a unified process. In particular, the intractability issue of some model-driven approaches can be solved by employing artificial neural networks (ANNs), a class of ML models, in a way that allows the ANNs to map the network parameters to the corresponding network performance numerically, until convergence to the optimum solution. Such an exhaustive approach is computationally heavy and might not be tolerated by nodes with limited resources. Thanks to the DT paradigm, this complexity can be lifted from the physical nodes, yielding robust model training that enjoys a solid mathematical foundation, with reduced overhead at the PT. Note that the role of DT in this scenario is to offer a platform for low-complexity integration of theoretical models with ML frameworks. On the other hand, to tackle the inaccuracy limitations associated with network scalability in model-driven approaches, conventional approaches rely on initially training ML algorithms using the available mathematical models. Then, the trained models are refined through synthetic data generation. In this scenario, the DT ensures the availability of sufficient, accurate, and close-to-real datasets for improved model refinement, and therefore, enhanced accuracy.

A. Exact digital replica
It is not hard to tell that the merit of the DT as an enabler for AI is bounded by the accuracy of the CT and its synchronization with the PT. As discussed earlier, the benefits of the DT revolve around acquiring an exact replica of the physical objects, and of the instantaneous dynamics of the environment and network parameters, over a long period of time. This is due to the need to exploit the DT as a digital environment for data generation and model training, and therefore, its usefulness is highly dependent on how accurate and realistic the CT is compared to the PT. This opens the door to further investigating the limitations of holistic network virtualization, and to identifying potential key solutions.

B. Fast AI algorithms
While the availability of large datasets at the CT for the purpose of AI model training is appealing, it is debatable whether current AI architectures will perform in a reliable and timely manner. Recalling that ultra-low latency is one of the DT verticals, it is essential to ensure that the employed AI algorithms will be able to deliver the required accuracy within a time frame that is tolerable by the DT paradigm. The same applies to model updates at the CT. Hence, is there a need to design new AI algorithms for the implementation of efficient DT? If not, to what extent will current AI algorithms be able to meet the latency requirements of the DT? This will further require the exploration of data management, cleaning, and rectification methods, in order to enable efficient execution of AI algorithms.

C. Unlocking the theory behind AI algorithms
Although the benefits of AI algorithms are particularly pronounced in highly dynamic, large-scale twins, such algorithms are generally treated as black boxes, where tangible insights are obtained from the input and output data, while the operations in between, which are generally characterized by their extremely high complexity, remain difficult to understand, modify, and improve. In this regard, a proper understanding of the hidden operations in deep neural networks (DNNs) paves the way for improved architectures that better serve the needs of DT. As a promising approach, random matrix theory (RMT) has emerged as a versatile, yet solid, approach for analyzing DNNs. Note that, coupled with the large non-identically distributed data generated from the CT, by randomly initializing the parameters of a neural network, complex DNNs can be represented as large matrices of random variables. Accordingly, RMT can be leveraged to optimize the initial weights of DNNs, particularly when implemented at a large scale. Specifically, RMT can be efficiently exploited to design novel activation functions that will play a role in speeding up the training process, and therefore, realize low-latency DT [15].
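A small numerical experiment can illustrate why the statistics of a randomly initialized weight matrix matter. The sketch below is not the analysis of [15]; it only shows, for a single linear layer with hypothetical width n = 200, that drawing i.i.d. Gaussian weights with variance 1/n keeps the signal energy roughly unchanged, whereas unit-variance weights amplify it by a factor of about n.

```python
import math
import random

def random_layer(n, std, rng):
    """An n x n weight matrix with i.i.d. zero-mean Gaussian entries."""
    return [[rng.gauss(0.0, std) for _ in range(n)] for _ in range(n)]

def energy_gain(weights, x):
    """Ratio ||W x||^2 / ||x||^2 for a linear layer y = W x."""
    y = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in weights]
    return sum(v * v for v in y) / sum(v * v for v in x)

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 200                  # hypothetical layer width
x = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Entry variance 1/n: the expected energy gain is 1.
gain_scaled = energy_gain(random_layer(n, 1.0 / math.sqrt(n), rng), x)
# Entry variance 1: the expected energy gain is n.
gain_unscaled = energy_gain(random_layer(n, 1.0, rng), x)
```

When many layers are stacked, a per-layer gain far from one compounds into exploding or vanishing signals, which is one reason the spectrum of large random weight matrices is a natural object of study when choosing initializations.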

D. Digital Twin-empowered emergent intelligence
The proliferation of machine-to-machine communications has stimulated the recent interest in context-based decision-making, i.e., the so-called emergent intelligence, where multiple agents collaborate to perform a particular task by intensively interacting with each other and with the environment. While wireless networks offer an environment with a large number of communicating agents, and hence enable the agents to develop a more advanced language that is capable of conveying a variety of abstract ideas, such a scale will result in either increased latency or reduced reliability due to the substantially increased communication traffic. Within this context, DT represents an ideal candidate to enable reliable agent interaction and to support the deployment of emergent communication in future wireless networks. In particular, digital replicas of the communicating agents can be implemented at the CT, yielding improved information exchange among agents, reduced latency, enhanced link reliability, and reduced signaling overhead, while ensuring accurate inference [1]. As DT-empowered emergent intelligence is still in its infancy, it is essential to understand how DT can leverage the semantic aspect of communicated messages to develop a common level of understanding between communicating digital agents that enables them to perform assigned tasks successfully, and to reflect this learning and inference process to the physical agents.

VI. CONCLUSION
Despite the trending notion that DTs will be completely controlled by AI, in this article we elucidated the interplay of DT and AI as both enablers of, and enabled by, each other. Furthermore, we put forward the less common view that model-driven approaches have an essential role in the realization of efficient and accurate DT. In particular, we offered a forward-looking vision of how model-driven tools will complement data-driven approaches and assist with overcoming their limitations. We further emphasized the indispensable role of mathematical frameworks in understanding, improving, and designing DTs, as well as the importance of leveraging theoretical frameworks for enhanced AI architectures. It is worth highlighting that, although several AI algorithms can be devoted to realizing a high-quality DT, the full potential of DT can be achieved through the amalgamation of multiple AI algorithms over the twin, where further advantages pertinent to latency, complexity, and reliability can be reaped by unifying the multiple algorithms in the DT.

BIOGRAPHIES
Lina Bariah (lina.bariah@ieee.org) is a Senior Researcher at the Technology Innovation Institute in Abu Dhabi. She is an IEEE Senior Member. She serves as an Associate Editor for the IEEE Communications Letters and the IEEE Open Journal of the Communications Society.
Mérouane Debbah (Merouane.Debbah@tii.ae) is the Chief Researcher at the Technology Innovation Institute in Abu Dhabi. He is an IEEE Fellow, a WWRF Fellow, a Eurasip Fellow, an AAIA Fellow, an Institut Louis Bachelier Fellow, and a Membre émérite SEE. He has received more than 20 best paper awards.