Mobility Digital Twin with Connected Vehicles and Cloud Computing

Abstract: A Digital Twin is a digital replica of a living or non-living physical entity, and this emerging technology has attracted extensive attention from different industries during the past decade. Although a few Digital Twin studies have been conducted in the transportation domain very recently, there is no systematic research with a holistic framework connecting various mobility entities together. In this study, by leveraging both connected vehicle technology and cloud computing, a Mobility Digital Twin (MDT) framework is developed, which consists of three building blocks in the physical space (namely Human, Vehicle, and Traffic) and their associated Digital Twins in the digital space. The cloud architecture is built with Amazon Web Services (AWS) to accommodate the proposed MDT framework and to implement its digital functionalities of storage, modeling, learning, simulation, and prediction. The effectiveness of the MDT framework is shown through case studies of three digital building blocks with their key microservices: the Human Digital Twin with user management and driver type classification, the Vehicle Digital Twin with cloud-based Advanced Driver-Assistance Systems (ADAS), and the Traffic Digital Twin with traffic flow monitoring and variable speed limit.


I. INTRODUCTION
THE recent development of the Internet of Things (IoT) has facilitated a wide range of cutting-edge technologies, whose application scenarios are rooted both at the user level (i.e., individual consumers or private companies) and at the system level (i.e., commercial or industrial sectors). From the user's perspective, the IoT will play a leading role in scenarios such as assisted living, e-health, and enhanced learning. From the system's perspective, the most apparent scenarios are industrial manufacturing, logistics, business/process management, and intelligent transportation of people and goods [1].
The Digital Twin, as an emerging representation of the IoT or Cyber-Physical Systems (CPS), has attracted increasing attention over the past decade [2]. Gartner ranked it as one of the top 10 strategic technology trends for 2019 [3], alongside other technologies such as autonomous things (e.g., autonomous vehicles), immersive technologies (e.g., virtual reality and augmented reality), and quantum computing. According to a market research report, the global Digital Twin market was valued at USD 3.1 billion in 2020 and is projected to reach USD 48.2 billion by 2026, a compound annual growth rate of 58% over the forecast period [4]. The same report pointed out that the automotive and transportation industry accounted for the largest share of the Digital Twin market in 2019.
Z. Wang, R. Gupta, K. Han, A. Ganlath, N. Ammar, H. Muralidharan, and P. Tiwari are with Toyota Motor North America R&D, InfoTech Labs, 465 Bernardo Avenue, Mountain View, CA 94043 (e-mail: ryanwang11@hotmail.com; rohit.gupta@toyota.com; kyungtae.han@toyota.com; akila.ganlath@toyota.com; nejib.ammar@toyota.com; prashant.tiwari@toyota.com).
Although definitions of the Digital Twin vary [5], [6], the basic concept is essentially the same: a Digital Twin is a digital replica of a living or non-living physical entity. Digital Twin technology paves the way for real-time monitoring and synchronization of real-world activities with their virtual counterparts [7]. The Digital Twin concept originated in the aerospace domain, when the National Aeronautics and Space Administration (NASA) adopted it as a key element of its 2010 technology roadmap. Given its rapid development in different domains during the past decade, including aeronautics and space [5], [8], robotics [9], [10], manufacturing [11], [12], and informatics [13], the Digital Twin also has great potential in the transportation domain.
The emergence of connected vehicle technology introduces another platform for implementing the Digital Twin. As the level of connectivity within vehicles has greatly improved, equipped vehicles are able to "talk" with other entities: with other connected vehicles through vehicle-to-vehicle (V2V) communications, with traffic infrastructure through vehicle-to-infrastructure (V2I) communications, and with cloud servers through vehicle-to-cloud (V2C) communications [14]-[17]. Specifically, V2C communications allow connected vehicles to 1) upload their data to the cloud server, enabling Digital Twins to be built in the digital (cyber) world based on their counterparts in the physical world; and 2) offload their onboard computations to the cloud server, enabling Digital Twins to build models and calculate guidance information through powerful cloud computing, which can then be fed back to connected vehicles.
Very recently, a few Digital Twin studies have been conducted in the transportation domain [18]-[20], but none of them has a holistic framework connecting various mobility entities (i.e., human, vehicle, and traffic). In this study, a Mobility Digital Twin (MDT) framework is proposed with connected vehicle technology and cloud computing. The MDT framework is built on three layers: 1) the physical space that contains human beings, vehicles, and traffic infrastructure; 2) the digital space that contains the digital replicas of the aforementioned physical entities; and 3) the communication layer between these two spaces. Given its connectivity, the framework turns connected vehicles into an Internet of Vehicles (IoV) by leveraging IoT technologies. The cloud architecture is built with Amazon Web Services (AWS) to accommodate the proposed MDT framework and implement its digital functionalities of storage, modeling, learning, simulation, and prediction. The effectiveness of the MDT framework is shown through case studies of three digital building blocks with their key microservices: the Human Digital Twin with user management and driver type classification, the Vehicle Digital Twin with cloud-based Advanced Driver-Assistance Systems (ADAS), and the Traffic Digital Twin with traffic flow monitoring and variable speed limit.
Since traditional mobility system frameworks rely heavily on onboard storage and computing, their functionalities are limited by multiple constraints, such as computing power, access to big data, and ease of deployment and modification. In contrast, the proposed MDT framework addresses these constraints by making the following contributions:
• Powerful: The MDT framework allows users to rapidly adjust cloud resources to meet fluctuating or unpredictable demands, providing high computing power during periods of peak demand.
• Shareable: Bulk data generated by an end user is offloaded to and stored on the cloud, where it can be retrieved and utilized by the same user later, or shared with other end users for microservices on demand.
• Manageable: The MDT framework allows users to get their microservices up and running faster on the cloud platform, with improved manageability and less maintenance. Over-the-air (OTA) updates are also available to the MDT framework.
• Extendable: Arbitrary mobility microservices can be easily implemented in the MDT framework with minimal changes to the cloud architecture and data structure.
The remainder of this study is organized as follows: Section II reviews the literature on cloud computing and the Digital Twin in the context of connected vehicles. Section III introduces the MDT framework with a detailed explanation of its three layers: the communication layer (and data workflow), the physical space, and the digital space. Section IV then develops the cloud architecture based on AWS, which accommodates the proposed MDT framework. Finally, Section VI briefly concludes this study, and Section VII discusses future challenges.

A. Transportation Applications with Cloud Computing
The emergence of commercial cloud computing services, such as Amazon Web Services (AWS) [21], Microsoft Azure [22], Google Cloud Platform (GCP) [23], and Alibaba Cloud [24], has facilitated many applications in the domain of vehicular/transportation CPS. Such services typically provide a variety of abstract technical infrastructure and building blocks for distributed computing. AWS, for example, which held the largest market share among all competitors in 2020, comprises over 200 products and services for computing, storage, networking, databases, analytics, IoT, and so on [21].
All these features of cloud computing services, together with their advantage of scalability, enable connected vehicles to offload their data and onboard computing demand to the cloud.
Guerrero et al. demonstrated that cloud computing can be integrated with intelligent transportation systems to address issues faced by the transportation sector, such as traffic congestion, roadway safety, and pollutant emissions [25]. Specifically, the vehicular cloud concept can enhance transportation systems by storing and processing collected data (e.g., from traffic lights, parking meters, and camera images) and creating a historical registry of various data sources [26]. The transportation authorities who own these entities can therefore make informed decisions about when to change traffic directions, install new traffic lights, and remodel or repair road segments. However, detailed cloud architecture design is not covered in these studies, and vehicular cloud applications are introduced only at the conceptual level, without case studies.
During the past decade, various transportation applications have been proposed that leverage the capability of cloud computing [27], [28]. A navigation-assisted route optimizer was developed by Gerla, where the navigator server collects information from connected vehicles and then computes optimal routes by constructing a traffic load map and traffic pattern matrix and estimating road segment loads and delays [29]. A bus smart sensor prototype was designed and implemented by Herrera-Quintero et al. using a serverless and microservice cloud architecture, where GCP Firebase was used for storage and AWS Lambda was used for computation [30]. A vehicular pollutant emission detection system was developed by Bhatnagar et al., where AWS IoT and Amazon DynamoDB were integrated to send notifications to the vehicle driver if the emission sensor detects a gas leakage [31]. A vehicle-based traffic surveillance application was developed by Deng et al., where an AWS-based serverless cloud architecture was shown to be feasible for real-time transportation applications through a field implementation [32]. However, the aforementioned studies focus on individual transportation applications that provide solutions within a very limited domain, and none of them designs a holistic framework that connects various mobility entities to benefit humans, vehicles, and traffic at the same time.

B. Digital Twin Framework for Connected Vehicles
The Digital Twin concept has been loosely defined and adopted in the transportation domain since its emergence, partly due to its similarity and connection with other technologies. Nevertheless, many previous efforts related to the IoT and CPS in the automotive industry foreshadow the development of the Digital Twin, since most of the proposed methodologies and algorithms were developed on multilayer system frameworks with physical entities (i.e., vehicles) and their digital replicas (simulation models/environments).
Alam and Saddik developed a Digital Twin framework reference model for cloud-based CPS, where a telematics-based driving assistance application was proposed for the vehicular CPS consisting of three parts: 1) computation, 2) control, and 3) sensors and services fusion [18]. Kumar et al. proposed a Digital Twin-centric approach with machine learning, edge computing, 5G communication, and a data lake, aiming at driver intention prediction and traffic congestion avoidance [19]. Chen et al. proposed a "Digital Behavior Twin" framework in which behavioral models of drivers are shared among connected vehicles to predict future actions of neighboring vehicles and hence improve driving safety [20]. This idea was extended in two subsequent patent applications by the same authors [33], [34]. However, the term "Digital Twin" is used loosely as a concept in these studies, and the presented technologies are not specific to the Digital Twin; they could equally be realized with other IoT technologies. These studies neither systematically integrate the Digital Twin into a cloud architecture design nor define clear data structures or workflows.
Related literature on "parallel driving" or "parallel transportation" has direct implications for the Digital Twin framework of this study. In 2010, Wang introduced the concept of parallel transportation, defining parallel control and management of transportation as "a data-driven approach for modeling, analysis, and decision-making that considers both the engineering and social complexity in its processes" [35]. Many subsequent works were conducted in this research domain, including the parallel driving framework proposed by Wang et al. [36]. In this cloud-based cyber-physical-social system framework, the physical world, mental world, and artificial world are modeled as three parallel levels, considering interactions among connected vehicles, human drivers, and information. However, many applications shown in these studies are not built on a cloud-based framework, in which case their storage, learning, and prediction capabilities are limited by vehicle onboard resources.
More relevant studies have been conducted very recently by the authors on the Digital Twin for connected vehicles. Wang et al. proposed a Digital Twin paradigm for an advanced driver-assistance system (ADAS) of connected vehicles [37]. In this paradigm, onboard devices on connected vehicles collect and upload data to the cloud server through cellular-based V2C communication, and the cloud server creates digital replicas of real-world entities (i.e., roads, vehicles, and drivers) based on the received data. All proposed models and algorithms are applied to these digital replicas with cloud computing, and their results are propagated back to the real connected vehicles through V2C communication for ADAS, assisting drivers' decision making in real time. A subsequent field implementation of this paradigm was conducted by Liao et al., where three human-driven passenger vehicles performed ramp merging cooperatively, showing safety and environmental sustainability benefits compared with the traditional ramp merging scenario [38]. Visualizing Digital Twin information from the cloud remains challenging; Liu et al. developed a data-fusion methodology that overlays Digital Twin information onto the driver's field of view using camera (RGB and depth) images, helping the driver predict lane changes of neighboring vehicles [39]. However, none of these recent studies, which are also from the authors of this study, designs a holistic system framework that connects mobility entities (i.e., human, vehicle, and traffic), nor does any of them develop a cloud architecture with detailed data structures and workflows.
It should be noted that many studies consider the Digital Twin simply as a high-fidelity modeling and simulation environment of real-world entities. Although this view is partially correct, our understanding of the Digital Twin is broader than modeling and simulation alone: it covers sampling and actuation in the physical space, and storage, modeling, learning, simulation, and prediction in the digital space. Literature adopting the narrower definition of the Digital Twin is not reviewed in this study.

III. MOBILITY DIGITAL TWIN FRAMEWORK
The MDT framework proposed in this study, as shown in Fig. 1, consists of three layers: 1) the lower layer, highlighted in yellow, is the physical space where human beings, vehicles, and traffic infrastructure reside; 2) the upper layer, highlighted in blue, is the digital space where the digital replicas of those physical entities are located; and 3) between these two layers, the communication layer (in grey) plays a crucial role by enabling real-time and non-real-time data streaming both upstream and downstream.
Three entities are considered in this MDT framework: Human, Vehicle, and Traffic. Through the communication layer, each entity can be connected to the digital space (e.g., the Internet) and exchange data with the others. Therefore, this MDT framework is a good representation of the IoT, and it allows connected vehicles to act as an IoV with IoT technologies. In this section, we provide a deep dive into the three aforementioned layers, introducing their building blocks in terms of definitions and functionalities.

A. Communication Layer and Data Workflow
The communication layer of this MDT framework sits between the physical space and the digital space, providing seamless connections between them. The framework's end-to-end process starts by sampling data in the physical space, where all or part of the data is transmitted upstream to the digital space via the communication layer. The data then goes through one or more internal processes in the digital space, including storage, modeling, simulation, learning, and prediction, and the resulting data is transmitted downstream to the physical space via the communication layer. Upon receipt, this data is applied by actuators in the physical space to complete the end-to-end process.
Since cloud computing is leveraged in this MDT framework, the digital space is deployed fully or partially on commercial and/or private clouds. Therefore, the communication layer needs to provide the physical space with access to the cloud, either directly or indirectly (via edges). The MDT framework does not require any specific wireless communication technology to serve as the communication layer, as long as it can transmit data between the physical space and the digital space.
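The end-to-end workflow described in this subsection (sampling, upstream transmission, digital processing, downstream transmission, actuation) can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the framework's implementation: the `Message` type, the two toy digital processes, and the 25 m/s speed cap are all hypothetical, and the communication layer is abstracted into plain function calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Message:
    entity_id: str
    data: Dict[str, float]

def run_cycle(sample: Callable[[], Message],
              digital_processes: List[Callable[[Message], Message]],
              actuate: Callable[[Message], None]) -> Message:
    """One pass of sampling -> digital processing -> actuation."""
    msg = sample()                      # physical space: sampling
    # Upstream/downstream transmission over the communication layer is
    # abstracted away as plain function calls in this sketch.
    for process in digital_processes:   # digital space: chained processes
        msg = process(msg)
    actuate(msg)                        # physical space: actuation
    return msg

# Hypothetical digital processes, for illustration only.
data_lake: List[Message] = []

def store(msg: Message) -> Message:
    data_lake.append(msg)               # storage: keep the raw sample
    return msg

def advise_speed(msg: Message) -> Message:
    # Toy "prediction": cap the advised speed at an assumed 25 m/s limit.
    return Message(msg.entity_id, {"advisory_speed": min(msg.data["speed"], 25.0)})

actuated: List[Message] = []
result = run_cycle(lambda: Message("veh-001", {"speed": 31.0}),
                   [store, advise_speed],
                   actuated.append)
print(result.data["advisory_speed"])    # 25.0
```

Chaining the digital processes as a list mirrors the idea that data may pass through one or several of storage, modeling, simulation, learning, and prediction before anything is sent downstream.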

B. Physical Space
If we consider the MDT framework as an end-to-end framework, then its physical space is in charge of both ends, namely sampling and actuation. We assume that no (or only minimal) computing needs to be conducted in the physical space, since all (or most) of it is offloaded to the digital space through the communication layer.
For sampling, sensors in the physical space detect dynamic status, operating processes, or event occurrences, and then aggregate these measurements at various resolutions for transmission to the digital space. For actuation, once the processed results are received from the digital space, physical entities act on them to complete this end-to-end framework. Generally, the physical space is defined in a world coordinate system, which may contain all transportation-related physical entities, and can be classified into three building blocks: Human, Vehicle, and Traffic.
1) Human: In this framework, all human beings involved in the transportation system are considered, including not only drivers but also passengers, pedestrians, cyclists, etc. Sampling can be accomplished actively through the human-machine interface, or passively through in-cabin status sensing (e.g., camera, seat sensor), human wellness monitors (e.g., smartwatch, electrocardiogram), and other perception sensors. A human's behavioral preferences can also be set actively (e.g., a driver manually sets the preferred cruise control speed) or measured passively (e.g., a pedestrian's preferred trajectory across a crosswalk is recorded by a vehicle or intersection camera); both are considered part of the sampling process.
The actuation process of the Human block is mainly conducted by drivers. In the foreseeable future, our transportation system will remain a mixed-autonomy traffic environment, where only some vehicles will be fully autonomous (with SAE Level 5 automation), while the majority will still be driven by human drivers (with no or partial automation). Therefore, if drivers are provided with additional information from the digital space of this MDT framework, such as the likelihood of an adjacent vehicle changing lanes or upcoming signal timing, their actuation will be more accurate and in turn benefit other entities in the transportation system.
2) Vehicle: The vehicle is the core of this MDT framework, as it hosts drivers and passengers and is also the fundamental component of traffic. As can be seen from Fig. 1, all modules in the physical space, not only those in the Vehicle block itself but also those in the Human and Traffic blocks, serve vehicle-related activities.
Specifically for the Vehicle block, the localization module (e.g., GNSS), the perception sensors (e.g., ultrasonic, camera, radar, and/or LiDAR), and the vehicle CAN bus are in charge of sampling. Related data, such as positions, speeds, and accelerations of the ego vehicle and its surrounding vehicles, can be sampled from these physical components and then propagated to the digital space through the communication layer.
The actuation process of the Vehicle block in this MDT framework is conducted by the vehicle steering system, accelerator, and brake. These physical components are able to actuate any lateral or longitudinal control command received from the digital space, and therefore allow the vehicle to achieve its desired motion.
3) Traffic: Many existing intelligent vehicle platforms and applications, such as ADAS or autonomous driving systems (ADS), focus only on the performance of the ego vehicle without considering its interactions with the large-scale traffic network. However, as can be seen from Fig. 1, Traffic is a crucial building block of our MDT framework for connected vehicles. The beneficiaries of this MDT framework include not only connected vehicles and their occupants, but also the whole traffic network on a wider scope.
Particularly, the Traffic block in the physical space includes various traffic infrastructures, such as traffic signals, roadside units, camera/radar/loop detectors, and electronic traffic signs. These physical components are able to either generate data (e.g., signal phase and timing) by themselves, or measure data (e.g., traffic count and traffic flow) generated by other traffic entities. Such data is sampled and sent to the digital space through communication, benefiting other building blocks of this MDT framework.
On the other hand, guidance or adjustment received from the digital space can also be actuated by the Traffic block to improve the safety and efficiency of the large-scale traffic network. For example, the signal phase and timing of traffic lights can be adjusted to better serve different traffic flows under different situations. Guidance or warning information can be broadcast to connected vehicles via roadside units, and to all traffic entities via electronic traffic signs.

C. Digital Space
The aforementioned physical space of this MDT framework handles both ends of this end-to-end framework (i.e., sampling and actuation). On the other hand, the digital space is in charge of the processes between both ends: storage, modeling, learning, simulation, and prediction.
One of the biggest strengths of this MDT framework over traditional mobility system frameworks is the data lake, a centralized repository that can store structured or unstructured data at any scale. Traditionally, mobility data measured by a physical entity is saved only in its onboard storage due to the lack of communication capability. Such data is used only by the physical entity itself without being shared with other entities, and is discarded once the onboard storage reaches its maximum capacity. With the proposed MDT framework, however, mobility data measured by the Human, Vehicle, and Traffic blocks in the physical space can be transmitted to the digital space through the communication layer and stored in the data lakes of the associated Digital Twins for future use. Such data can be used by microservices not only in the original mobility block but also in other blocks (e.g., traffic signal data measured by the Traffic block can be used both by the "real-time monitoring" microservice in the Traffic Digital Twin and by the "cooperative control" microservice in the Vehicle Digital Twin).
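The cross-block reuse in the traffic signal example can be illustrated with a toy in-memory data lake. This is only a sketch under assumptions: the `DataLake` class, the field names `phase` and `time_to_change_s`, and the pass/no-pass check standing in for "cooperative control" are hypothetical simplifications, not the microservices used in this study.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List, Tuple

Record = Dict[str, Any]

class DataLake:
    """Toy centralized repository keyed by (building block, entity id)."""

    def __init__(self) -> None:
        self._records: DefaultDict[Tuple[str, str], List[Record]] = defaultdict(list)

    def put(self, block: str, entity_id: str, record: Record) -> None:
        self._records[(block, entity_id)].append(record)

    def get(self, block: str, entity_id: str) -> List[Record]:
        # .get() avoids creating empty entries for unknown keys
        return list(self._records.get((block, entity_id), []))

lake = DataLake()
# Traffic block uploads signal phase and timing (hypothetical field names).
lake.put("Traffic", "signal-12", {"phase": "green", "time_to_change_s": 8.0})

# Traffic Digital Twin, "real-time monitoring": read the latest state.
latest = lake.get("Traffic", "signal-12")[-1]

# Vehicle Digital Twin, "cooperative control": reuse the same record to
# check whether a vehicle can reach the stop line before the phase ends.
def can_pass_on_green(distance_m: float, speed_mps: float, spat: Record) -> bool:
    return spat["phase"] == "green" and distance_m / speed_mps <= spat["time_to_change_s"]

print(latest["phase"], can_pass_on_green(100.0, 15.0, latest))  # green True
```

The point of the sketch is that both twins read one stored record rather than each keeping a private onboard copy.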
Note that a common misconception about the Digital Twin exists in the research community: some consider Digital Twin technology simply as a modeling and simulation technology. In our MDT framework, modeling and simulation are part of the digital processes, enhanced by the data lake and data sharing in the digital space, where co-simulation platforms can be built to synchronize data from multiple simulators (such as the Unity-SUMO integrated platform [40]). However, as shown in Fig. 1, our MDT framework is more than modeling and simulation, and the other digital processes (i.e., storage, learning, and prediction) play equally important roles in the digital space. All of the aforementioned digital processes can be applied to mobility microservices, and they are realized in a more powerful, shareable, manageable, and extendable manner by leveraging the cloud architecture and cloud computing.
1) Human Digital Twin: Human Digital Twins are digital replicas of real humans in the physical space. This building block in the digital space has a human data lake that stores all data sampled from the Human block in the physical space, where each human has a personal database that distinguishes them from others. With real-time data sampling and historical data storage, the Human Digital Twin is able to classify drivers into specific driver types with machine learning algorithms such as k-nearest neighbors (KNN), and to provide guidance in a customized or personalized manner [41]. Taking advantage of data from the Vehicle block, the Human Digital Twin can also predict drivers' future behaviors (e.g., lane-change intention [42]) and detect anomalies in them [43]. The results of these microservices can be used by third parties such as insurance companies, which can build further microservices to set insurance pricing for different drivers based on their driving behaviors [44].
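As an illustration of KNN-based driver type classification, the sketch below implements plain k-nearest neighbors with Euclidean distance and majority voting. The two features (mean absolute acceleration and mean time headway), the three labels, and the training samples are hypothetical; they are not the features or data used in [41].

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# (feature vector, driver type); features are hypothetical:
# (mean absolute acceleration in m/s^2, mean time headway in s)
Sample = Tuple[Sequence[float], str]

def knn_classify(train: List[Sample], query: Sequence[float], k: int = 3) -> str:
    """Majority vote among the k training samples closest to `query`."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

history = [
    ((2.8, 0.9), "aggressive"), ((3.1, 0.8), "aggressive"),
    ((1.5, 1.8), "normal"), ((1.4, 2.0), "normal"),
    ((0.7, 3.0), "conservative"), ((0.6, 3.4), "conservative"),
]
print(knn_classify(history, (2.9, 1.0)))   # aggressive
print(knn_classify(history, (0.65, 3.2)))  # conservative
```

Because the features have different units, a practical version would normalize them (e.g., z-scores) before computing distances; the historical samples would come from the driver's personal database in the human data lake.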
2) Vehicle Digital Twin: Vehicle Digital Twins are the digital replicas of real vehicles in the physical space. Once sampled data is received from a connected vehicle in the physical space, it can be saved in that vehicle's data lake under a unique identification number. Data in the Vehicle Digital Twin about the ego vehicle (e.g., position, speed, and acceleration) and its surrounding environment (perceived by perception sensors) can also be shared with the Human Digital Twin, the Traffic Digital Twin, or other connected vehicles' Vehicle Digital Twins for various microservices.
With massive data storage and data sharing in the digital space, multiple vehicle-related microservices can be enabled, such as those requiring cooperation among multiple connected vehicles: cooperative localization [45], cooperative perception [46], cooperative planning [47], and cooperative control [14]. Additionally, microservices that need time-series data can also benefit from this MDT framework. One typical example is predictive maintenance: based on modeling and simulation of the time-series vehicle data that is sampled from the Vehicle block in the physical space and stored in the Vehicle Digital Twin, learning can be conducted in the digital space and predictions can be made about potential failures of vehicle components in the future [48]. Such predictions can be used by the vehicle owner or manufacturer to schedule on-site maintenance before the components break down.
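To make the predictive maintenance idea concrete, the sketch below uses a deliberately simple technique: a least-squares linear trend fitted to a stored degradation signal and extrapolated to a failure threshold. This is a swapped-in illustration, not the method of [48]; the wear signal, sampling days, and 0.80 threshold are hypothetical.

```python
from typing import List, Optional

def time_to_threshold(times: List[float], values: List[float],
                      threshold: float) -> Optional[float]:
    """Fit a least-squares line to a degradation signal and return the time
    remaining (after the last sample) until it is projected to reach
    `threshold`; None if no upward trend is detected."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("need at least two distinct sample times")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    if slope <= 0:
        return None  # no degradation trend in the stored history
    intercept = mean_v - slope * mean_t
    t_cross = (threshold - intercept) / slope
    return max(t_cross - times[-1], 0.0)

# Hypothetical component wear fraction sampled every 10 days.
days = [0, 10, 20, 30]
wear = [0.10, 0.15, 0.20, 0.25]
remaining = time_to_threshold(days, wear, threshold=0.80)
print(round(remaining, 1))  # 110.0
```

A learned model would replace the straight-line fit in practice, but the data flow is the same: time-series history from the Vehicle Digital Twin in, a remaining-time estimate out.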
3) Traffic Digital Twin: Traffic Digital Twins are the digital replicas of traffic infrastructures, which receive data from the Traffic block in the physical space. Such sampled data, like signal phase and timing, traffic count, and traffic flow, can be stored in the traffic data lake for future reference. It can also be used for multiple traffic microservices in real time, such as monitoring the traffic condition [19], variable speed limit [49], routing and navigation [50], ridesharing planning [51], and parking management [52].
Similar to the Human Digital Twin and Vehicle Digital Twin, the Traffic Digital Twin can be enhanced by communication among the Digital Twin blocks. For example, the routing and navigation microservice can be carried out solely with real-time traffic flow data sampled from camera/radar/loop detectors in the real world. However, it can be further enhanced if behavior preferences are set by the Human block and predictions are made by the Human Digital Twin (e.g., a driver/passenger always goes to a grocery store when his/her commute route is highly congested). Additionally, if the Vehicle block detects that the fuel/battery level is low and sends this information to the Vehicle Digital Twin, the routing and navigation microservice can also find a gas/charging station near a user-preferred grocery store along the original route.
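The charging-stop example can be sketched with Dijkstra's algorithm over a road graph whose edge weights are current travel times. This is an assumption-laden illustration: the graph, node names, and travel times are hypothetical stand-ins for estimates the Traffic Digital Twin would derive from detector data, and the candidate stations stand in for those near the user-preferred store.

```python
import heapq
from typing import Dict, Iterable, Optional, Tuple

# node -> {neighbor: current travel time in minutes} (hypothetical values)
Graph = Dict[str, Dict[str, float]]

def travel_times(graph: Graph, source: str) -> Dict[str, float]:
    """Dijkstra's algorithm: shortest travel time from `source` to every reachable node."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        t, node = heapq.heappop(heap)
        if t > best[node]:
            continue  # stale heap entry
        for nbr, w in graph.get(node, {}).items():
            cand = t + w
            if cand < best.get(nbr, float("inf")):
                best[nbr] = cand
                heapq.heappush(heap, (cand, nbr))
    return best

def best_charging_stop(graph: Graph, origin: str, destination: str,
                       stations: Iterable[str]) -> Optional[Tuple[str, float]]:
    """Pick the station minimizing origin -> station -> destination travel time."""
    from_origin = travel_times(graph, origin)
    choice: Optional[Tuple[str, float]] = None
    for s in stations:
        to_dest = travel_times(graph, s).get(destination)
        if s not in from_origin or to_dest is None:
            continue  # station unreachable or cannot reach destination
        total = from_origin[s] + to_dest
        if choice is None or total < choice[1]:
            choice = (s, total)
    return choice

roads: Graph = {
    "home": {"A": 5.0, "B": 7.0},
    "A": {"store": 4.0, "C1": 2.0},
    "B": {"store": 3.0, "C2": 1.0},
    "C1": {"store": 3.0},
    "C2": {"store": 6.0},
}
print(best_charging_stop(roads, "home", "store", ["C1", "C2"]))  # ('C1', 10.0)
```

Here the direct trip takes 9 minutes and the best charging detour adds 1 minute; refreshing the edge weights from live traffic data is what would let the recommendation change with congestion.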

IV. CLOUD ARCHITECTURE WITH AWS
In this section, we build the cloud architecture with AWS that can accommodate our proposed MDT framework for connected vehicles. Our purpose is to build a data-driven platform for both real-time and bulk-batch ingestion, processing, and analytics. As shown in Fig. 2 [56] to provide real-time processing and analytics. The Analytics Workbench is the workhorse for big data analytics. It includes OpenTSDB, a distributed, scalable time-series database built on top of Hadoop and HBase [57], which supports write rates of up to millions of entries per second, stores data with millisecond-level precision, and preserves data permanently without sacrificing precision. In addition, Apache Spark [58], a distributed processing system, is used to conduct predictive analytics on Amazon EMR clusters [59].
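For time-series storage in OpenTSDB, each data point carries a metric name, a timestamp (seconds or milliseconds), a numeric value, and at least one tag. The sketch below converts one vehicle sample into that shape, which is what OpenTSDB's HTTP `/api/put` endpoint accepts as a JSON array. The metric prefix `mdt.vehicle.`, the `vehicle_id` tag, and the sample values are hypothetical; the sketch does not show how the architecture actually writes to OpenTSDB.

```python
import json
from typing import Any, Dict, List

def to_opentsdb_points(vehicle_id: str, t_ms: int,
                       signals: Dict[str, float]) -> List[Dict[str, Any]]:
    """Convert one vehicle sample into OpenTSDB data points.

    Each point has a metric, a millisecond timestamp, a numeric value, and
    at least one tag. Metric and tag names here are hypothetical."""
    return [
        {"metric": f"mdt.vehicle.{name}", "timestamp": t_ms,
         "value": value, "tags": {"vehicle_id": vehicle_id}}
        for name, value in signals.items()
    ]

points = to_opentsdb_points("veh-001", 1_650_000_000_123,
                            {"speed_mps": 27.4, "accel_mps2": -0.6})
body = json.dumps(points)  # JSON array suitable for an HTTP POST to /api/put
print(body)
```

Using a 13-digit millisecond timestamp is what lets the millisecond-level precision mentioned above be preserved end to end.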
The Rule Engine service evaluates the rules configured for entities (e.g., humans and vehicles) on the data received from the Kafka queue, and redirects the data to the AI/ML Framework & Digital Twin Microservices based on the rule validation result. The AI/ML Framework & Digital Twin Microservices are the core of this cloud architecture, where end users can implement customized algorithms and applications with various objectives. This module processes time-series data sent from the physical space using statistical techniques, and sends guidance back to the entities in the physical space. The data workflow is triggered via Apache Airflow, an open-source workflow management platform.
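The rule evaluation step can be sketched as a list of per-entity-type predicates, each naming the microservice that should receive matching messages. Consuming from the Kafka queue is abstracted away here, and the rule set, field names, and microservice names are hypothetical examples rather than the rules configured in this architecture.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]

@dataclass
class Rule:
    entity_type: str                      # e.g. "human" or "vehicle"
    condition: Callable[[Message], bool]  # predicate evaluated on the message
    target: str                           # microservice that receives matches

def route(message: Message, rules: List[Rule]) -> List[str]:
    """Return the microservices whose rules validate for this message."""
    return [r.target for r in rules
            if r.entity_type == message["entity_type"] and r.condition(message)]

rules = [
    Rule("vehicle", lambda m: m["speed_mps"] > 0, "cloud_adas"),
    Rule("vehicle", lambda m: m.get("fuel_level", 1.0) < 0.1, "routing_navigation"),
    Rule("human", lambda m: True, "driver_type_classification"),
]
msg = {"entity_type": "vehicle", "speed_mps": 22.0, "fuel_level": 0.05}
print(route(msg, rules))  # ['cloud_adas', 'routing_navigation']
```

Checking the entity type before the condition means each predicate only ever sees messages with the fields it expects.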
The Data Stores of our cloud architecture consist of: 1) Amazon S3, a scalable storage infrastructure used to build our Digital Twin data lake [60]; 2) Amazon DocumentDB (with MongoDB compatibility), a purpose-built database service for JSON data that executes flexible, low-latency queries to obtain near real-time records of events in parallel at massive scale [61]; and 3) Redis, an open-source, highly replicated, non-relational database and caching server [62].
Outside the Amazon VPC but inside AWS sits AWS IoT Core [63], which connects IoT devices (such as the mobile apps, simulators, real vehicles, and RC vehicles in this study) to the AWS cloud without the need to provision or manage servers. It supports various devices and message types, and can reliably and securely process and route those messages to AWS endpoints and devices. A Bulk Data Ingestion module is also developed in this cloud architecture, enabling terabytes of data to be ingested into our data lake in batch mode. Scenarios in which this module can be triggered include 1) bulk data ingestion at the end of a vehicle trip; 2) periodic bulk data ingestion; 3) event-triggered data ingestion; and 4) in-vehicle data logging.
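One common way to prepare a finished trip log for batch ingestion, assumed here rather than taken from the MDT implementation, is to serialize it as newline-delimited JSON and split it into size-bounded chunks, each written under a partitioned object key in the data lake. The key layout `raw/<vehicle_id>/<trip_id>/part-NNNNN.jsonl` and the default chunk size are hypothetical, and the upload itself (e.g., to Amazon S3) is not shown.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def chunk_trip(vehicle_id: str, trip_id: str, records: Iterable[dict],
               max_bytes: int = 5 * 1024 * 1024) -> Iterator[Tuple[str, bytes]]:
    """Yield (object_key, body) pairs of newline-delimited JSON.

    Each body stays at or below max_bytes unless a single record is larger.
    The key layout is a hypothetical partitioning scheme.
    """
    part = 0
    buf: List[bytes] = []
    size = 0
    for rec in records:
        line = (json.dumps(rec, separators=(",", ":")) + "\n").encode("utf-8")
        if buf and size + len(line) > max_bytes:
            # Current chunk is full: emit it and start a new one.
            yield f"raw/{vehicle_id}/{trip_id}/part-{part:05d}.jsonl", b"".join(buf)
            part, buf, size = part + 1, [], 0
        buf.append(line)
        size += len(line)
    if buf:
        yield f"raw/{vehicle_id}/{trip_id}/part-{part:05d}.jsonl", b"".join(buf)
```

Chunking keeps each object a manageable size for retries and for parallel processing later (e.g., by Spark on EMR), while the trip-level prefix keeps all parts of one trip together.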
OpenID Connect is a simple identity layer on top of the OAuth 2.0 authorization protocol. It is adopted in this cloud architecture to verify the identity of end users based on the authentication performed by an authorization server, and to obtain basic profile information about them [64]. Amazon API (application programming interface) Gateway, an AWS managed service, is adopted to create, publish, maintain, monitor, and secure APIs at any scale [65]. The API Gateway acts as the "front door" for applications to access data or functionalities from our back-end services.
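An OpenID Connect ID token is a JSON Web Token (JWT) whose payload carries standard claims such as `iss` (issuer), `sub` (the end-user identifier), `aud` (the client it was issued for), and `exp` (expiry time). The sketch below decodes the payload and checks `aud` and `exp` only to show what these claims look like. It deliberately skips signature verification, which a real deployment must perform against the authorization server's published keys with a maintained JWT library; the client ID and claim values are made up.

```python
import base64
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def read_id_token_claims(token: str, client_id: str,
                         now: Optional[float] = None) -> dict:
    """Decode an ID token payload and check audience and expiry.

    WARNING: the signature is NOT verified here; illustration only.
    """
    _header, payload, _signature = token.split(".")
    claims = json.loads(_b64url_decode(payload))
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if client_id not in audiences:
        raise ValueError("token was not issued for this client")
    if claims["exp"] <= (time.time() if now is None else now):
        raise ValueError("token has expired")
    return claims
```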
Outside of AWS, external data sources can be leveraged to enrich the functionalities of cloud microservices. For example, traffic data (TomTom [66]), map data (OpenStreetMap [67]), and weather data (OpenWeather [68]) are integrated into Amazon API Gateway via HTTP. With such data, more microservices can be deployed in the Traffic Digital Twin, providing better guidance to humans and vehicles in the physical space. Additionally, a web portal is designed to visualize the digital processes on the cloud and to enable end users to create and modify microservices.
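As an example of pulling external data over HTTP, OpenWeather's current-weather endpoint takes latitude, longitude, and an API key as query parameters. The sketch below only builds the request URL for a vehicle position; the API key is a placeholder, the `units` default is an assumption, and how the request is proxied through Amazon API Gateway in the MDT architecture is not shown.

```python
from urllib.parse import urlencode

OPENWEATHER_CURRENT = "https://api.openweathermap.org/data/2.5/weather"


def openweather_url(lat: float, lon: float, api_key: str,
                    units: str = "imperial") -> str:
    """Build a current-weather request URL for one vehicle position."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude/longitude out of range")
    query = urlencode({"lat": lat, "lon": lon, "appid": api_key, "units": units})
    return f"{OPENWEATHER_CURRENT}?{query}"


if __name__ == "__main__":
    # Placeholder key; a client would fetch this with urllib.request.urlopen(url).
    print(openweather_url(37.3894, -122.0819, "YOUR_API_KEY"))
```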
The most important data sources are shown on the right side of Fig. 2; they correspond to the Human, Vehicle, and Traffic building blocks in the physical space of the MDT framework. Mobile apps are designed for both Android and iOS, through which end users' position and speed data (measured by GPS and gyroscope) can be uploaded to AWS IoT Core via MQTT. Additionally, a customized edge gateway is developed to allow external simulators (such as SUMO [69] and Unity [70]), as well as real vehicles and RC vehicles (with ROS2 embedded), to exchange messages with AWS IoT Core via MQTT. At the current stage of our study, these are the ways in which end users can generate data in the physical space; they are further introduced in the next section.
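A device-side uplink message could look like the sketch below: the app or edge gateway serializes a position/speed sample to JSON and publishes it to an MQTT topic on AWS IoT Core. The topic hierarchy `mdt/vehicle/<vehicle_id>/telemetry` and the payload field names are hypothetical, and the MQTT client itself is not shown; the function only returns the topic/payload pair. Because MQTT reserves `+` and `#` as wildcards (not allowed in publish topics) and `/` separates topic levels, IDs containing them are rejected.

```python
import json
import time
from typing import Optional, Tuple


def uplink_message(vehicle_id: str, lat: float, lon: float, speed_mps: float,
                   timestamp_ms: Optional[int] = None) -> Tuple[str, str]:
    """Return (topic, JSON payload) for one position/speed sample."""
    if not vehicle_id or any(ch in vehicle_id for ch in "+#/"):
        raise ValueError("vehicle_id must be non-empty and free of '+', '#', '/'")
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    topic = f"mdt/vehicle/{vehicle_id}/telemetry"  # hypothetical topic layout
    payload = json.dumps({"ts": timestamp_ms, "lat": lat, "lon": lon,
                          "speed": speed_mps})     # hypothetical field names
    return topic, payload
```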

A. Human Digital Twin: User Management and Driver Type Classification
In this subsection, we first conduct a case study on the Human Digital Twin. A web portal is built to enable various data management and visualization functionalities, where related data sampled in the physical space is transmitted and visualized on the web portal through HTTPS, as illustrated in Fig. 2.
To begin with, each human user needs to register a unique MDT account through the web portal, and also to specify the group(s) he/she belongs to: "viewer", "supervisor", and/or "admin". In particular, an account in the "admin" group is granted full access to the human data lake: it can register/delete any account in the digital space, alter the group(s) any account belongs to, and subscribe/unsubscribe microservices for any account. A "supervisor" account can only access its own human data lake, with modification rights, while a "viewer" account has no modification rights at all.
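The group-based permissions described above can be expressed as a small access-control check. The action names and function signature are hypothetical. The sketch encodes the stated rules ("admin" can do anything, "supervisor" can read and modify only its own data) and additionally assumes that a "viewer" can read its own data, since the text only states that viewers cannot modify anything. Because an account may belong to several groups, groups are passed as a set.

```python
from typing import Set

ADMIN, SUPERVISOR, VIEWER = "admin", "supervisor", "viewer"
# Hypothetical names for the account-management actions reserved for admins.
ADMIN_ONLY = {"register_account", "delete_account", "alter_groups",
              "manage_subscriptions"}


def is_allowed(groups: Set[str], action: str, actor_id: str, owner_id: str) -> bool:
    """Decide whether an account may perform `action` on `owner_id`'s data."""
    if ADMIN in groups:
        return True  # full access to the human data lake
    if action in ADMIN_ONLY:
        return False
    own_data = actor_id == owner_id
    if action == "read":
        # Assumption: viewers may read their own data.
        return own_data and bool(groups & {SUPERVISOR, VIEWER})
    if action == "modify":
        return own_data and SUPERVISOR in groups
    return False
```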
Human-vehicle association is also available through the web portal, which enables the human user to associate with his/her vehicle in the vehicle data lake, building the connection between the Human Digital Twin and the Vehicle Digital Twin. Therefore, once the user account is registered and associated with a vehicle, all the data generated by this human user in the physical space is stored both in the user data lake (under the particular user ID) and in the vehicle data lake (under the particular vehicle ID).
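The association means that each uploaded record is written under two keys. In the sketch below, in-memory dictionaries stand in for the human and vehicle data lakes, which in the architecture would live in the Data Stores; the class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class DataLakes:
    """Hypothetical in-memory stand-in for the human and vehicle data lakes."""

    def __init__(self) -> None:
        self.vehicle_of: Dict[str, str] = {}  # user ID -> associated vehicle ID
        self.human: Dict[str, List[dict]] = defaultdict(list)
        self.vehicle: Dict[str, List[dict]] = defaultdict(list)

    def associate(self, user_id: str, vehicle_id: str) -> None:
        self.vehicle_of[user_id] = vehicle_id

    def ingest(self, user_id: str, record: dict) -> None:
        """Store a record under the user ID and, if associated, the vehicle ID."""
        self.human[user_id].append(record)
        vehicle_id = self.vehicle_of.get(user_id)
        if vehicle_id is not None:
            self.vehicle[vehicle_id].append(record)
```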
Besides the aforementioned user management functions, the built-in microservices of the Human Digital Twin can also be applied. One example is shown in Fig. 3, where different driving scores of a driver (i.e., overall score, eco score, safe score, and comfort score) are calculated in real time by the open-source MOVESTAR model [71] running on the cloud. These driving scores can be further compared with historical data generated by other drivers and stored in the human data lake, in order to classify this driver into a certain type. An example classification result is visualized on the web portal (shown as "Competent" in Fig. 4), which can be used by other microservices such as behavior prediction, personalized guidance, and insurance pricing. The driver's historical classification results can also be retrieved by clicking the detailed trip list shown in Fig. 4, which further demonstrates the storage capability of the Human Digital Twin.
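The text does not specify how the comparison with historical scores yields a driver type. One simple possibility, assumed here for illustration, is the percentile rank of the driver's overall score within the historical distribution, mapped to tiers. Only the label "Competent" comes from Fig. 4; the other tier names and all cut-offs are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple

# Hypothetical (lower percentile bound, label) tiers; only "Competent" is from Fig. 4.
TIERS: List[Tuple[float, str]] = [(0.0, "Novice"), (25.0, "Competent"),
                                  (75.0, "Proficient"), (95.0, "Expert")]


def percentile_rank(score: float, history: Sequence[float]) -> float:
    """Percentage of historical scores below `score` (ties count as half)."""
    ranked = sorted(history)
    if not ranked:
        raise ValueError("no historical scores to compare against")
    below = bisect_left(ranked, score)
    equal = bisect_right(ranked, score) - below
    return 100.0 * (below + 0.5 * equal) / len(ranked)


def classify_driver(overall_score: float, history: Sequence[float]) -> str:
    """Pick the highest tier whose lower bound the percentile rank reaches."""
    pr = percentile_rank(overall_score, history)
    label = TIERS[0][1]
    for lower, name in TIERS:
        if pr >= lower:
            label = name
    return label
```

A percentile rank adapts automatically as the human data lake grows, since the same score can move between tiers when the population of other drivers changes.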

B. Vehicle Digital Twin: Cloud-Based Advanced Driver-Assistance Systems
In this subsection, we conduct a case study on the Vehicle Digital Twin through a cloud-based ADAS, where cloud computing is leveraged to provide personalized guidance and control commands to connected vehicles. As shown in Fig. 5, a human-in-the-loop simulation (built with AWS, the Unity game engine, and the Logitech G29 Driving Force) is conducted to simulate the data sampling process of a connected vehicle in the physical space [72]: when a human driver manually controls the vehicle, its data is sampled from the physical components (e.g., CAN BUS, radar, camera) and uploaded to the data lake of the Vehicle Digital Twin. This process is labeled "Unity-AWS Uplink Message" at the lower-left corner of Fig. 5.
The data stored in the data lake of this vehicle (potentially from all historical trips) is fed into the microservices of the Vehicle Digital Twin (e.g., cooperative planning and cooperative control). Machine learning-based algorithms are implemented to learn the performance and preferences of each vehicle and/or driver, and the algorithm outputs may include predictions of, and guidance on, their current status and future behaviors.
Such algorithm outputs are downloaded from the Vehicle Digital Twin to the vehicle, as labeled "AWS-Unity Downlink Message" at the lower-right corner of Fig. 5. By leveraging computer vision technologies, this information is overlaid on top of each vehicle through an augmented reality (AR) head-up display (HUD) design, assisting the decision making of other drivers [73]. As shown in Fig. 6, from this driver's field of view, the following information about the surrounding vehicles and their drivers can be seen (from top to bottom of the overlaid information): driving proficiency score and its trend, potential action (e.g., hard braking or lane change) and its probability, as well as driving mood score.

Fig. 6. Prediction and guidance information received from the Vehicle Digital Twin on the cloud is visualized through an augmented reality (AR) head-up display (HUD), which may include: driving proficiency score and its trend, potential action (e.g., hard braking or lane change) and its probability, as well as driving mood score.
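The overlay content listed for Fig. 6 can be pictured as a small per-vehicle downlink message. The field names, value scales, trend encoding, and display strings below are hypothetical; the sketch only shows how such a message might be validated and rendered in the top-to-bottom order used by the HUD.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class HudOverlay:
    """Hypothetical downlink message for one surrounding vehicle."""
    vehicle_id: str
    proficiency_score: float   # assumed 0-100 scale
    proficiency_trend: str     # assumed "up", "down", or "flat"
    potential_action: str      # e.g., "hard_braking" or "lane_change"
    action_probability: float  # 0.0-1.0
    mood_score: float          # assumed 0-100 scale

    def __post_init__(self) -> None:
        if self.proficiency_trend not in {"up", "down", "flat"}:
            raise ValueError("unknown trend")
        if not 0.0 <= self.action_probability <= 1.0:
            raise ValueError("probability must be in [0, 1]")

    def display_lines(self) -> List[str]:
        """Lines in the top-to-bottom order of the AR HUD overlay."""
        arrow = {"up": "↑", "down": "↓", "flat": "→"}[self.proficiency_trend]
        return [f"Proficiency {self.proficiency_score:.0f} {arrow}",
                f"{self.potential_action} ({self.action_probability:.0%})",
                f"Mood {self.mood_score:.0f}"]


if __name__ == "__main__":
    msg = HudOverlay("veh-017", 82.0, "up", "lane_change", 0.35, 64.0)
    print(json.dumps(asdict(msg)))  # what might travel from AWS to Unity
    print("\n".join(msg.display_lines()))
```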
Compared to a traditional ADAS that relies purely on onboard sensing and processing of the ego vehicle, the key advantages that MDT brings to this cloud-based ADAS are:
• Heavy computations, such as training a machine learning algorithm on the sampled data, can be offloaded to the cloud to utilize more computing power and hence save time.
• Additional data sources in the physical space, such as surrounding vehicles and downstream traffic, can be utilized to enhance the functionalities of ADAS.
• The cloud-based ADAS can be easily migrated through the cloud, which increases accessibility: it can be made available on various vehicles for the same driver, or to various drivers on the same vehicle.
• ADAS algorithms and applications can be updated much more quickly and easily through over-the-air (OTA) updates.

C. Traffic Digital Twin: Traffic Flow Monitoring and Variable Speed Limit
This subsection presents a case study on the Traffic Digital Twin with the microservices of traffic flow monitoring and variable speed limit. The mobile app we developed for MDT, together with the microscopic traffic simulator SUMO, is adopted in this case study to generate traffic flow and hence represent connected vehicles traveling in the physical world. As shown in Fig. 7(a), we develop an iOS mobile app (as well as an Android mobile app that is not shown here) for MDT that allows end users to upload their data, which may include latitude, longitude, and speed, from the physical space to the digital space. At the initialization step of the app, the user is asked to associate the app with a vehicle in the digital space and to set the data push rate. Once the "START" button is pressed, the app reads the data from the mobile phone itself and pushes it to the Vehicle Digital Twin through AWS IoT Core. Similarly, when the SUMO simulation starts, each vehicle in the simulation publishes its latitude, longitude, and speed to the Vehicle Digital Twin through the edge gateway.
When the traffic flow monitoring microservice is turned on, it gets the data from all Vehicle Digital Twins and aggregates it in the Traffic Digital Twin. A snapshot of this microservice in operation is shown in Fig. 7(b). The number of vehicles on a specific link is counted and then divided by the link length and the number of lanes to obtain the traffic density. This value is compared with predefined ranges for heavy congestion (red), moderate congestion (orange), and no congestion (green) to determine the congestion level of that link, which is rendered in the corresponding color on the map. Additionally, the colors of the vehicles on the map indicate their current speeds: vehicles traveling at or below the average link speed are shown in blue, and those above the average link speed are shown in white.
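The density computation and the two color mappings described above can be sketched as follows. The units (vehicles per mile per lane), the congestion thresholds, and the function names are assumptions, since the predefined ranges used in the MDT platform are not given; the average link speed is assumed to be the mean over the vehicles currently on the link.

```python
from statistics import mean
from typing import Dict

# Hypothetical thresholds in vehicles per mile per lane (veh/mi/ln).
MODERATE_AT, HEAVY_AT = 30.0, 60.0


def link_density(vehicle_count: int, length_mi: float, lanes: int) -> float:
    """Vehicle count divided by link length and number of lanes."""
    if length_mi <= 0 or lanes <= 0:
        raise ValueError("link length and lane count must be positive")
    return vehicle_count / (length_mi * lanes)


def link_color(density: float) -> str:
    if density >= HEAVY_AT:
        return "red"     # heavy congestion
    if density >= MODERATE_AT:
        return "orange"  # moderate congestion
    return "green"       # no congestion


def vehicle_colors(speeds: Dict[str, float]) -> Dict[str, str]:
    """Blue at or below the link's average speed, white above it."""
    if not speeds:
        return {}
    avg = mean(list(speeds.values()))
    return {vid: ("blue" if v <= avg else "white") for vid, v in speeds.items()}
```

For example, 45 vehicles on a 0.5-mile, two-lane link give 45 / (0.5 × 2) = 45 veh/mi/ln, which falls in the (hypothetical) moderate range and would be drawn in orange.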
In order to better manage traffic flows in the physical space, a variable speed limit can be applied to connected vehicles through the Traffic Digital Twin. As can be seen from Fig. 7(c), users can easily draw a geo-fence by specifying its name, its radius, and the latitude and longitude of its center. A hexagon is then visualized on the map, and users are prompted to choose the rules applied to vehicles, together with their states ("entry" or "exit") and associated values. Taking the variable speed limit microservice as an example, users can set the rule state "entry" with a value of "40" and another rule state "exit" with a value of "60". This setting indicates that all connected vehicles subscribed to this microservice will receive a speed recommendation/limit of 40 mph when entering this geo-fence, and 60 mph upon exiting it.
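A geo-fence defined by a center and a radius can be checked on each position update by computing the great-circle (haversine) distance to the center and watching for inside/outside transitions per vehicle. The sketch assumes a circular fence with the radius in meters (treating the hexagon as a display shape only) and uses hypothetical class and field names; the 40/60 mph values mirror the example in the text.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass
class GeoFence:
    name: str
    lat: float
    lon: float
    radius_m: float
    rules: Dict[str, float]  # rule state ("entry"/"exit") -> speed limit in mph

    def contains(self, lat: float, lon: float) -> bool:
        return haversine_m(self.lat, self.lon, lat, lon) <= self.radius_m


class SpeedLimitService:
    """Tracks each subscribed vehicle's inside/outside state for one fence."""

    def __init__(self, fence: GeoFence) -> None:
        self.fence = fence
        self.inside: Dict[str, bool] = {}

    def update(self, vehicle_id: str, lat: float, lon: float) -> Optional[float]:
        """Return the new speed limit on entry or exit, otherwise None."""
        now_inside = self.fence.contains(lat, lon)
        was_inside = self.inside.get(vehicle_id, False)
        self.inside[vehicle_id] = now_inside
        if now_inside and not was_inside:
            return self.fence.rules.get("entry")
        if was_inside and not now_inside:
            return self.fence.rules.get("exit")
        return None
```

Emitting a value only on state transitions keeps the downlink traffic small: a vehicle driving within the fence receives the 40 mph recommendation once rather than on every position update.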

D. End-to-End Latency Testing with Human/Vehicle/Traffic Digital Twins
In this subsection, we conduct an end-to-end test of the proposed MDT framework in which the Human Digital Twin, Vehicle Digital Twin, and Traffic Digital Twin all participate. The purpose of this test is to gauge whether the proposed MDT framework can be integrated into mass-produced vehicles for real-world mobility applications. It should be noted that this test does not target safety-critical or time-critical applications; rather, it acts as a stress test to verify the stability and reliability of the framework while executing complex computing processes on the cloud.
This end-to-end test uses CAN BUS data generated from the Lexus LS test vehicle shown in Fig. 8(a), which includes 15 data fields such as position, speed, and acceleration. Time-series data from a 10-minute trip (Fig. 8(b)) is sent to AWS through the aforementioned Bulk Data Ingestion module after the trip is finished. On the cloud, multiple processes are conducted on the uplink time-series data, including:
1) Store all the raw uplink data in OpenTSDB.
2) Invoke the statistics service, driver classification service, and weather classification service (with the OpenWeather API) on the raw uplink data, and store the classification results in MongoDB.
3) Filter the raw uplink data to extract the necessary portions, and invoke machine learning algorithms to calculate the outputs.
4) Query MongoDB for all historical trips with the same classification result and fetch their learning outputs.
5) Compute the downlink output by integrating the learning outputs of the just-completed trip and of all historical trips under the same classification, and save the output in MongoDB.
As can be seen from the results in TABLE I, under different sampling frequencies of the CAN BUS data, the uplink latency is roughly proportional to the uplink batch size, ranging from a minimum of 1.4 s for 12 MB to a maximum of 21.1 s for 240 MB. However, a large portion of the end-to-end latency comes from the aforementioned cloud computing processes, which take roughly 16 s regardless of the data frequency. In this end-to-end test, since the downlink data is predefined to follow the same size and format, the major factor in the latency is the number of cloud computing steps rather than the uplink data size.
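The five cloud steps listed above can be sketched with in-memory stand-ins: a list for OpenTSDB, a list of dicts for the MongoDB trip collection, and trivial placeholder services. The classification key (a driver type plus a weather label), the learned quantity (mean moving speed), and the sample-count-weighted averaging in step 5 are all hypothetical choices; the text does not specify which learning outputs are integrated or how.

```python
from statistics import mean
from typing import List

tsdb: List[dict] = []   # stand-in for OpenTSDB
trips: List[dict] = []  # stand-in for the MongoDB trip collection


def classify(samples: List[dict], weather: str) -> str:
    # Placeholder for the driver and weather classification services.
    driver = "aggressive" if max(s["accel"] for s in samples) > 3.0 else "calm"
    return f"{driver}/{weather}"


def process_trip(trip_id: str, samples: List[dict], weather: str) -> float:
    tsdb.extend(samples)                                      # 1) store raw data
    label = classify(samples, weather)                        # 2) classify
    moving = [s["speed"] for s in samples if s["speed"] > 0]  # 3) filter ...
    learned = mean(moving) if moving else 0.0                 #    ... and "learn"
    history = [t for t in trips if t["label"] == label]       # 4) same-class trips
    pairs = [(learned, len(moving))] + [(t["learned"], t["n"]) for t in history]
    total = sum(n for _, n in pairs)
    # 5) integrate this trip with history (hypothetical weighted mean) and save.
    downlink = sum(v * n for v, n in pairs) / total if total else learned
    trips.append({"trip": trip_id, "label": label, "learned": learned,
                  "n": len(moving), "downlink": downlink})
    return downlink
```

Even in this toy form, steps 2 and 4 depend on the stored history rather than on the uplink size, which is consistent with the observation that the number of cloud computing steps, not the batch size, dominates the processing time.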
In a nutshell, the end-to-end test showcases the capability of the proposed MDT framework to execute complex cloud computing processes on bulk uplink data within a reasonable range of latency. In real-time mobility applications like the ones shown in the Human Digital Twin (Fig. 3) and the Vehicle Digital Twin (Fig. 5 and Fig. 6), where only a limited number of cloud computing processes are executed on the uplink data stream, the MDT framework achieves a median end-to-end latency of 80 ms, which enables safety-critical and time-critical applications.

VI. CONCLUSION
In this study, an MDT framework has been developed with connected vehicles and cloud computing. The literature review found that, although the Digital Twin concept has recently been studied in the transportation domain, no existing work leverages this technology together with cloud computing to benefit connected vehicles with detailed microservices. In this study, the proposed MDT framework has been demonstrated in detail with all of its components: Human, Vehicle, and Traffic, together with their associated Human Digital Twin, Vehicle Digital Twin, and Traffic Digital Twin.
The public cloud service AWS has been adopted in this study to design the cloud architecture, which accommodates the MDT framework and makes it a reality rather than a concept. To showcase the effectiveness of the MDT framework, case studies have been conducted on all three digital building blocks with their key microservices: the Human Digital Twin with user management and driver type classification, the Vehicle Digital Twin with cloud-based ADAS, and the Traffic Digital Twin with traffic flow monitoring and variable speed limit.

VII. FUTURE CHALLENGES
In the future development and utilization of Digital Twin technology in both academia and industry, together with the involvement of connected vehicle technology and cloud computing, numerous challenges need to be addressed from both research and engineering perspectives. Some of the major challenges are discussed in this section as open questions.

A. Digital Twin Standardization?
Although Digital Twin technology has gained momentum in various domains during the past decade, there is no universal definition of this technology, let alone an established standard. Currently, the joint technical committee "Internet of Things and Digital Twin" of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) is still developing the standards for the Digital Twin in terms of concepts and terminology [74], as well as use cases [75]. Additionally, specific standards for the Digital Twin manufacturing framework are also under development by the ISO technical committee "Industrial Data", with a focus on overview and general principles [76], reference architecture [77], digital representation of manufacturing elements [78], and information exchange [79].
Similar to manufacturing, specific standards need to be developed for the transportation domain so that Digital Twin technology can be fully deployed on the connected vehicles we will be riding in the future. Such standards can be used by different organizations to define APIs for Digital Twin data access, enabling different transportation entities (e.g., human drivers, vehicles, and traffic infrastructure) to securely and reliably store, manage, and retrieve records. Related standards can also help developers design Human Machine Interfaces (HMI) that enable better interactions between the physical and digital spaces of the Digital Twin.
However, standardizing the Digital Twin in the transportation domain may face numerous challenges, since consensus may be difficult to reach across the public sector (e.g., transportation agencies) and the private sector (e.g., automotive manufacturers, suppliers, and network providers), similar to the long-running debate between Dedicated Short-Range Communication (DSRC) and Cellular Vehicle-to-Everything (C-V2X) communication for connected vehicles.

B. Public Cloud or Private Cloud?
In this study, a public cloud architecture has been designed and deployed on AWS. However, this does not mean that our proposed MDT framework can only work on AWS. Other commercial platforms like Microsoft Azure (especially with its "Azure Digital Twins" service [80]), Google Cloud Platform (GCP), and Alibaba Cloud could also serve as alternatives to accommodate the MDT framework.
However, a cloud platform must be trustworthy to host any of the microservices mentioned in this study (especially those related to the Human Digital Twin), as end user information and related data must be protected from being compromised. By their nature, public cloud platforms inevitably give away some control over resources, which may introduce cybersecurity and privacy risks for end users.
A private cloud platform, on the other hand, consists of cloud computing resources used exclusively by one business or organization. Since the services and infrastructure are always maintained on a private network without being shared with others, a private cloud platform can address the security and privacy issues faced by a public cloud platform. Private cloud platforms also make it easier for end users to customize cloud resources to meet specific requirements and implement specific functions, as demonstrated in our private cloud-based cooperative ramp merging experiment [38].
Although private cloud platforms are more secure and flexible, they have several disadvantages compared to public cloud platforms. In general, private cloud platforms are more expensive, since the hardware and software must be dedicated solely to the particular organizations they serve (and are hence paid for solely by those organizations). In terms of scalability and reliability, private cloud platforms are also outperformed by public cloud platforms, because the public ones provide on-demand resources to meet various organizations' needs, and also provide a vast network of servers to guard against failures.
Therefore, based on specific requirements and needs of end users, choices can be made between public and private cloud platforms, considering their advantages and disadvantages described above. Additionally, building Digital Twins with a hybrid cloud approach (combining public and private cloud platforms, potentially with edge/fog computing) provides another possibility, which also leads to the discussion in the next subsection.

C. Fully Cloud-Based Approach or Hybrid Approach with Edge Computing?
A fully cloud-based architecture has been designed in this study, where only sampling and actuation processes are conducted in the physical space, but all computation-related processes are conducted on the cloud (i.e., AWS in this case). However, this does not necessarily indicate all digital processes of the proposed MDT framework (shown in Fig. 1) must sit on the cloud.
In fact, it is sometimes difficult for connected vehicles to maintain cloud access, since they move around continuously and may lose their internet connection every now and then. Therefore, a hybrid approach combining edge/fog computing and cloud computing can meet both the requirement of ultra-low latency for running safety-critical microservices at the edge (e.g., road-side units) and that of extensive resources for running data-driven microservices on the cloud [28]. Edge computing has already been widely researched in the field of connected vehicles [81], [82], and has also been deployed in the real world in projects like 5G-MOBIX [83] and initiatives like the Automotive Edge Computing Consortium (AECC) [84].
Additionally, from the connected vehicle perspective, instead of dumping all raw data sampled from its physical sensors to the cloud or edge, a vehicle can perform some onboard post-processing to reduce the size of the uploaded data (e.g., retrieving only the necessary parameters from raw CAN BUS data), thereby reducing data transmission time and cloud storage cost. Likewise, instead of assigning all computing tasks to the cloud or edge, part of the modeling and prediction processes can be performed on the vehicle's onboard computer (e.g., predicting a target vehicle's behaviors with a neural network trained on the cloud), so that real-time actuation can be achieved with minimal delay. Data and models can also be decoupled for transmission between the two spaces of the Digital Twin, where the transmission of data requires more consideration than that of models, since data can be significantly larger than model parameters.
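The onboard post-processing idea (keep only the needed signals and a lower sampling rate before upload) can be illustrated by comparing serialized payload sizes. The signal names, the 10:1 decimation, and the synthetic 15-field log are hypothetical, and plain decimation without filtering is a simplification; real CAN decoding (e.g., from a signal database) is out of scope.

```python
import json
from typing import Iterable, List, Sequence


def reduce_log(samples: Sequence[dict], keep: Iterable[str], every_n: int) -> List[dict]:
    """Keep only the selected fields and every n-th sample (simple decimation)."""
    if every_n < 1:
        raise ValueError("every_n must be >= 1")
    fields = list(keep)
    return [{k: s[k] for k in fields} for s in samples[::every_n]]


def payload_bytes(samples: Sequence[dict]) -> int:
    """Size of the compact JSON encoding that would be uploaded."""
    return len(json.dumps(samples, separators=(",", ":")).encode("utf-8"))


if __name__ == "__main__":
    # Synthetic 10 Hz log with 15 signals per sample (hypothetical names).
    raw = [dict({f"signal_{i:02d}": t * 0.1 for i in range(15)}, t=t / 10)
           for t in range(600)]
    reduced = reduce_log(raw, ["t", "signal_00", "signal_01"], every_n=10)
    print(payload_bytes(raw), "->", payload_bytes(reduced), "bytes")
```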
To sum up, scalability, reliability, security, cost, and latency are the key factors to consider when decisions are made between the private cloud and the public cloud, and between the fully cloud-based approach and the hybrid approach for Digital Twin frameworks. Along with the rapid development of the Digital Twin, it can be envisioned that more related studies will be conducted in the transportation domain to tackle these challenges in the near future.
Prashant Tiwari received the Ph.D. degree in mechanical engineering from Rensselaer Polytechnic Institute in 2004, and the MBA degree from the University of Chicago in 2016. He is currently an Executive Director at Toyota Motor North America, InfoTech Labs. Dr. Tiwari is highly active in the Automotive Edge Computing Consortium (AECC) and SAE. Prior to joining Toyota, Dr. Tiwari held several leadership positions of increasing responsibility at GE and UTC Aerospace Systems.