Cooperative Platoon Formation of Connected and Autonomous Vehicles: Toward Efficient Merging Coordination at Unsignalized Intersections

This paper presents a Vehicle-Platoon-Aware Bi-Level Optimization Algorithm for Autonomous Intersection Management (VPA-AIM) to coordinate the merging of Connected and Automated Vehicles at unsignalized intersections. The constraint-coupled bi-level optimization is operated within a rolling horizon to balance traffic performance and computational efficiency. In each decision step, the platoon formation scheme is incorporated into an upper-level traffic scheduling model as decision variables to pursue an optimal schedule from a systemic view. Meanwhile, the passing sequence and timeslots of vehicles are jointly optimized with the platoon configuration scheme by virtue of real-time traffic states to improve operational efficiency and fairness. After that, a lower-level trajectory planning model will generate dynamically-feasible and energy-efficient trajectories according to the given schedule and coupling constraints with the objective of improving space utilization to prevent spillbacks. Moreover, the quantifiable connection between the makespan of traffic scheduling schemes and the occurrence of spillbacks is established, demonstrating that the cooperative platoon formation strategy is effective in avoiding and mitigating spillbacks in normal and saturated traffic states. Additionally, the proposed algorithm can be extended to mixed traffic scenarios. Numerical experiments are conducted on extensive scenarios with different arrival flows, where the Constraint Programming technique is employed to produce the optimal schedule. Experimental results indicate the superiority of the proposed approach in optimality and stability with reasonable sub-second computation time for real-life applications.

bottlenecks in urban road networks. However, the present signalized intersection is far from perfect for the burgeoning number of vehicles [1]. It is observed that the switching of traffic signals constantly interrupts the traffic flow and gives rise to a frequently observed phenomenon called stop-andgo traffic [2]. Consequently, drivers may be tired, agitated, and even angry, leading to undesirable driving behaviors called road rage [3], which may further aggravate congestion. In addition, the signal-switching process is not instantaneous but requires a setup phase (i.e., the yellow light) lasting about five seconds [4], which reduces intersection throughput considerably. Furthermore, when adverse weather occurs (e.g., heavy rain, fog, snowstorms, and dust storms), it can be challenging for drivers or onboard infrared cameras to capture the signal light accurately [5].
Fortunately, Connected and Automated Vehicles (CAVs) that combine wireless communication and autonomous driving hold the potential to revolutionize intelligent transportation systems [6], [7], [8], offering advantages for accident avoidance [9], mobility improvement [10], and emission reduction [11]. The benefits of CAVs on intersection control are mainly two-fold. On the one hand, CAVs enable the joint optimization between passing sequences and motion trajectories at autonomous traffic intersections [12], [13], [14], [15]. In other words, vehicles will run with the pre-determined trajectories within assigned timeslots to improve traffic efficiency by avoiding unnecessary speed up/down operations and stops. Therefore, it is expected that physical traffic signals will no longer be necessary for future traffic management [16], since the Vehicle-to-Everything (V2X) technique enables virtual traffic lights [17] that reside at each vehicle to reduce the difficulty of signal detection. On the other hand, CAVs enable the application of vehicle platooning [18], [19], [20], [21], [22], which can further increase road storage by reducing inter-vehicular headway considerably [23], [24], [25], [26] and improve energy efficiency by mitigating aerodynamic drag and unnecessary speed fluctuations [27]. Hence, researchers try to facilitate the advantages of vehicle platooning to enhance traffic mobility at autonomous intersections, which raises a critical research question of how to determine the optimal platoon formation scheme for vehicles to optimize important objective indicators subject to relevant constraints.
Despite extensive existing works on jointly optimizing vehicle trajectories and passing sequence [12], [13], [14], [15], the problem of platoon configuration has not been adequately addressed due to the complexity of the requirement for a global decision-maker. For example, most of the existing research assumes that platoons have been formed before vehicles enter the control zone [14], [28], [29], whereas other works group vehicles into platoons by virtue of empirical rules from the individual point of view [13], [30], [31], without cooperation with vehicles on other lanes. Hence, this paper aims to address these gaps in cooperative platoon formation to explore the potential of platoon-aware traffic management.
Compared to existing studies, the contributions of this paper are threefold. First, this work incorporates the platoon formation scheme as decision variables into the joint optimization of vehicle trajectories and intersection controllers, intending to pursue the global-optimal schedule, significantly improving operational efficiency and fairness compared with the state-of-the-art algorithms. In other words, the decision to merge vehicles into platoons is based on real-time traffic states to achieve one-dimensional cooperation on the same lane and two-dimensional cooperation with vehicles on another lane. Second, this work presents an effective traffic coordination algorithm to avoid/mitigate spillbacks in normal/saturated traffic states, where the quantifiable connection between the makespan of traffic scheduling schemes and the occurrence of spillbacks is established. Additionally, the proposed algorithm can be extended to mixed traffic scenarios with various penetration rates of CAVs based on the assumptions that Human Driven Vehicles (HDVs) have bounded tracking errors and CAVs serve as platoon leaders. Third, it is able to produce provable optimal traffic scheduling schemes with high computational efficiency and stability for real-life applications using the Constraint Programming (CP) technique, whose computation time remains at a sub-second level even for high-flow traffic states in most scenarios tested.
The remainder of this paper is organized as follows: Section II discusses related works on Autonomous Intersection Management (AIM). Section III presents definitions of the cooperative merging problem, followed by a detailed description of the proposed bi-level optimization method in Section IV. Numerical simulation experiments are conducted in Section V along with performance comparison with other existing methods. Section VI concludes this paper and discusses some open issues and future work.

II. RELATED WORK
Since Dresner and Stone presented the concept of AIM [32], numerous strategies have been proposed to improve traffic efficiency at autonomous intersections without signal controllers. It is clear that the overall performance of the system is susceptible to the underlying control strategy [33], [34], which can be categorized into three groups, i.e., the Priority-based AIM (PR-AIM), Optimization-based AIM (OP-AIM), and Heuristic-based AIM (HR-AIM). The First-In-First-Out (FIFO) policy [35] is a classical priority-based method, while the FIFO-based reservation system has been proved to outperform the signal controller in terms of delay under certain circumstances [36], [37]. In that case, the priority of n vehicles will be determined by a centralized controller according to their arrival times, which effectively reduces the computational complexity from O(n!) to O(n). Hence, feasible solutions can be generated in real-time, while the performance of solutions cannot be guaranteed since it sets fairness as the fixed goal [38].
In order to obtain the optimal solution, a variety of OP-AIM methods are proposed by researchers, who translate the traffic scheduling problem into a mathematical programming problem and employ optimization-based methods to generate high-performance solutions in terms of delay [39], throughput [40], energy consumption [41], quality of experience [42], and communication overhead [28]. Besides, constraints regarding first-order dynamics limits and inter-vehicle separation distance are well-designed in [40] to ensure safety. Among OP-AIM methods, Mixed-Integer Linear Programming (MILP) models are widely adopted by existing literature to describe the discrete and continuous aspects of the system. For example, passing sequences of vehicles can be formulated by discrete variables, while their states (e.g., location, velocity, and acceleration) must be continuous. Besides, to generate dynamically feasible trajectories, Lu et al. [43] directly set instantaneous speeds of vehicles as time-varying decision variables, considering n × N variables in total, where n and M represent the number of vehicles and time steps, respectively. By contrast, several hierarchical frameworks are developed in [13], [14], [15], [44], [45], [46]. For example, Guo et al. [13] propose a framework of dynamic programming with shooting heuristic as a subroutine (DP-SH) to find the near-optimal solution for the integrated trajectory optimization and intersection control problem. Note that such two-step approaches have smaller search space than the former since they decoupled the optimization of vehicles' crossing timeslots and trajectories. Although these programming models can be solved with off-the-shelf software packages, obtaining solutions for large-scale problems is still time-consuming [29], which limits its application to dynamic traffic conditions. Besides, HR-AIM methods use some heuristic rules to group vehicles into platoons, which accelerates the process of finding a satisfactory solution [23], [25]. Such a strategy can reduce model complexity since the centralized controller only needs to deal with platoon leaders, while the computation and communication load can be released [47]. For example, Miculescu et al. [48] proposed a polling-based method to schedule passing sequences of vehicles, where natural platooning behaviors emerge and help to save switching time. Besides, the authors compared different polling policies (i.e., exhaustive, gated, and k-limited policies) in the MATLAB-based simulations. Tallapragada et al. [49] adopted the k-means algorithm to form clusters based on vehicle positions, while the maximum number of clusters is pre-defined to reduce the computational burden. Ge et al. [50] developed a centralized coordination scheme based on MILP and used graph theory to decompose the original vehicle swarm into small batches after determining the relative priority and speed of vehicles. However, these platoon formation strategies are derived from previous experiences without being optimal, perfect, or rational since they make decisions from the local point of view without considering the global traffic information.
Although different studies use different control strategies, most incorporate the rolling horizon procedure into their method. Therefore, the long-term optimization can be decomposed into a series of subproblems with a shorter planning horizon, which helps to reduce computational complexity. Admittedly, a longer horizon can enable these subproblems to better approximate the long-term optimization problem, thus improving the optimization performance. However, the prediction of dynamic traffic flow is only reliable over a limited horizon. Moreover, a long horizon may increase the scale of the subproblems, making it computationally challenging to solve them. Hence, the horizon length should be carefully selected to balance optimality and computation efficiency. Accordingly, Levin et al. [51] presented a rolling-horizon algorithm to extend the MILP model to larger numbers of vehicles, which takes 5-10 seconds to schedule up to 40 vehicles within a 15-second planning horizon. Mirheli et al. [52] established a mixed-integer non-linear programming model to determine optimal conflict-free reservations in an isolated intersection, which takes more than 10 seconds to schedule 70 vehicles within a 15-second planning horizon. Yao et al. [45] proposed a 0-1 MILP model for vehicles entering scheduling, which takes less than one second to find the optimal scheduling scheme for up to 12 vehicles within a 10-second planning horizon. Ge et al. [50] take about three seconds to schedule 40 vehicles within a 15-second planning horizon. To the best of our knowledge, most existing research cannot solve large-scale AIM problems in real-time (e.g., at the sub-second resolution, which is crucial to reduce tracking errors and ensure vehicles follow the planned trajectories [53]). Hence, it motivates us to develop an efficient traffic scheduling method to pursue the optimal solution in a short time even for high-flow traffic states by integrating the advantages of OP-AIM and HR-AIM.

III. PROBLEM DEFINITION
In this paper, we focus on the cooperative merging problem through a fully autonomous traffic intersection without signal controllers as shown in Fig. 1. It is assumed that all the vehicles are CAVs equipped with wireless communicators (e.g., Dedicated Short Range Communication (DSRC) and cellular devices) to support vehicular communications [54], [55], [56], while their kinematic states (e.g., position, speed, and acceleration) can be precisely estimated by Kalman filters through fusing GPS, camera, and IMU as illustrated in [57], [58], and [59]. This simple scenario consisting of two one-way traffic streams is defined as a standard test scenario by [12], [44], [45], [48], [60], and [61]. It is chosen to gain insights into the benefits of platoon-aware traffic scheduling for automated traffic operations, which serves as a building block for subsequent studies. Besides, we assume that Base Stations (BSs) and Road Side Units (RSUs) are responsible for providing cellular-and DSRC-based V2X networks outside and within the control region, respectively. Thus, the kinematic states of vehicles can be captured by the decision-maker via the cellular network before they enter the control zone [62], such that their  arrival time can be predicted ahead of time. After that, the RSU, which serves as a centralized controller, can control the motion of vehicles via wireless communication.
Given the planning horizon t ∈ [T 0 , T 0 + T ], we assume that there are n k vehicles existing on the road k ∈ K = {0, 1}, which can be denoted by V k,i and indexed by i ∈ I = {1, 2, · · · , n k } according to their sequence of arrival. Besides, the overlapped region of two roads is called the merging zone and has a width of W , while the portion of the road segment within a distance of L to the merging zone is called the control zone. It is assumed that vehicles are required to form m k platoons within the control zone (m k ≤ n k ), which can be denoted by P k,i and indexed by j ∈ J = {1, 2, · · · , m k }. Without loss of generality, each vehicle is represented by a two-dimensional and rectangular rigid body with length l and width w. Besides, suppose each vehicle is subject to second-order dynamics of the following form: where s t k,i denotes the displacement of the vehicle's front bumper between the entrance of the control zone and current position,ṡ t k,i the velocity, and a t k,i the acceleration. During the time when each vehicle passes through the control zone and merging zone, there are three special moments as shown in Fig. 2. In detail, (1) the arrival time t + k,i is the moment when its front bumper enters the control zone where s t k,i = 0; (2) the start time t * k,i is the moment when its front bumper enters the merging zone (i.e., crosses the stop line) where s t k,i = L; and (3) the departure time t − k,i is the moment when its rear bumper leaves the merging zone where s t k,i = L + W + l. It is obvious that t − k,i > t + k,i for all vehicles. After that, we define the interval of these three moments as the (1) waiting time t +, * k, With these notations in hand, we define safety as follows.

Definition 1 (Safety):
The control zone is said to be safe at time t, if each vehicle maintains a sufficient separation with its preceding vehicle, i.e., s t k,i−1 − s t k,i > l for all k ∈ K = {0, 1} when i > 1. Besides, the merging zone is said to be safe, if there are no pairwise collisions among vehicles, i.e., From a practical point of view, there should be an extra separation between vehicles when they enter the merging zone in case of communication latency and control error. Hence, we define the switching time on the occupation timeline of the merging zone as shown in Fig. 3, which should be greater than a minimum threshold value for security.

Definition 2 (Switching Time):
In the merging zone, the switching time between vehicles coming from the same road is defined as t In this paper, we formulate the assumption of sequence-dependent switching time t k ,i k,i as follows since the vehicle platooning technique helps to reduce inter-vehicular distances considerably [25].
Assumption 1 (Sequence-Dependent Minimum Switching Time): We assume that the minimum switching time t k ,i k,i of vehicles in the merging zone is varied with each pair of individuals, which can be calculated by , if k = k and same platoon, where τ denotes the minimum time-headway of vehicles in the same platoon on the same road, σ 1 the factor of safety when vehicles are assigned into different platoons on the same road, and σ 2 the factor of safety when vehicles come from different roads (σ 2 > σ 1 > 1).
Given the departure velocity v − , the delay of an individual vehicle can be calculated by the difference between its actual departure time t − k,i and virtual departure time, i.e., In detail, the latter one is the time in ideal driving scenarios where the vehicle travels through the control and merging zone under free-flow 1 conditions.

Definition 3 (Delay):
The delay of vehicle is defined as Note that the maximum delay among vehicles reflects the fairness of the traffic scheduling scheme. In other words, the smaller the maximum delay, the smaller the variance of travel times, and the better the traffic scheduling scheme in terms of fairness.
Next, we define the makespan of a given traffic scheduling scheme as the departure time of the last vehicle within the current horizon. It is the period of time that the scheduling scheme took to let all the vehicles pass through the intersection area, reflecting the operational efficiency of the underlying control policy.
Definition 4 (Makespan): Given a set of vehicles denoted by V k,i where k ∈ K = {0, 1} and i ∈ {1, 2, · · · , n k }, the makespan of a traffic scheduling scheme is defined as As illustrated in Fig. 4, the traffic spillback refers to the situation when the back of the queue propagates to the previous intersection, leading to extended blockages and queue overflowing buffers. It is evident that spillbacks occur because vehicles cannot pass through the merging zone on schedule, e.g., due to the sudden increase in traffic volume or inefficient traffic management methods.
In this paper, we show that the makespan is a helpful indicator to predict the time when spillbacks occur. To demonstrate this, we first establish a condition in Lemma 1 such that the violation of this condition will cause queue to constantly grow, and hence spillback will eventually happen. Based on this condition, we then calculate in Property 1 the estimated number of rolling horizon cycles without the occurrence of spillbacks.
Lemma 1: For a given planning horizon t ∈ [T 0 , T 0 + T ], let us assume that T min is the minimum time interval between a vehicle's arrival and start time in each horizon, the value of which is affected by the trajectory planning algorithm. Let C max be the makespan of the current traffic scheduling scheme. Then, the queue will not be carried over from the current horizon to the next one if Proof: Let us consider two adjacent planning horizons, i.e., [T 0 , T 0 + T ] and [T 0 + T, T 0 + 2T ]. The ideal earliest moment when the front bumper of the first vehicle in the next horizon enters the merging zone can be denoted by T 0 + T + T min , the value of which is determined by the Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply. trajectory planning algorithm without considering the collision avoidance constraint. In addition, the moment when the rear bumper of the last vehicle in the previous horizon leaves the merging zone can be denoted by T 0 + C max . This is the actual earliest available time for vehicles in the next horizon to enter the merging zone that ensures safety.
It is obvious that if the actual earliest available moment (which is affected by vehicles in the previous horizon) is less than the ideal one, the queue will not be carried over from the previous horizon to the current one. Therefore, we have This completes the proof.
In this paper, we try to avoid spillbacks in normal traffic states where traffic can be effectively managed by minimizing the makespan of optimized traffic scheduling schemes and preventing delay propagation. Moreover, we try to mitigate spillbacks (i.e., postpone the occurrence of spillbacks) in saturated traffic states. To this end, we provide an upper bound for the number of rolling horizon cycles without the occurrence of spillbacks to calculate the maximum duration for which it can disperse traffic without causing a spillback.
Property 1 (Number of Rolling Horizon Cycles Without the Occurrence of Spillbacks): Suppose L * denotes the remaining road storage space, which is determined by the number of vehicles that arrived in advance but still exist on the road, i.e., L * = L − L , where L denotes the length of queue. Let d be the minimum inter-vehicle distance and h be vehicles' arrival intensity in seconds per vehicle (s/veh), which denotes the average headway. Then, in scenarios where the arrival flow is slowly changing, a spillback is expected to occur after N cycles of rolling horizons, where Recall that L and l denote the length of the road and vehicles, T the length of the planning horizon, T min the minimum travel time of vehicles, C max the makespan of the traffic scheduling scheme in the current horizon. According to Lemma 1, it is clear that the queue will not be carried over from the current horizon to the next one if C max ≤ T + T min . On the contrary, the queue will be carried over to the next horizon if C max > T + T min as shown in Fig. 5, which may cause delays propagating backward and accumulating, eventually leading to congestion and spillbacks. Note that spillbacks are expected to occur when the number of queued vehicles exceeds the maximum road storage (i.e., L l+d ). Then, the time when the road storage reaches its limit can be

IV. VEHICLE-PLATOON-AWARE BI-LEVEL OPTIMIZATION FOR AUTONOMOUS INTERSECTION MANAGEMENT A. Method Framework
In this section, we propose a bilevel optimization algorithm, named VPA-AIM, which assigns vehicles to platoons to get through the unsignalized intersection cooperatively. Efficient coordination is achieved by solving two coupled optimization models. In detail, the upper-level task is referred to as platoon-aware traffic scheduling, which determines the optimal platoon formation scheme of vehicles during the control zone and allocates the sequence and timeslots of platoons passing through the merging zone. The lower-level task is referred to as multi-vehicle trajectory optimization, which designs referenced trajectories for multiple vehicles during the process of vehicles merging into platoons and passing through the control and merging zones. The two models are connected to each other via several coupling constraints, which ensures that the decisions generated by the upper-level traffic scheduling model are feasible for the lower-level trajectory planning model. The workflow of the VPA-AIM algorithm is presented in Fig. 6.
This paper proposes a rolling horizon procedure to deal with this online optimization problem since the prediction of dynamic traffic flows is reliable only over a limited horizon. This procedure decomposes the long-term optimization into a series of subproblems with a shorter time horizon, such that the computation time can be reduced. According to this method, a vehicle scheduling scheme is first determined up to a given point in time, ignoring everything that could happen afterward. After that, the planning interval is iteratively moved forward in time to generate the following schedule and so on.
In each decision step, the traffic scheduling scheme of vehicles (i.e., their departure time, which includes information about the platoon formation scheme) is optimized in the upper-level model to minimize the makespan and the maximum delay. Besides, referenced trajectories of vehicles are designed and optimized in the lower-level model according to the pre-determined arrival and departure time to reduce the overall energy consumption of vehicles and improve space utilization to prevent spillbacks within the intersection area.

B. Upper-Level Traffic Scheduling Model 1) Decision Variables:
Recall that there are n k vehicles existing on the road k ∈ K = {0, 1}, which are indexed by i ∈ I = {1, 2, · · · , n k }. These vehicles are required to be assigned into m k platoons indexed by j ∈ {1, 2, · · · , m k }(m k ≤ n k ). Therefore, we define the decision variables as the start time of vehicles, i.e., t * = {t * k,i } k∈K,i∈I . Besides, we characterize platoon formation using a dummy binary variable x k i that represents whether vehicle V k,i is the leading vehicle of a platoon. In other words, we have which can be expressed as a n k -dimensional vector x k for all i ∈ I.
which can be inferred according to the value of x k i . For example, suppose there are five vehicles (n k =5) on the road k indexed by {1, 2, 3, 4, 5} and they are assigned into three groups (m k = 3), i.e., {1, 2}, {3}, {4, 5}. In this case, V k,1 and V k,2 are assigned into the first platoon, V k,3 the second, V k, 4 and V k,5 the third. In other words, there are two, one, and two vehicles in each platoon, respectively. Therefore, we have y k = {2, 1, 2}.
2) Constraints: Similar to [15], [33], [61], and [63], we assume overtaking is not allowed in the control zone. Hence, the passing sequence of vehicles through the merging zone on the same road should correspond to the order of arrival at the control zone. In addition, there should be an extra separation between vehicles when they enter the merging zone to avoid rear-end collisions, which is referred to as the switching time t k ,i k,i as described in Definition 2 and Assumption 1. Therefore, we have the following precedence constraints in the form of chains for vehicles on the same road: To avoid side-impact collisions in the merging zone, the start time of two vehicles coming from different roads should also maintain a separation. Therefore, we have where z i,i describes the sequence of vehicles passing through the merging zone as stated in Eq. (8). It takes only the value of 1 or 0 to choose one of these two equations to activate. For example, suppose t * 0,1 > t * 1,1 which suggests that V 0,1 pass through the intersection after V 1,1 , then we have z 1,1 = 1. In that case, Eq. (10) will be activated and Eq. (11) will be disabled.
In addition, the ranges of decision variables and intermediate variables are restricted by the following coupling constraints explicitly dependent on the lower-level model: where T min and T max denote the minimum and maximum time interval between arrival time t + k,i and start time t * k,i ; M max denotes the maximum number of vehicles in a platoon. Note that T max is used to prevent vehicles from stopping (i.e., to avoid the undesirable phenomenon of stop-and-go traffic), whose value is affected by the motion pattern defined in the lower-level trajectory planning model. Besides, the value of T min is also determined by the lower-level model since the maximum speed and acceleration of vehicles are bounded by traffic regulations and mechanical limitations, respectively. As suggested by [64], a larger maximum allowable platoon size helps to increase roadway capacity, but may introduce greater difficulties in maintaining string stability. Therefore, the maximum platoon size M max should be limited to a moderate value to trade off capacity and maneuverability.
3) Objective Functions: In the upper-level model, the passing sequence and timeslots of vehicles are jointly optimized with the platoon configuration scheme by virtue of real-time traffic states to improve operational efficiency (i.e., the makespan) and fairness (i.e., the maximum delay). Hence, we formulate a bi-criteria optimization problem that simultaneously considers the makespan of the traffic scheduling scheme and the maximum delay of all the vehicles under constraints (7) and (9)(10)(11)(12)(13). Moreover, we conduct the lexicographic optimization on these objectives, the preferences of which are imposed by ordering objective functions according to their importance, rather than by assigning weights. In other words, the makespan is considered the chief objective while the maximum delay is subordinated, since the former has a direct influence on the occurrence of spillbacks, according to Eq. (3). Specifically, we have min where t * = {t * k,i } k∈K,i∈I represents the decision variables, i.e., the start times of vehicles. Additionally, f 1 (t * ) and f 2 (t * ) denote the makespan and the maximum delay, respectively.

4) Solution Algorithm:
It is challenging to solve the traffic scheduling model in real-time through classical mathematical programming methods (e.g., MILP) due to the complexity of the merging coordination problem.
Fortunately, the latest released commercial Constraint Programming (CP) solvers (e.g., IBM ILOG CPLEX CP Optimizer [65] and Google's OR-Tools) offer us an opportunity to employ out-of-the-box algorithms to solve combinatorial optimization problems. The main advantage of CP is it can deal with complex problems with much more decision variables within limited runtime budgets while ensuring optimality. Especially, the standard CP solver provided by IBM ILOG CPLEX has shown to provide excellent performance in various scheduling problems (e.g., vehicle routing [66], train scheduling [67], drone scheduling [68], and airline management [69] problems). However, to the best of our knowledge, CP has not been used to solve the intersection management problem yet. Hence, this paper wants to leverage the advantages of this technique and ventures to pursue the global-optimal solution within the tight runtime budget, contributing a novel application of CP to traffic scheduling problems.
The CP solver works by iteratively trying different combinations of values for variables in the problem, checking to see if they satisfy all the constraints, and backtracking if they do not. Additionally, the CP solver uses a search strategy that is designed to explore all the space of solutions in an efficient way, and to prune branches of the search tree that are not likely to contain solutions. This process continues until the optimal solution is found or it is determined that no solution exists. We refer readers to [65] for a detailed explanation of the underlying optimization principle.

C. Revisit of the Lower-Level Trajectory Planning Model
In this section, we first revisit the trajectory planning method developed in our previous studies [70], [71], where a Hybrid Evolutionary Algorithm with Cooperative Coevolution (HEA-CC) is proposed and proved to be efficient to deal with the multi-vehicle trajectory optimization. Furthermore, we formulate a new objective function to fit the setting of this problem, which not only reduces the overall energy consumption of vehicles but also improves the usage of road space to avoid/mitigate spillbacks. This task is non-trivial since vehicles with non-uniform initial separation distance are required to merge into platoons while complex dynamics and safety-critical constraints must be satisfied all the time.
It is noteworthy that the arrival time t + k,i and departure time t − k,i of vehicles can be known if the optimal traffic scheduling  scheme has been generated by the upper-level model. Hence, these two kinds of parameters are defined as the input of the lower-level trajectory planning algorithm. As shown in Fig. 7, we are required to design referenced trajectories for vehicles according to the given platoon formation scheme and the allocated passing sequence and timeslots. In this paper, we adopt the HEA-CC algorithm as a building block in the proposed bi-level optimization approach to solve the trajectory planning problem. Besides, it is assumed that all the vehicles have to adjust their speeds to an identical value before entering the control zone such that their separation distances are stable. This setting can be achieved by a speed harmonization operation in advance [72], [73]. In addition, we assume that the motions of vehicles can be controlled by the platoon leader who serves as a centralized decisionmaker, where the predecessor-leader-following communication topology [74] and model predictive control approach [75], [76] are used to maintain the desired inter-vehicle distance and velocity according to the referenced trajectories.
Unlike the well-known and widely-used shooting heuristic algorithm in the literature [77], [78], [79], our trajectory planning method requires less predetermined rules about the motion patterns while having a higher degree of freedom since the speed profiles of trajectories are configured by the optimization algorithm appropriately. To be specific, each trajectory is encoded by a set of via-points named knots as shown in Fig. 8, whose number varies automatically to balance the computational efficiency and sufficient degrees of freedom. Hence, the original trajectory planning problem can be converted to a constrained numerical optimization problem. The complete shape of trajectories can be easily constructed for performance evaluation by cubic spline interpolation.
After that, we formulate a new objective function for trajectory optimization, which not only reduces the energy consumption of vehicles but also improves the usage of road space to avoid/mitigate spillbacks with the consideration of dynamics and safety constraints. For a given planning horizon t ∈ [T 0 , T 0 + T ], we can discretize it into evenly-distributed time points by t which denotes the interval of time for decision-making. Therefore, we have t ξ = T 0 + ξt, where t 0 = T 0 , t M = T 0 + T , and ξ = {0, 1, · · · , M}.
Here, s t ξ i denotes the displacement of the front bumper of vehicle i ∈ [1, n k ] at time t ξ , P i (t ξ ) denotes the instantaneous power, For the second term, g c (s t ξ i ) denotes the penalty function with the corresponding weight coefficient σ c , where g 1 (·) measures the violation of traffic regulations, g 2 (·) the mechanical limitation, g 3 (·) the comfort criteria, and g 4 (·) the separation distance. We refer readers to read [70] for the calculation of these energy and penalty functions.
Additionally, the last term in Eq. (16) denotes the sum of the area above trajectory curves, where L is the length of road and α is the coefficient used to trade-off between the objectives of eco-driving and spillback avoidance. It is clear that for given start times t * k,i , increasing the sum of the area under trajectory curves (i.e., the dark area shown in Fig. 8) can reserve more space for subsequent vehicles, leading to a smaller length of queue (i.e., L ), which helps to mitigate the occurrence of spillbacks according to Property 1.
In the next step, the set of trajectories within the same platoon are calculated in parallel using a divide-and-conquer strategy until a near-optimal solution has been found, or a terminal condition is met. Note that the HEA-CC algorithm is able to generate feasible referenced trajectories with high energy efficiency in a sub-second computation time, which is at least two orders of magnitude lower than the planning period even for large-scale instances.

D. Discussion of the Extension to Mixed Traffic
It is noteworthy that the proposed intersection coordination algorithm can be readily extended to mixed traffic scenarios with various penetration rates of CAVs. Next, we will briefly explain how to address two major challenges associated with the mixed traffic condition.
On the one hand, without the help of traffic signals, the centralized controller located on the roadside cannot force Human-Driven Vehicles (HDVs) to comply with the advised passing sequence, especially if the leading vehicles of both approaches are HDVs. Therefore, one solution is to modify the proposed algorithm to preclude such scenarios by requiring all platoon leaders to be CAVs, which, however, may not be feasible if the penetration rates of CAVs are not sufficiently high. Nevertheless, we notice that in current practice, unsignalized intersections with pure HDV traffic are coordinated using the FIFO policy to provide right-of-way to the HDV that arrives earlier. Inspired by this, we can integrate the proposed algorithm with the classical FIFO policy to coordinate mixed traffic and deal with specific scenarios where HDVs are platooned.
On the other hand, it is difficult to predict and control the behaviors and motions of HDVs. To address this challenge, many previous works used the classic Optimal Velocity Model [80] and Intelligent Driver Model [81] to estimate the behaviors of HDVs. However, such deterministic car-following models impose many assumptions on vehicles' desired velocity and spacing time, which require elaborate tuning of parameters and may not be favorable under uncertain conditions. Some other researchers try to perform evaluations against HDVs uncertainties with the least-assumption-based approach [63] that fixes the velocity of HDVs as constants, which is however not the setting of this problem.
Fortunately, Advanced Driver-Assistance System (ADAS) enables the centralized controller to issue proper minimum and maximum speed recommendations to HDV drivers via Human Machine Interface (HMI) [82]. In that case, HDVs will be able to track the desired speed and trajectory such that they can pass through the merging zone in their pre-determined timeslots to avoid collisions. Hence, the proposed bi-level intersection coordination algorithm can be extended to mixed traffic conditions with the Assumption 2.
Assumption 2 (Tracking errors of HDV trajectories): This paper assumes that HDVs are able to track the speed profile provided by the proposed algorithm with bounded tracking errors, i.e., where t * denotes the expected time-headway of a HDV with its preceding or succeeding vehicle, t real the actual time-headway, and t max the maximum tracking errors of HDVs. In that case, other vehicles should keep a sufficient inter-vehicle distance with the maximum tracking error boundaries of HDVs to ensure safety. Accordingly, the upper-level traffic scheduling model should also take tracking errors into consideration by reserving more space for HDVs to ensure safety when it assigns timeslots for vehicles.
An example of the generated trajectory planning scheme in the mixed traffic condition is presented in Fig. 9, where the vehicle platoon consists of six CAVs marked in blue and one HDV marked in red. It can be seen that the trajectories of CAVs are deterministic, while the trajectory of HDV is uncertain since the driver behavior cannot be precisely controlled. However, the tracking errors of the HDV are bounded within certain limits (i.e., the area in red), which have been considered in advance in the trajectory planning phase to ensure safety. In other words, the other CAVs would leave enough space to prevent confusion due to the control errors of HDVs.

V. NUMERICAL SIMULATION EXPERIMENTS
In this section, we formulate a micro-simulation platform in MATLAB 2021b to evaluate the proposed VPA-AIM algorithm. The simulation parameters are set as shown in Table I. The basic setting in simulations includes the platooning willingness of vehicles and their ability to communicate with others. The appropriate length of the control region depends on the practical needs of applications, but it cannot exceed the reliable communication range of the intersection manager located nearby the merging zone. In this paper, we set the length of the control zone as 150 meters, similar to existing studies [41], [48], [78]. Note that the minimum time-headway can be reduced if the latency of vehicular communication is stably maintained at a low level. Hence, we determine the two values according to [45], [83], [84], and [85]. The value of the maximum tracking errors of HDVs depends on a wide range of factors, including the skill of drivers, the type of vehicles, and the traffic conditions. In this paper, we set this value as one-quarter of a second according to the benchmark average human response time [86]. In addition, we set the arrival and departure speeds to 60 km/h according to the free flow speed used in [12] and limit the maximum speed to about 80 km/h to ensure platooning safety as suggested by [84]. The range of acceleration and jerk are limited within the comfort criteria recommended by [87]. The maximum platoon size M max is set to 25 that conforms with the platoon size configuration in previous studies [77], [88].

A. Design of Test Scenarios
The vehicle arrival time t + k,i is modeled as a Matérn hardcore stochastic point process [48] on the non-negative real line. Additionally, the distance between points is lower bounded by a certain number (i.e., t + k+1,i − t + k,i ≥ t k ,i k,i ), which denotes the sequence-dependent minimum switching time of vehicles in the same platoon. This setting guarantees that no two vehicles are in collision at the time of arrival. Define Matérn process with parameter λ as the Matérn process obtained by thinning a Poisson process [89] in the line that has intensity λ. Suppose t = l/V max , then the intensity of Matérn Type I point process can be defined as In order to generate multiple scenarios with different arrival flows within a reasonable range, we define the road capacity as the maximum flow of vehicles on a road segment. Given the minimum headway τ , the road capacity can be calculated by 1/τ .
Next, we generate nine groups of test instances whose arrival flows vary from 720 to 3600 vph/lane using the Matérn hard-core stochastic point process in the scenario of balanced and uniform traffic flow. Meanwhile, each group of traffic scenarios is simulated five times with different random seeds to replicate stochasticity in transportation systems, where the arrival time of vehicles is different, but the arrival flow keeps the same. In other words, there are 9 ×5 = 45 scenarios tested in total.

B. Computational Results
In this section, the proposed VPA-AIM method is tested in multiple scenarios with different arrival flows and compared with the FIFO-, polling-, and scheduling-based methods for AIM. The FIFO-based method is modified from [83], which determines the passing sequence of vehicles according to their arrival orders. The polling-based method is modified from [48], which switches the priority of vehicles coming from different directions using an exhaustive policy. The traditional scheduling-based method is modified from [45], which solves a linear programming model to determine the passing sequence of individual vehicles without considering grouping vehicles into platoons.
The computational results for different scenarios are reported in Table II. Note that the data shown in this table denotes the average value of five scenarios, while the average computation time for the upper-level traffic scheduling model is presented in the last column. To quantify the optimization performance of different algorithms, we define the Relative Percent Deviation (RPD) as below. Suppose f i and f i denote the indicator obtained by the proposed and comparison algorithms, then we have It can be seen from Table II that the proposed method outperforms the other benchmark methods for minimizing both the makespan and maximum delay in all scenarios tested.  The makespan and maximum delay can be reduced by up to 24.2% and 34.6%, respectively. Besides, the proposed method maintains the maximum delay among vehicles below 8 seconds for all scenarios, ensuring fairness in passing the intersection. Furthermore, the performances of all these methods are affected by the arrival flow, as shown in Fig. 10. First, it can be seen that all these methods perform similarly when the arrival flow is low (e.g., from 720 to 1080 vph/lane) since vehicles are able to pass the intersection without additional intervention. In other words, there is little room for optimization in these cases. However, the makespan and the maximum delay of solutions obtained by the FIFO-, polling-, and scheduling-based methods increase dramatically with the arrival flow (i.e., from 1440 to 1800 vph/lane). Second, it is observed that either the FIFO-or polling-based methods cannot trade off multiple objectives. For example, the FIFO-based method is inferior to the polling policy in terms of efficiency (i.e., makespan) but outperforms the latter in terms of fairness (i.e., maximum delay). Third, it can be seen that the scheduling-based method overmatches FIFOand polling-based methods in both objectives within low-flow traffic states. However, its performance has weakened to a level similar to the polling-based method in medium and high-flow traffic states, which suggests that the traditional scheduling method has limitations and cannot work stably in all traffic scenarios.
By contrast, the proposed VPA-AIM algorithm performs better than all the benchmark methods. Moreover, the proposed algorithm keeps the makespan of the traffic scheduling scheme below the threshold value when the arrival flow is not greater than 1800 vph/lane. As shown in Table II, when the arrival flow over 1440 vph/lane, the makespans of FIFO-, polling-, and scheduling-based methods exceed T + T min = 20 + 9 = 29 s, which is the threshold value for ensuring that spillbacks will not occur as stated in Eq. (3). If the makespan of the current traffic scheduling scheme exceeds this threshold value, the queue will be carried over from the current horizon to the next one, which may cause delays to propagate backward and accumulate, eventually leading to congestion and spillbacks. Although the resulting makespan of our method may exceed the threshold value in over-saturated scenarios, it does not exceed that value too much and can remain stable for a longer period of time than those of benchmark methods, and thus our method is more efficient in avoiding spillbacks and mitigating congestion.
According to Property 1, the number of rolling horizon cycles without the occurrence of spillbacks is shown in Fig. 10. It can be seen that the proposed VPA-AIM algorithm can avoid spillbacks in normal traffic states (when the arrival flow is not greater than 2520 vph/lane). Moreover, the proposed method provides a mitigation mechanism to regulate spillbacks caused by extreme arrival flows (i.e, 2880-3600 vph/lane). Although the propagation of delay cannot be avoided in these situations, the number of rolling horizon cycles without the occurrence of spillbacks resulting from the proposed method is larger than the benchmark methods, which suggests that it can disperse traffic for a longer time.

C. Traffic Scheduling and Trajectory Design Schemes
The differences in traffic scheduling schemes obtained by different methods in different arrival flows are demonstrated in Fig. 11 using the conflict-duration-graph developed in our previous study [37]. In short, the conflict-duration graph is used to denote the occupation state of the merging zone, where each block denotes the periods occupied by a vehicle. Besides, the left and right endpoints of each block correspond with the start time t * k,i and departure time t + k,i of vehicles, respectively. Additionally, the interval between two blocks denotes the separation in time of two vehicles to ensure safety, whose minimum value is decided by the sequence-dependent  minimum switching time. Note that it is non-trivial and challenging to calculate the optimal platoon formation scheme and passing sequence and time, which cannot be obtained by heuristic rules since multiple constraints need to be satisfied simultaneously.
The optimized trajectories for vehicles obtained by the proposed method are shown in Fig. 12. The upper and lower half of each plot depicts the trajectories of lane 1 and lane 0, respectively, while the vertical axis describes the distance of vehicles from the merging zone. First, it can be seen that the platoon configuration is not always consistent, but vary in different situations. This happens because the platoon formation scheme is designed by the optimization model appropriately according to real-time traffic states. Hence, more vehicles are assigned to a platoon when the arrival flow increases. Second, the generated trajectories are smooth enough without illegal speed jumps, while there is enough space between trajectories to prevent collisions. Hence, vehicles can pass the intersection collaboratively without stopping with the support of well-designed traffic scheduling and trajectory planning schemes. Third, it can be observed that the number of vehicles assigned to a platoon increases with the arrival flow. In highflow traffic states, the right-of-way will be given to groups of vehicles in two lanes alternately with different lengths of time. This is similar to the phenomenon at signalized intersections with an adaptive signal controller [90], [91], [92]. Hence, when it comes to signalized intersections, the proposed algorithm can be extended and used to design signal timing plans that determine the optimal green time of each lane. Moreover, the proposed algorithm can generate smooth trajectories for vehicles and cluster them into platoons that can properly use the green light windows and pass the intersection at high speed to avoid stop-and-go movements.
It can be seen from Fig. 13 that the CP solver can find the optimal traffic scheduling scheme for up to 32 vehicles with a sub-second computation time using a 20-second time horizon. However, the complexity and difficulty of problem-solving grow with the number of vehicles, leading to an exponentially increased computation time. For those scenarios with more than 36 vehicles to be controlled, a smaller planning horizon is suggested to reduce the number of decision variables such that the computation time can be maintained within the subsecond level. Otherwise, the CP solver may not be able to find the optimal solution within the limited runtime budget. From the practical point of view, the classical FIFO policy can be employed as an alternative for those scenarios without sufficient time to optimize or cannot find any feasible solutions. Additionally, recall that we incorporate a coupled constraint i,e, Eq.(12), derived from the lower-level trajectory planning algorithm into the upper-level scheduling model, which is used to prevent vehicles from stopping. This constraint can be relaxed (i.e., increase the value of T max ) if the solver cannot find a feasible solution since it is possible that vehicles have to stop and wait before the stop line of intersections at extremely high-flow traffic states.

D. Mixed Traffic Scenarios With Diverse Autonomy Levels
In order to analyze the impact of different CAV penetration rates on the performance of the proposed method, we generate 9 × 9 groups of test scenarios whose arrival flows vary from 720 to 3600 vph/lane and penetration rates vary from 100% to 20%. Meanwhile, each group of traffic scenarios is simulated five times with different random seeds. To quantify the robustness of the proposed method in different arrival flows and penetration rates, we define the Average Performance Degradation (APD) as below. Let i index the penetration rate of CAVs and j index the arrival flow, we can denote the performance indicator (i.e., the makespan and maximum delay) under arbitrary conditions and a specific condition with 100% penetration rate by f i, j and f * j , respectively. Then, we have It can be seen from Fig. 14 that, compared with the pure autonomy scenarios, both performance indicators (i.e., the makespan and the maximum delay) experience degradation when the penetration rate decreases in all kinds of arrival flow. This happens because HDVs require much more space than CAVs to ensure safety, leading to heavier burdens for the limited road resource. Additionally, the requirement that platoon leaders be CAVs imposes additional constraints on the optimization problem, leading to performance degradation of the optimal solution. Apart from this, it is also observed that the performance degradation in terms of the makespan becomes more significant when the arrival flow increases, the maximum value of which reaches 60% when the arrival flow and penetration rate becomes 3600 vph/lane and 20%, respectively. By contrast, the performance degradation in terms of the maximum delay reaches 80% when the arrival flow and penetration rate becomes 2160 vph/lane and 20%, respectively.
An example of the traffic scheduling and trajectory planning scheme in the mixed traffic scenario with dynamic traffic demands is illustrated in Fig. 15. Recall that HDVs are able to track the desired speed profile with limited tracking errors according to the Assumption 2. It can be seen that the upper-level scheduling model tends to assign more space for HDVs than CAVs to pass through the merging zone while avoiding collisions. Meanwhile, the lower-level trajectory planner is able to estimate the maximum tracking error boundaries of HDVs. Further, the planner will adjust the trajectories of CAVs such that they can maintain a safe separation distance from those HDVs nearby to ensure safety. Simulation results indicate that the proposed algorithm is  sufficiently robust to handle mixed autonomy conditions involving CAVs and HDVs with dynamic traffic demands.

VI. CONCLUSION
This paper proposes a bilevel optimization algorithm, named VPA-AIM, to coordinate the merging of CAVs at unsignalized intersections. In this work, the upper-level task is referred to as platoon-aware traffic scheduling, which determines the platoon formation scheme of vehicles during the control zone and allocates the passing sequence and timeslots of platoons through the merging zone, with the goal of reducing the makespan and maximum delay to enhance operational efficiency and fairness. The lower-level task is referred to as multi-vehicle trajectory planning, which is about designing referenced trajectories for vehicles during the process of passing through the control and merging zones, with the goal of reducing overall energy consumption and avoiding/mitigating spillbacks. The two levels are coupled by dynamics constraints and the arrival and departure times of vehicles. The bi-level optimization framework optimizes traffic operation within a rolling horizon to balance traffic performance and computational efficiency. At each decision step, the optimal scheduling scheme at the upper level can be solved by the CPLEX Constraint Programming solver with sub-second computation time, while the near-optimal trajectories in the lower level can be generated by the coevolutionary algorithm modified from our previous work [70], [71]. In addition, this paper establishes the quantifiable connection between the makespan of the traffic scheduling scheme and the occurrence of spillbacks, demonstrating that the optimization of the makespan helps to avoid/mitigate spillbacks in normal/saturated traffic states. It is also noteworthy that the proposed algorithm can be extended to mixed traffic scenarios with various penetration rates of CAVs based on the assumptions that HDVs have bounded tracking errors and CAVs serve as platoon leaders.
The proposed VPA-AIM algorithm is tested in multiple scenarios with various arrival flows (i.e., 720-3600 vph/lane) and compared with FIFO-, polling-, and scheduling-based methods in MATLAB. Extensive numerical examples suggest that the proposed algorithm outperforms benchmark methods in terms of efficiency and fairness. For example, the makespan and maximum delay of traffic scheduling schemes can be reduced by up to 24.2% and 34.6%, respectively. Besides, the proposed algorithm remains stable in all scenarios tested, while the performance of benchmark methods degrades dramatically with the increase in the arrival flow. Moreover, it has the largest number of rolling horizon cycles without the occurrence of spillbacks among benchmark methods, which suggests that it can mitigate spillbacks even in saturated traffic states.
This study opens several directions for future work. First, we would like to further generalize the proposed algorithm to a complex intersection with multiple approaches, and conduct comprehensive experimental validation using laboratory-scale Arduino cars. Second, we will extend the algorithm to urban arterial intersections or networks and devise signal coordination strategies to enable green waves with consideration of spillbacks. Third, we will consider intersection control with multiple types of road users such as transit vehicles that require priority and pedestrians that could impose additional randomness on the system dynamics. Finally, we would like to leverage the interactive protocol proposed by [93] to establish a verifiable communication scheme between authorized intersection controllers and vehicles, which helps to ensure the authenticity, integrity, and confidentiality of the mobility data and control commands being transmitted, and thus reduces the risk of data privacy and cybersecurity issues.