Velocity Obstacles and Emergent Rules-of-the-Air for Autonomous Drone Traffic Management

As the use of unmanned aerial vehicles (UAVs) becomes ever more widespread, there is a growing need to develop traffic management and flight rules, in particular for autonomous UAVs or where the predicted traffic densities far exceed those of traditional manned aviation. Inspired by the current rules of the air and multi-agent systems (e.g., pedestrians and swarm robotics), we outline a set of flight rules for autonomous UAVs that consist of waypoint following and conflict avoidance schemes. These flight rules are then explored in small, pairwise simulations and refined to allow a UAV to choose from three potential avoidance behaviors based on its own and its neighbors' velocities and positions. Finally, we compare the original and modified flight rules in larger scale simulations modelling two streams of UAV traffic crossing at a point. We show that the modified rules significantly reduce the mean transit time by reducing the impact of UAVs avoiding other UAVs from the same stream.


Introduction
Unmanned Aerial Vehicles (UAVs) are poised to become an integral part of the fourth industrial revolution, with the global UAV service market predicted to be worth $63.6 billion by 2025 [1]. Amongst a variety of civilian applications [2], goods delivery has attracted a great deal of media attention, for example a recent article [3] about the drone delivery service Manna operating in Ireland. In future, these services will require unmanned traffic management (UTM) systems, and there are a range of proposals [4,5,6] for such systems based on UTM service providers that will schedule and share flight plans so as to avoid conflicts (i.e., maintain safe separation) between UAVs. Future demand modelling is highly uncertain, but some projections suggest very large numbers of autonomous craft in the air at once (e.g., 87,000 delivery drones on average operating above Paris by 2035 [7] or 32,887 packages a day deliverable by drone for the city of Sendai [8]). In these circumstances, a centralised UTM approach would seem infeasible.
Therefore, we propose a distributed approach to UTM, where UAVs are dispatched without centralised trajectory planning, but rather use autonomous sense-and-avoid capabilities to avoid conflicts [9,10,11]. The problem is thus similar to a range of other multi-agent applications, for example Reynolds flocks [12] and pedestrian flow [13], where many agents must navigate through an environment and around each other.
The agents' acceleration capability is assumed independent of the velocity, and there is no minimum stall speed, i.e., agents may hover. The rules of motion are set out in detail in Section 2.
In Section 3 we then outline an experimental setup consisting of two agents, which implement the flight rules in order to avoid each other, and detail a method to measure the performance of the flight rules in terms of the delay an agent incurs while performing the avoidance maneuvers. The results of simulations, swept over various parameter settings, are then presented and show that the 'turn to the right' rule can produce large delays for one of the agents when the angle of approach between them is small.
As a result, Section 4 explores similar simulation experiments but with one of the agents deviating from the 'turn to the right' rule by either turning to the left or not undergoing avoidance maneuvers. We then show that the overall delay can be improved by allowing one of the agents to deviate, and use this to develop a hybrid avoidance rule which uses a simple heuristic to allow agents to choose the most appropriate avoidance behavior.

In order to compare the original and hybrid rules, Section 5 will outline a different experimental setup where many simultaneous agents form two traffic streams that cross at a point. The results from several simulations at varying traffic demands are presented and show that the hybrid rule can significantly improve performance, in terms of transit time, for larger traffic demand levels.
The results and use case implications are then discussed in Section 6 before a brief conclusion is provided in Section 7.

Agent based model for quadcopter UAVs
We consider a system of (for simplicity) identical agents modelled as point masses moving in continuous time and 2D space, with displacements, velocities and accelerations denoted $\mathbf{r}_i$, $\mathbf{v}_i$, and $\mathbf{a}_i$ respectively. Inspired by the Social Force Model [13] for pedestrian dynamics, we suppose that each acceleration $\mathbf{a}_i$ is composed additively of desired contributions $\mathbf{a}_i^k$, which model e.g., waypoint-following and various pairwise interaction terms, as described below. However, to model each agent's limited capabilities, we suppose there is a maximum acceleration magnitude $a_{\max}$ (assumed common to all agents). Furthermore, we assume $a_{\max}$ is independent of velocity, as a simple model of a quadcopter drone. If the magnitude of the sum of all acceleration contributions $\mathbf{a}_i$ is less than $a_{\max}$, it is unchanged. However, if the magnitude of $\mathbf{a}_i$ exceeds $a_{\max}$, then the acceleration experienced by the agent is $\hat{\mathbf{a}}_i a_{\max}$, where $\hat{\mathbf{a}}_i$ is the unit vector in the direction of $\mathbf{a}_i$.
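The additive combination and magnitude cap described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation; the function names `saturate` and `total_acceleration` and the representation of vectors as `(x, y)` tuples are assumptions made for the example.

```python
import math


def saturate(ax, ay, a_max):
    """Cap the magnitude of an acceleration at a_max, keeping its direction."""
    mag = math.hypot(ax, ay)
    if mag <= a_max or mag == 0.0:
        return ax, ay  # within capability: leave unchanged
    scale = a_max / mag  # otherwise rescale to magnitude a_max
    return ax * scale, ay * scale


def total_acceleration(contributions, a_max):
    """Sum the desired contributions (hypothetical list of (x, y) tuples), then cap."""
    ax = sum(c[0] for c in contributions)
    ay = sum(c[1] for c in contributions)
    return saturate(ax, ay, a_max)
```

For example, two contributions `(3, 0)` and `(0, 4)` sum to a vector of magnitude 5; with `a_max = 2.5` the result is rescaled to `(1.5, 2.0)`, which points in the same direction.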
The first acceleration contribution for each agent applies at all times and takes the form
$$\mathbf{a}_i^{1} = \frac{v_{CS}\,\hat{\mathbf{t}}_i - \mathbf{v}_i}{\tau}, \tag{1}$$
which models first order relaxation towards a (common) desired cruising speed $v_{CS}$ in direction $\hat{\mathbf{t}}_i$, i.e., an optimal velocity model [27]. In Sections 3 and 4, $\hat{\mathbf{t}}_i$ is a constant vector, different for each agent, which prescribes their overall desired direction of motion, whereas in Section 5, we prescribe
$$\hat{\mathbf{t}}_i = \frac{\mathbf{r}_i^{WP} - \mathbf{r}_i}{|\mathbf{r}_i^{WP} - \mathbf{r}_i|}, \tag{2}$$
so that agent $i$ flies towards a waypoint at $\mathbf{r}_i^{WP}$, which in this paper corresponds to their final destination. Note that the two models for $\hat{\mathbf{t}}_i$ approach each other when the waypoint is very distant from the agent.
Here, for illustration, the rate parameter $\tau$ is given by the natural timescale
$$\tau = \frac{2 v_{CS}}{a_{\max}}, \tag{3}$$
so that an agent flying at the cruising speed $v_{CS}$ in exactly the opposite direction to that required will initially retard itself with the maximum allowed acceleration $a_{\max}$.
Figure 1: The velocity obstacle of two converging agents. The dashed circle centred on $j$ is used to construct the velocity obstacle. The relative velocity lies within the velocity obstacle so these two agents are on a 'conflict course'. The red arrows show the unit vectors of the velocity obstacle, $\hat{\mathbf{r}}^{vo}_A$ and $\hat{\mathbf{r}}^{vo}_B$ respectively.
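The waypoint-following term can be illustrated with a short sketch. It assumes that the relaxation term has the form $(v_{CS}\hat{\mathbf{t}}_i - \mathbf{v}_i)/\tau$ with $\tau = 2 v_{CS}/a_{\max}$, which is consistent with the description above; the function names and the numerical values used below are hypothetical and are not taken from the paper's parameter table.

```python
import math


def natural_timescale(v_cs, a_max):
    """Assumed form of the natural timescale: tau = 2 * v_cs / a_max."""
    return 2.0 * v_cs / a_max


def unit_towards(pos, waypoint):
    """Unit vector from the agent's position towards its waypoint."""
    dx, dy = waypoint[0] - pos[0], waypoint[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return 0.0, 0.0  # already at the waypoint: no preferred direction
    return dx / dist, dy / dist


def waypoint_acceleration(pos, vel, waypoint, v_cs, a_max):
    """First order relaxation of the velocity towards v_cs * t_hat."""
    tau = natural_timescale(v_cs, a_max)
    tx, ty = unit_towards(pos, waypoint)
    return (v_cs * tx - vel[0]) / tau, (v_cs * ty - vel[1]) / tau
```

With illustrative values `v_cs = 10` and `a_max = 5`, an agent flying at cruising speed directly away from its waypoint receives a contribution of magnitude exactly `a_max`, while an agent already flying towards the waypoint at cruising speed receives zero.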
Next, we consider acceleration contributions that are introduced to help avoid conflicts. The goal is to ensure that, at all times,
$$|\mathbf{r}_i - \mathbf{r}_j| \geq S, \tag{4}$$
where $S$ models a safety distance inside which no pair of agents should usually encroach. Thus if at some future time inequality (4) will be violated for a given pair $i, j$, they are on a 'conflict course' and should maneuver to avoid each other. Ideally, to minimise flight times, the maneuvers should involve the minimum possible detours that respect Eq. (4). If the two agents have already violated it, then an emergency avoidance acceleration rule, Eq. (5), is introduced such that agent $i$ will accelerate directly away from its neighbor.
To achieve nominal conflict avoidance we adopt a scheme based on the velocity obstacle method [20], see Fig. 1. At each instant, the ego agent supposes that the alter will continue with constant velocity and thus computes a velocity $\mathbf{v}_i^j$ for itself that, if adopted instantly, would cause the two agents to pass each other at minimum separation $S$. Unfortunately, due to inertia, the velocity $\mathbf{v}_i^j$ cannot be adopted instantly, so instead we adopt an acceleration contribution
$$\frac{\mathbf{v}_i^j - \mathbf{v}_i}{t_C}, \tag{6}$$
which prescribes first order relaxation to $\mathbf{v}_i^j$. Here the timescale $t_C$ is given by the time to conflict, i.e., the time that remains until (4) would be broken if the agents maintained their current velocities. In this work we also introduce a constraint on $\mathbf{v}_i^j$, Eq. (7), in an effort to ensure traffic continues to flow while stopping agents from continually speeding up during avoidance maneuvers. In fact, the alter is applying the same logic so that the computed $\mathbf{v}_i^j$ and $\mathbf{v}_j^i$ would, if adopted, cause the agents to approach at nearest distance $2S$. This does not happen in practice, however, because, firstly, $\mathbf{v}_i^j$ and $\mathbf{v}_j^i$ are not instantly adopted, secondly, they change in time, and thirdly, if at any point the agents came off 'conflict course', the maneuver would cease to apply.
Finally, note that if, despite earlier efforts, the agents continue on conflicting courses, the time to conflict $t_C$ will reduce and thus increase the acceleration contribution in an effort to avoid conflict.
We now consider how to determine the avoidance velocity $\mathbf{v}_i^j$. At any instant the ego agent, $i$, will produce a velocity obstacle induced by the alter, $j$, which is defined by two lines that originate from $i$ and are tangential to a circle of radius $S$ centred on $j$, with unit vectors $\hat{\mathbf{r}}^{vo}_A$ and $\hat{\mathbf{r}}^{vo}_B$ in the frame of $i$. If the relative velocity of $i$ and $j$ lies within this cone then (4) will be broken at some point in the future. To prevent this, and minimise the detour $i$ undertakes, we consider possible $\mathbf{v}_i^j$ that produce a new relative velocity
$$\mathbf{v}^*_{ij} = \mathbf{v}_i^j - \mathbf{v}_j \tag{8}$$
that is parallel with either of the velocity obstacle's unit vectors. Given Eq. (7) and the fact that $\mathbf{v}^*_{ij}$ is parallel to $\hat{\mathbf{r}}^{vo}_A$ or $\hat{\mathbf{r}}^{vo}_B$, we can find a value for $|\mathbf{v}^*_{ij}|$ by finding the roots of Eq. (9) for $\hat{\mathbf{r}}^{vo}_A$ and of a similar equation for $\hat{\mathbf{r}}^{vo}_B$, resulting in four possible roots. The roots can be used with Eq. (8) to find possible values for $\mathbf{v}_i^j$. We only consider positive real roots when determining $\mathbf{v}_i^j$ since these correspond to maneuvers where agents pass each other at some point in the future.
To choose between possible maneuvers, and inspired by the current rules of the air, we prescribe that all agents must turn to their right, which ensures that two agents approaching each other head on do not pick mirrored velocities that lead to subsequent conflicts. If there is more than one $\mathbf{v}_i^j$ that results in a right hand turn then the agent will pick the one with the smallest angular deviation from its current velocity $\mathbf{v}_i$. If no viable $\mathbf{v}_i^j$ is found then the agent will employ the emergency acceleration from Eq. (5).
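The geometry of the velocity obstacle can be sketched in code. The example below covers only the two tangent unit vectors of the cone and the time to conflict under the constant-velocity assumption; it does not attempt the selection of the avoidance velocity or the right-turn rule. The function names are hypothetical, and the relative velocity is taken as that of the ego agent $i$ with respect to the alter $j$.

```python
import math


def vo_tangents(d, S):
    """Unit vectors of the two cone edges, for relative position d = r_j - r_i.

    Returns None if the agents are already closer than S (no cone exists).
    """
    dist = math.hypot(d[0], d[1])
    if dist <= S:
        return None
    phi = math.atan2(d[1], d[0])  # bearing from i to j
    alpha = math.asin(S / dist)   # half-angle of the cone
    return ((math.cos(phi + alpha), math.sin(phi + alpha)),
            (math.cos(phi - alpha), math.sin(phi - alpha)))


def time_to_conflict(r_i, v_i, r_j, v_j, S):
    """Time until the separation first drops below S at constant velocities.

    Returns 0.0 if already in conflict and None if no conflict will occur.
    """
    d = (r_j[0] - r_i[0], r_j[1] - r_i[1])
    w = (v_i[0] - v_j[0], v_i[1] - v_j[1])  # velocity of i relative to j
    w2 = w[0] ** 2 + w[1] ** 2
    dw = d[0] * w[0] + d[1] * w[1]
    c = d[0] ** 2 + d[1] ** 2 - S ** 2
    if c < 0.0:
        return 0.0
    disc = dw * dw - w2 * c  # discriminant of |d - w t|^2 = S^2
    if w2 == 0.0 or dw <= 0.0 or disc <= 0.0:
        return None  # not closing, or relative velocity lies outside the cone
    return (dw - math.sqrt(disc)) / w2
```

As a check, two agents 100 m apart flying head on at 10 m/s each with `S = 20` m close at 20 m/s and must cover 80 m before the separation reaches `S`, so the function returns 4 s. The numbers are illustrative only.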

Experiments on pairwise interaction
Here we propose an experimental setup, implemented in simulation, to explore how the rules for agent motion in Section 2 perform for pairs of agents.
We suppose that two agents $i = 1, 2$ are initialised with positions $\mathbf{r}_i(0) = -R_i \hat{\mathbf{e}}_i$ and velocities $\mathbf{v}_i(0) = v_{CS} \hat{\mathbf{e}}_i$, with $\hat{\mathbf{e}}_1 = (1, 0)^T$ and $\hat{\mathbf{e}}_2 = (\cos\theta, \sin\theta)^T$, see Fig. 2(a). Furthermore, we set $\hat{\mathbf{t}}_i = \hat{\mathbf{e}}_i$ so that, in the absence of collision avoidance maneuvers, Eq. (1) implies the agents will continue at constant velocity, and their paths will cross at angle $\theta$ at the origin.
Note that the difference $\Delta R := R_2 - R_1$ constitutes a sort of 'distance-phase' parameter. Clearly, if $|\Delta R| < S$, the agents will come into conflict (i.e., inequality (4) is violated at some point along their trajectories) in the neighborhood of the origin, whatever the approach angle $\theta$. In contrast, in the perfect 'head-on' case $\theta = \pi$, the agents will come into conflict irrespective of $|\Delta R|$. In fact, it may be shown that $|\Delta R| \leq S \sec(\theta/2)$ is a sufficient condition for a 'conflict course', in which case the conflict avoidance scheme described in Section 2 is activated, and the agents will thus deviate from straight line trajectories for $t > 0$.
Here, we set $R_1$ and $R_2$ very large, to model agents approaching each other 'from infinity', so initially the time to conflict $t_C$ is large, and the initial corrective motion is small. Subsequently, agents on a 'conflict course' will avoid each other to the right, approaching at the minimum distance $S$, and then via Eq. (1), equilibrate to a path that is parallel to their original course, see Fig. 2(b). The net effect of the interaction is to displace each agent laterally and longitudinally with respect to its original trajectory. In effect, the interaction costs each agent a delay $T_i := d_i / v_{CS}$, where $d_i$ is the respective longitudinal deficit.
We conduct a set of experiments that sweeps through values of $\theta$ in the range $(-\pi, +\pi)$, with $|\theta| \geq 2S/R_1$ so that the agents are not in conflict at $t = 0$. Correspondingly, $\Delta R$ is swept through the range $(-3S/2, +3S/2)$, which captures a variety of settings in which conflicts between the agents occur. In these experiments the other problem parameters, see Table 1, are held at constant dimensional values inspired by previous work such as [8,10,28].
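The conflict condition quoted above can be checked numerically. For two agents that hold the cruising velocity along their initial headings, the minimum separation works out to $|\Delta R| \cos(\theta/2)$, so it falls to $S$ or below exactly when $|\Delta R| \leq S \sec(\theta/2)$. The sketch below computes the minimum separation directly from the straight-line trajectories and compares it with this closed form; the function names and the sample values are assumptions made for illustration.

```python
import math


def min_separation(R1, R2, theta, v_cs=1.0):
    """Closest approach of two agents flying straight through the origin."""
    e2 = (math.cos(theta), math.sin(theta))
    # Initial relative position r_1(0) - r_2(0), with e_1 = (1, 0)
    d0 = (-R1 + R2 * e2[0], R2 * e2[1])
    # Relative velocity v_1 - v_2
    w = (v_cs * (1.0 - e2[0]), -v_cs * e2[1])
    w2 = w[0] ** 2 + w[1] ** 2
    if w2 == 0.0:
        return math.hypot(d0[0], d0[1])  # parallel paths: separation is constant
    t_star = max(0.0, -(d0[0] * w[0] + d0[1] * w[1]) / w2)
    return math.hypot(d0[0] + w[0] * t_star, d0[1] + w[1] * t_star)


def on_conflict_course(delta_R, theta, S):
    """Closed-form test: |delta_R| <= S * sec(theta / 2)."""
    return abs(delta_R) * math.cos(theta / 2.0) <= S
```

For instance, with `R1 = 1000`, `R2 = 1010` and `theta = pi/3`, the direct computation gives a minimum separation of about 8.66, matching `10 * cos(pi/6)`.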
Although at this point agents apply the same rules, for the sake of discussion we shall view $i = 1$ as the ego agent and $i = 2$ as the alter agent. Fig. 3 shows the delay $T_1(\Delta R, \theta)$ experienced by agent $i = 1$. The geometric symmetry in the setup implies that $T_2(\Delta R, \theta) = T_1(-\Delta R, -\theta)$ and thus $T_2$ may be recovered from the $T_1$ plot. This means we can view agent $i = 2$ as the ego and $i = 1$ as the alter by flipping the signs of $\Delta R$ and $\theta$.
The key observation from Fig. 3 is that the problematic situations correspond to small values of $|\theta|$, where the vast majority of the delay is experienced by the agent who is ego with positive $\theta$. Thus a collaborative maneuver is giving rise to an unfair and thus undesirable outcome in the delays experienced. This result is explained in the trajectory plot of Fig. 4. Here, although agent 1 has a 'head start' in the initial setup and therefore might be expected to pass in front of agent 2, it turns to the right in the avoidance maneuver, and thus crosses behind agent 2, incurring delay. In contrast, agent 2 follows an almost perfectly linear trajectory. This effect is most severe as the angle $\theta$ tends to zero and $\Delta R$ approaches its maximum positive value, i.e., when agent 1 is far enough ahead to almost be clear of a conflict course.
This finding suggests that other avoidance rules, for example in which agents can choose an alternative

Experiments with modified rules for pairwise interaction
We now consider adaptations to the conflict avoidance scheme analysed in Section 3, which employed a 'turn to the right' rule that we now denote $R_A$, where agent 1 was shown to experience large delay for small positive approach angles $\theta$. Alternatively, Fig. 5(b) describes a rule $R_B$ where agent 1 turns to the left to avoid conflict and Fig. 5(c) describes a rule $R_C$ where agent 1 does not employ a conflict avoidance scheme at all, but rather continues in a straight line. In both rules $R_B$ and $R_C$, agent 2 continues to implement conflict avoidance by turning to the right, just as in rule $R_A$.
See Fig. 6(a), which shows the delay $T_1^B(\Delta R, \theta)$ experienced by agent 1 for rule $R_B$. In this rule, the agents turn away from each other, so that there is no longer left-right handedness in their interaction. Consequently, we may consider $\theta > 0$ without loss of generality, and the delay $T_2^B$ experienced by agent 2 is given by $T_2^B(\Delta R, \theta) = T_1^B(-\Delta R, \theta)$. Compare with Fig. 3(a) and observe that rule $R_B$ dramatically reduces the delay for agent 1 for combinations $\Delta R \approx S$ and $\theta \approx 0$ for which it was worst for rule $R_A$, at the expense, see Fig. 3(b), of a modest increase in delay for agent 2. Note, however, that rule $R_B$ performs badly at large $\theta$ values compared to rule $R_A$.
For rule $R_C$ agent 1 experiences zero delay while agent 2 experiences similar delays to those of rule $R_B$, compare Fig. 6(b) with Fig. 6(a) reflected in the $\Delta R = 0$ line. Therefore, both $R_B$ and $R_C$ can be used to reduce the delay incurred by agent 1, but at the cost of additional delay to agent 2.
We therefore propose a hybrid rule where agent 1 chooses from rules $R_A$, $R_B$, or $R_C$, broadly according to the following criteria, where $X \in \{B, C\}$ denotes the candidate alternative rule and $T_i^A$ the delay under the default rule $R_A$:
• We require $T_1^X < T_1^A$, that is, agent 1 improves its own experience by deviating from the default rule; and
• We require $T_1^A - T_1^X > T_2^X - T_2^A$, so that the reduction in agent 1's delay is greater than any increase in agent 2's delay and so that there is overall net system benefit.
If $R_B$ and $R_C$ satisfy these conditions at the same time then we suggest that agent 1 should adopt the rule which results in the smallest total delay for the pair of agents. According to this, the areas under the dashed lines in Fig. 6(a) and 6(b) approximate where agent 1 should deviate from $R_A$ by adopting $R_B$ and $R_C$ respectively. In practice, this provides a heuristic which allows agents to choose their avoidance behavior, assuming $R_A$ as the default, based on the parameters $\theta$ and $\Delta R$. It is possible to apply this heuristic in more general scenarios by taking a pair of agents' current states and recovering the setup from Fig. 2(a) in the ego agent's frame of reference, for example by assuming the point at which their paths cross is based on their instantaneous velocities and finding $\Delta R$ with respect to that point.
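The selection criteria can be expressed compactly in code. The sketch below assumes that the delays of both agents under each rule are already known for the given $(\Delta R, \theta)$, for example from a precomputed lookup of the pairwise experiments; the function name `choose_rule`, the dictionary layout and the delay values in the usage example are hypothetical.

```python
def choose_rule(delays):
    """Pick agent 1's avoidance rule from delays = {'A': (T1, T2), 'B': ..., 'C': ...}.

    Rule A (turn right) is the default. An alternative is adopted only if it
    lowers agent 1's delay and the saving exceeds any extra delay for agent 2.
    If both alternatives qualify, the one with the smaller total delay wins.
    """
    t1_a, t2_a = delays['A']
    best_rule, best_total = 'A', t1_a + t2_a
    for rule in ('B', 'C'):
        t1, t2 = delays[rule]
        improves_self = t1 < t1_a
        net_benefit = (t1_a - t1) > (t2 - t2_a)
        if improves_self and net_benefit and (t1 + t2) < best_total:
            best_rule, best_total = rule, t1 + t2
    return best_rule


# Illustrative delays in seconds (made-up values, not results from the paper)
print(choose_rule({'A': (10.0, 1.0), 'B': (2.0, 3.0), 'C': (0.0, 6.0)}))
```

In this made-up case both alternatives satisfy the two criteria, and rule B is chosen because its total delay of 5 s is smaller than the 6 s of rule C.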

Large scale simulation setup and results
We now perform simulations closer to the real UTM use case, in which large numbers of agents are 'in the air' simultaneously. The idea is to test and compare the original conflict avoidance rule $R_A$ with the hybrid rule proposed in Section 4.
In this multi-agent setting, the number of potential conflicts scales quadratically with the number of agents, which is not practically feasible in the application setting. Therefore, we suppose that collision avoidance maneuvers only begin when $t_C < \tau$, i.e., when the time-to-conflict is less than the natural time scale introduced in Eq. (3). Thus typically each agent will only be involved in at most a handful of collision avoidance maneuvers at any one time, which are combined by the additive method described in Section 2.
In our simulations, there are two unidirectional streams of agents that travel between 'ports' at fixed locations $(\pm L/2, 0)$ and $(0, \pm L/2)$ respectively, see Fig. 7. This setup generates a sort of crossroads at $(0, 0)$ where we expect agents to encounter each other, in a similar manner as described in Section 3, with $|\theta| = \pi/2$. Agents are generated at the origin ports $(-L/2, 0)$ and $(0, -L/2)$ with initial velocities $v_{CS}(1, 0)^T$ and $v_{CS}(0, 1)^T$, and waypoints $(+L/2, 0)$ and $(0, +L/2)$ respectively, according to independent Poisson processes of the same rate $\lambda$. When an agent is generated, it is added to a queue for take-off, which is served deterministically to maintain a minimum spatial separation $S_{\text{takeoff}} := 3S/2$ that ensures agents do not come into conflict in the early stages of their flight. It follows that when $\lambda \ll v_{CS}/S_{\text{takeoff}}$, most agents take off at the same instant they are generated, but when $\lambda > v_{CS}/S_{\text{takeoff}}$, the agents are deterministically spaced.
During their subsequent journey, we expect agents to encounter agents from the crossing traffic stream somewhere in the neighborhood of the origin. In contrast to the controlled experiments described in Section 3, a single agent might become involved in a sequence of such encounters; there are also secondary encounters where an avoidance maneuver places an agent into a 'conflict course' that would not otherwise have occurred. Finally, as agents progress past the crossroads, they realign towards their destination and agents in the same stream may come into conflict with each other as they sort themselves out.
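The take-off queue can be illustrated with a small sketch. It assumes that the minimum spatial separation is enforced as a minimum time headway of $S_{\text{takeoff}}/v_{CS}$ between consecutive departures from the same port, which is one plausible reading of the description above; the function name `takeoff_times` and its parameters are hypothetical.

```python
import random


def takeoff_times(rate, n, headway, seed=0):
    """Take-off times for n agents generated by a Poisson process of given rate.

    Each agent departs when generated, unless that would violate the minimum
    headway to the previous departure, in which case it waits in the queue.
    """
    rng = random.Random(seed)
    t_generated = 0.0
    last_takeoff = float('-inf')
    times = []
    for _ in range(n):
        t_generated += rng.expovariate(rate)  # exponential inter-arrival time
        t_takeoff = max(t_generated, last_takeoff + headway)
        times.append(t_takeoff)
        last_takeoff = t_takeoff
    return times
```

When `rate` is far below `1 / headway` the queue is almost always empty and agents depart as generated, whereas for `rate` well above `1 / headway` the departures become evenly spaced at the headway.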

As agents converge upon their respective destination 'ports', there is the potential for conflict to arise that could only be resolved by centralised traffic management, which is outside the scope of this paper. Therefore, to simplify matters, we suppose that agents are removed from the simulation when they first enter a 'landing zone' of radius $R_{LZ}$ centred upon their destination. The key metric that we will then analyse is each agent's flight time, from takeoff to entering the landing zone, compared to the time $(L - R_{LZ})/v_{CS}$ they would have taken if no other agents were present.
Our experiments sweep over a range of traffic demands $\lambda$, from low values in which interactions are rare and delay is small, up to a value $\lambda_{\max}$, which may be shown to be the maximum capacity that the intersection could sustain, without interaction between the agents, if the traffic streams were evenly spaced and perfectly phased. Inspired by the real-world application, we set $L = 2{,}000$ m and $R_{LZ} = 60$ m $\ll L$. Each simulation continues until 1,000 agents have taken off from each of the origin ports. Because the first agents to enter the simulation encounter generally smoother transit, we employ standard statistical procedures to discard the simulation run-up, after which the transit times for more than 900 agents from each stream remain for analysis. Since the overall numbers are small, a bootstrap procedure is used to estimate a distribution for the mean delay $T$ for the agents from each stream.
Unsurprisingly, for both avoidance rules used, the mean delay increases along with the total demand, see Fig. 8. However, once the demand reaches 40% of $\lambda_{\max}$, the hybrid avoidance rule begins to significantly shorten the mean delay. While the two traffic streams approach at right angles to each other, and therefore we would not expect the hybrid rule to cause agents to deviate from $R_A$, agents that are displaced from their straight line paths may subsequently come onto a conflict course with another agent with which they share a destination. If this situation occurs, the two agents are more likely to have a small angle $\theta$ between them and therefore one of the agents is more likely to alter its avoidance behaviour. This effect has a greater impact on the mean delay in scenarios with larger demands since more agents are deviated from their straight line paths and these secondary conflict avoidance maneuvers become more common.
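A bootstrap estimate of the distribution of the mean delay can be sketched as follows. The paper does not specify the variant used, so this example assumes the simplest form: resampling the per-agent delays with replacement and taking a percentile interval. The function names and default settings are hypothetical.

```python
import random
import statistics


def bootstrap_means(delays, n_resamples=1000, seed=0):
    """Means of n_resamples resamples (with replacement) of the delay sample."""
    rng = random.Random(seed)
    n = len(delays)
    return [statistics.fmean(rng.choices(delays, k=n)) for _ in range(n_resamples)]


def percentile_interval(means, level=0.95):
    """Simple percentile interval from the bootstrap distribution of the mean."""
    ordered = sorted(means)
    lo = ordered[int((1.0 - level) / 2.0 * (len(ordered) - 1))]
    hi = ordered[int((1.0 + level) / 2.0 * (len(ordered) - 1))]
    return lo, hi
```

Applied to the post-run-up delays of one traffic stream, `bootstrap_means` gives an empirical distribution for the mean delay, and `percentile_interval` summarises its spread.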

Discussion
The results in Section 5 show that even when both streams experience maximum demand, the mean delay when all agents obey $R_A$ is only an extension of about 15% compared with the ideal, linear flight path. The hybrid rule provides a small but significant performance boost, with the mean extension being closer to 12%. However, while the single crossroad setup is far more complex than the initial two-body setup described in Section 3, we expect the UAV use case of goods delivery to result in even more complex traffic scenarios in which agents may have to pass through multiple such intersections, and it is unclear how these may interact with each other. Furthermore, the crossroads themselves may be more complex, with different angles between streams, counterflows travelling in the opposite direction or an intersection of more than two streams. It should also be noted that both rules perform extremely well at lower demands, and we will explore in future work how higher level traffic management techniques can exploit this by affecting system level features such as the traffic layout.
Another aspect of this work that is underexplored here is the way in which the various acceleration components described in Section 2 are combined additively and how this affects agents' behavior. The pairwise setup is thoroughly explored and well understood, and through the use of the hybrid-rule scheme can resolve potential conflicts with small delays for agents approaching from far away. In contrast, the multi-agent scenario is not explored in the same detail and we can provide no guarantees that undesirable edge cases do not exist, e.g., interactions between many agents may lead to deadlocks or drive agents into obstacles. Future work could explore other ways to combine the acceleration components, for example via a weighted sum, or turning off certain interactions.

An obvious point of comparison for any possible flight rules for autonomous agents is the current right-of-way rules for manned aircraft, defined by the International Civil Aviation Organization (ICAO) in Section 3.2.1 of [29]. These rules define three kinds of conflict course for a pair of aircraft (head on, converging, and overtaking) and prescribe which aircraft has the right-of-way and can therefore maintain its velocity.
In a head on scenario neither aircraft is given right-of-way and both must turn to their right, similar to $R_A$.
In a converging scenario the aircraft with its neighbor on its left is given the right-of-way, unlike in $R_A$, which again causes both agents to avoid each other in an attempt to more evenly distribute the negative impact of avoidance. Finally, in an overtaking scenario, which [29] defines as one aircraft approaching another from behind within a certain range of angles, the aircraft being overtaken is given right-of-way while the other must turn to its right. Since in our scenario agents attempt to maintain some constant cruising speed, overtaking does not occur in the same way as envisioned in the ICAO rules; however, the effect of both $R_B$ and $R_C$ is to stop craft from overtaking one another when the angle of approach is small, in other words to conserve the order in which agents pass through the point where their paths intersect.
We tentatively propose that the flight rules outlined in Section 2 be combined with the heuristic method for determining avoidance behavior from Section 4, in order to form the basis of rules-of-the-air for future autonomous UAVs. However, before such a set of rules could be implemented there are a number of potential real-world issues that have not been addressed in the simulation work presented here. These include the effect that noisy localisation or communication might have on the rules, particularly in high density traffic situations where many agents broadcasting simultaneously can degrade communication performance further. Similarly, this paper only considers a "sunny day" scenario where agents do not experience failures and all of them conform to the rules. It should be pointed out that the rules presented here cannot prevent conflicts with a malicious agent that actively pursues a conflict course, especially if it travels faster than $v_{CS}$.

Conclusion
In this paper we have presented a set of rules that enable agents to navigate towards a destination while maintaining some safe separation from each other. The safe separation is maintained through the implementation of a velocity obstacle based method which enables UAVs to modify their velocity. We have shown that a simple 'avoid to the right' rule is effective for a wide range of setups for a pair of agents but that it can produce undesirable behaviour for small angles of approach. As such, we presented two other avoidance rules and developed a heuristic based on the relative positions and velocities of a pair of agents to determine which of the three rules should be applied for any given pairwise interaction. Finally, we showed that the resulting hybrid avoidance rule reduces the mean delay experienced by agents that travel through a crossroads setup.

Acknowledgements
This work is funded and delivered in partnership between the Thales Group and the University of Bristol, and with the support of the UK Engineering and Physical Sciences Research Council Grant Award EP/R004757/1 entitled 'Thales-Bristol Partnership in Hybrid Autonomous Systems Engineering (T-B PHASE)'.