Reliable Autonomous Driving Environment Model With Unified State-Extended Boundary

From the early stage of robotic applications to current autonomous driving technologies, environment modeling has acted as the middleware connecting the perception and decision layers. In robotic applications, space-oriented models (e.g., grid map, drivable area) are widely applied to faithfully reflect space occupation. With the development of autonomous driving, the highly dynamic and complex road environment brings a rising need to understand the type and motion status of objects, so the element list has become the mainstream environment model. However, along with it comes the reliability problem caused by missed detection and irregular objects, which remains inevitable despite improvements in detection accuracy. In view of this, a new view of the driving environment is proposed as the unified state-extended boundary (USEB), aiming to improve the reliability of the element-oriented model. For driving decision requirements, different types of elements are consistently converted into driving constraints. Semantics and dynamics are expressed as the status of the drivable area boundary, making it possible to merge space occupation to improve reliability against missed detection and irregular objects. Evaluation of the USEB is carried out on the nuScenes dataset. Comparative results show that the proposed USEB covers the information required for driving decision while achieving higher reliability than the commonly applied element-oriented model.


I. INTRODUCTION
GENERALLY, environment modeling plays the role of abstracting the real world in autonomous driving. It is through the expression and understanding of the world in the environment model that the vehicle adapts its driving actions [1]. With the rapid development of autonomous driving, high-level autonomous driving (L4-L5 in the SAE standard [2]) has become a research hotspot. Compared to robotic applications and Advanced Driver Assistance Systems (ADAS), high-level autonomous driving should be able to monitor the environment independently. Thus, in the highly complex and highly dynamic road environment, an expressive, reliable environment model is increasingly important for making reasonable and optimal driving decisions in high-level autonomous driving [3]. For better illustration of the world and better support of the driving task, researchers have long sought an ideally expressive and reliable model [3], [4], [5], [6]. Generally, autonomous driving environment models can be divided into two major categories: space-oriented models and element-oriented models. Here, a brief introduction to the development of environment models, from robotic applications to autonomous driving, is presented. The grid map has been a long-applied environment model since the early stage of robotic applications in the 1980s [4]. As a type of space-oriented model, it discretizes the surrounding space into grids carrying environment status information. The occupancy grid [4], [7], [8] is the basic form of grid map. Input from LiDAR, cameras and other sensors is used to calculate the occupancy probability of each grid cell, achieving an overall representation of the accessibility of the whole environment [7], [9], [10]. For an indoor robot, the driving task can be achieved by path searching in the unoccupied free area.
In autonomous driving applications, the occupancy grid map is usually simplified into a drivable area boundary for decision-making, providing driving constraints in a more concise way. After sampling the drivable area into a state lattice [11], [12], trajectory planning can be achieved by A*, RRT, and so on. In summary, in space-oriented models, the faithfully illustrated occupation can well support the representation of irregular and unknown space. However, in high-level autonomous driving, since there are various static and dynamic elements such as lane markings, stop lines, traffic lights, traffic signs, vehicles and pedestrians, more information about the environment must be considered besides occupation.
In the 2000s, autonomous driving became a rising research hotspot. At least since the DARPA Challenges [13], [14], [15], element-oriented models have been the mainstream in autonomous driving for complex road environments. In order to make reasonable, legal and optimal driving decisions, the system has to express those elements semantically and dynamically. By listing the road elements and their required properties, the element-oriented model meets this need and is widely applied. It underlies many driving decision methods of different types and scenarios, e.g., behavior planning [16], trajectory planning [13], junction decision [17], and multi-lane decision [18]. In [19], [20], and [21], with reinforcement learning decision, the state space is defined with ego and obstacle properties. The construction of the element-oriented model has been promoted by deep learning technology since 2006. Vehicle detection [22], [23], pedestrian detection [24], [25], lane detection [26], [27], [28], etc., along with object tracking [29], can provide the semantic and dynamic information for the element list. The detected objects are usually expressed with 2D or 3D bounding boxes, as in Fast-RCNN [30], YOLO [31], PIXOR [32] and CenterPoint [33]. These detection methods have greatly advanced autonomous driving technology from perception to decision, and this scheme currently covers most common driving scenarios [17], [34], [35]. However, missed detection has always been a tricky problem, especially for types of objects that are not labeled and trained in the dataset. Although detection accuracy keeps improving, missed detection still threatens driving safety, especially at near distance. Also, bounding box detection cannot express irregular objects well, limiting its application in off-road scenarios or when unusual irregular obstacles are in the way.
Therefore, it would be better to improve reliability while keeping the semantic and dynamic benefits of the element-oriented model. The high reliability of space-oriented models sheds light on this problem: ideally, the faithful space occupation would be merged with the semantic and dynamic information. Schreier et al. [3] proposed the PFS map based on the space boundary, further distinguishing dynamic and static boundary sections, but the object dynamics still rely on the object map. Wu et al. [36] proposed MotionNet, detecting environment semantics and dynamics in bird-eye-view grids; however, background static occupation is not considered, and the grid map data dimension is high. These studies show that it is important to promote the reliability of the element-oriented model with space occupation. However, it remains a problem to cover semantic and dynamic information while accommodating occupation information to promote reliability.
To solve this problem, an environment model called the Unified State-Extended Boundary (USEB) is proposed. By analyzing the influencing mechanism of environment elements on the driving decision process, it is found that different types of elements can be uniformly represented as boundary conditions of driving decision. The occupation, semantics and dynamics of the elements in structured road environments are uniformly expressed as the extended boundary status in the USEB. For the construction of USEB, since the USEB is essentially the driving constraint, it is compatible with different modes of perceptive input. In particular, USEB can take space occupation input to avoid missed detection, and supports the expression of bounding boxes as well as irregular objects, thus improving reliability while keeping the semantic and dynamic information required for driving. Fig. 1 shows the environment models introduced above and the proposed USEB. In this example, although the grid map can cover the driving requirements like the element-oriented model after involving semantics and dynamics, it still has a large amount of data. Due to the untrained object type, the road barriers are missed in the element-oriented model. The USEB avoids this missed detection by applying the honest LiDAR-scanned occupation, representing the environment in the form of unified boundary constraints of occupation, semantics and dynamics.
The main contributions of this article are twofold: 1) The proposed USEB offers improved reliability over the widely applied element-oriented environment model, while keeping the semantic and dynamic information required for driving.
2) The construction of USEB is compatible with different environmental input modes, especially space occupation input, to improve reliability.
The remainder of this article is organized as follows. Section II introduces the USEB model, providing the viewpoint of generalizing different types of environment elements with the driving constraint. Section III introduces the construction of USEB with perception input, where example construction methods are presented. Section IV presents the experimental results on the nuScenes dataset [37], analyzing the reliability of USEB and other environment models. Finally, Section V concludes this article.

II. ENVIRONMENT MODEL OF USEB
The USEB is an environment model that depicts the occupation, semantics and dynamics of the environment as boundary constraints in the surrounding space. In this section, the basic principle of USEB is first introduced by analyzing the constraint mechanism, followed by its mathematical expression. Finally, it is analyzed how common road elements are depicted in this form.

A. The Principle of USEB
Generally, the autonomous driving process is a constrained optimization problem. Perception and environment modeling set up the constraints, while the decision module plans the path within them, trying to reach the optimum under the driving task requirements.
In this view, static objects, dynamic objects and virtual rule limits all influence the driving process by setting constraints on driving decision. This principle basically works through space occupation, i.e., the drivable space boundary constraint. In a dynamic environment, the motion of dynamic objects creates time-varying constraints. In addition, considering road environment characteristics, the semantic type of an element determines the constraint type and is important for the vehicle to make pertinent driving decisions.
From the above analysis, it can be seen that the occupation, motion status and semantics can all be generalized as driving constraints. Here, the occupation space boundary is applied as the basis to depict the constraint. First, at each instantaneous moment, static objects, dynamic objects and virtual rule limits can all be depicted as the occupation space boundary. Second, the motion status can be modeled as the contraction/expansion motion of the occupation space. Third, for any given element, its type can be inherited as the type of the boundary section that it creates. Thus, in USEB, the unification of the three types of information is defined as the state-extended boundary. In short, the USEB combines the advantages of both space-oriented models and element-oriented models.
On the one hand, compared to space-oriented models, the USEB supports the representation of semantics and dynamics in a lightweight way. With the traditional drivable space boundary, only space occupation is represented, omitting the semantics and dynamics required for decision. The expressiveness of element-oriented models, including the semantics and dynamics of different traffic elements, helps the USEB gain a better understanding of the traffic environment. Although semantics and dynamics could serve as extended states of a grid map, the high data volume and calculation burden are still a challenge.
On the other hand, compared to element-oriented models, where different types of elements are listed independently, the USEB unifies the traffic elements (i.e., static objects, dynamic objects and virtual rule limits) based on their common essence of boundary constraint. This is because the USEB has the basic form of a boundary, which inherits the consistency of space-oriented models. Also, the USEB supports reliable space occupation input (e.g., the occupancy grid), which is not supported within the framework of independently detected element lists in element-oriented models. This results in higher robustness of USEB construction compared to element-oriented models, as further stated in Sections III and IV.

B. Mathematical Expression
Based on the above principle, the USEB can be expressed as:

B = {p_1, p_2, ..., p_n}, p_i = (x_i, y_i, v_xi, v_yi, η_i) (1)

where B denotes the USEB and p_i denotes a boundary point. x_i and y_i represent the boundary point position (space occupation), v_xi and v_yi represent the motion status (dynamic properties) of the boundary section between points i and i + 1, and η_i is the semantic type of this boundary section. In this way, the surrounding constraints are merged onto the state-extended boundary, covering occupation, dynamics and semantics in a unified data structure. With the boundary points p_i as vertices, the boundary is expressed as a polygon consisting of boundary sections (polygon edges). Note that the dynamics and semantics are regarded as properties of the edges rather than the vertices. This design follows human cognitive habits, e.g., the rear of a leading vehicle is recognized as an edge rather than its two vertexes.
Considering the characteristics of on-board sensors, the perception result is usually given in an ego-centered polar coordinate system, in which the USEB can be expressed as:

B = {p_1, p_2, ..., p_n}, p_i = (r_i, θ_i, v_i, θ_vi, η_i) (2)

where r_i and θ_i represent the polar coordinates of the boundary point position, and v_i and θ_vi are the speed value and speed direction of the boundary section.
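The data structure of (1) and (2) is compact enough to sketch directly. The following Python snippet (illustrative only; the names `BoundaryPoint` and `to_polar` are not from the paper) shows one boundary vertex carrying the extended state of its outgoing section, and the conversion from the Cartesian form (1) to the ego-centered polar form (2):

```python
import math
from dataclasses import dataclass

@dataclass
class BoundaryPoint:
    """One USEB vertex: position plus the extended state (dynamics,
    semantics) of the boundary section starting at this vertex, as in (1)."""
    x: float
    y: float
    vx: float   # section dynamics (zero for static sections)
    vy: float
    eta: int    # section semantic label

def to_polar(p: BoundaryPoint):
    """Convert a boundary point to the ego-centered polar form
    (r, theta, v, theta_v, eta) of (2)."""
    r = math.hypot(p.x, p.y)
    theta = math.atan2(p.y, p.x)
    v = math.hypot(p.vx, p.vy)
    theta_v = math.atan2(p.vy, p.vx) if v > 0 else 0.0
    return (r, theta, v, theta_v, p.eta)
```

The section state sits on the vertex that opens the edge, matching the edge-property convention described above.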
It should be noted that the above equation is the minimum requirement for driving safety and legality. Semantic information could be expanded according to certain decision requirements, e.g., pedestrian pose, vehicle door status, etc., which could be set as look-up tables.

C. Consistent Expression of Different Environment Elements With USEB
Providing the unified expression of different types of constraint information, the USEB becomes the platform for merging road environment elements through their constraints. Generally, the road environment contains three types of elements, i.e., static elements, dynamic elements and virtual rule limits. In USEB, the constraints derived from the three types of elements form the state-extended boundary:

B = (C_O(s, d, r), C_S(s, d, r), C_D(d))

where s, d and r represent the static objects, dynamic objects and rule information respectively, and C_O, C_S and C_D refer to the formed constraints of occupation, semantics and dynamics. The remainder of this section analyzes how static elements, dynamic elements, and virtual rule limits can be expressed in the USEB framework. Fig. 2 illustrates how these elements are expressed in USEB. Here, a frame in the nuScenes dataset is chosen, in which static infrastructure, a moving obstacle, an unknown irregular obstacle and a virtual rule limit can all be found.

1) Static Elements:
In the road environment, static elements refer to unmoving obstacles, i.e., those whose space occupation does not change over time. Note that stopped vehicles, pedestrians and cyclists are usually considered dynamic elements due to their potential to move. Thus, static elements mainly refer to static road infrastructure (e.g., roadsides, guardrails, etc.) and non-road obstacles (e.g., temporary cones and roadblocks, stones, etc.).
Static elements are usually depicted by position, shape and type in the element-oriented model. From the view of constraints in USEB, a static element is depicted by its occupation constraint and semantic type:

C_Os = {(x_1, y_1), (x_2, y_2), ..., (x_ns, y_ns)}, C_Ss = η_s

where the subscript s marks that the constraint comes from a static element.
C_Os, C_Ss and the empty C_Ds can be stacked to obtain the form of the state-extended boundary expressed in (2).
Note that the USEB is actually a 2D bird-view boundary, and the height value z is not considered. No matter how high an obstacle is, the vehicle should not run into the ground region that the static element occupies or blocks. If an obstacle is so low that the vehicle can pass over it without collision, it is not considered in the USEB. This principle is widely applied in existing research, e.g., decision based on the drivable area [11] and decision based on the element-oriented model [13].
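As a sketch of this bird-view principle, the filter below keeps only points that would actually obstruct the vehicle body before projecting them to the 2D plane. The function name and both threshold values are illustrative assumptions, not values from the paper:

```python
def project_to_2d_obstacles(points, z_min=0.3, z_max=2.5):
    """Project 3D points (x, y, z) into the bird-view plane, keeping only
    those whose height makes them a real obstacle to the vehicle body.
    z_min: anything lower can be driven over (ignored);
    z_max: anything higher (e.g., an overpass) does not block the vehicle.
    Both thresholds are illustrative placeholders."""
    return [(x, y) for (x, y, z) in points if z_min <= z <= z_max]
```

Points surviving the filter contribute to the occupation constraint C_Os regardless of their exact height.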
2) Dynamic Elements: Dynamic elements mainly include vehicles, pedestrians, cyclists, etc. Besides the occupation constraint and semantic type, the motion status should be expressed. In the element-oriented model, a dynamic element is usually expressed by a bounding box:

o_d = (x_c, y_c, l, w, ψ, v, θ_v, η) (8)

where (x_c, y_c) is the box center, l and w are its length and width, ψ is the heading, v and θ_v are the speed value and direction, and η is the object type. The bounding box can be converted into constraints:

C_Od = {(x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4)}, C_Sd = η_d, C_Dd = (v_d, θ_vd)

where the subscript d marks that the constraint comes from a dynamic element, and (x_1, y_1), ..., (x_4, y_4) are the box corners. Note that irregular (i.e., non-rectangular) objects that do not match (8) can also be represented by the constraints. This supports the reliability improvement of USEB when facing irregular objects or off-road scenarios.
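A minimal sketch of this box-to-constraint conversion, assuming the common (center, size, heading) box parameterization; the function and variable names are illustrative:

```python
import math

def box_to_constraints(xc, yc, length, width, psi, v, theta_v, eta):
    """Convert a detected bounding box into USEB constraints:
    C_Od = the four rotated corner points, C_Sd = eta, C_Dd = (v, theta_v)."""
    c, s = math.cos(psi), math.sin(psi)
    half = [( length / 2,  width / 2), ( length / 2, -width / 2),
            (-length / 2, -width / 2), (-length / 2,  width / 2)]
    # rotate each corner offset by psi, then translate to the box center
    corners = [(xc + c * dx - s * dy, yc + s * dx + c * dy) for dx, dy in half]
    return corners, eta, (v, theta_v)
```

For an irregular object, the four-corner list would simply be replaced by the object's contour points; the semantic and dynamic parts are unchanged.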
3) Virtual Rule Limit: In a road environment, the vehicle should avoid collision and follow the traffic rules. Virtual rule limits, such as solid lane lines, red lights, stop lines, prohibiting signs, etc., share the same influencing mechanism as static elements, and thus can be generalized into occupation and semantic constraints:

C_Or = {(x_1, y_1), (x_2, y_2), ..., (x_nr, y_nr)}, C_Sr = η_r

where the subscript r marks that the constraint comes from a virtual rule limit. In this way, strategies for following complex traffic rules are generalized into driving within this homogeneous limit.
From the above analysis, it can be found that static elements, dynamic elements and virtual rule limits can all be generalized as constraints of occupation, semantics and dynamics, and further stacked into the state-extended boundary B in (2).

III. PERCEPTION CONSTRUCTION OF USEB
The problem of USEB construction is how to obtain the constraints of occupation, semantics and dynamics. USEB construction is not limited to a certain type of perception input, but is compatible with different types. Here, three classical ways are taken for instantiation. The roadmap of USEB construction is shown in Fig. 3.
As shown in Fig. 3, USEB construction can take all three topological modes of input in the bird-view 2D plane, i.e., surface-level detection, line-level detection and point-level detection. In the remainder of this section, the three construction methods listed in Fig. 3 will be described. The first method fuses object detection and the map limit to form a closed boundary. The second goes further, enhancing the first method with reliable occupancy detection. The third works without object detection, fusing the occupation grid, the semantics & dynamics grid, and the map limit. Fig. 4 shows USEB construction with fusion of object detection and map limit.

A. Method 1: Fusion of Object Detection and Map Limit
As shown in Fig. 4, the USEB can be constructed from the individually detected bounding boxes of the objects and the map limit. Object detection provides partial occupation, semantics and dynamics at the same time. However, the whole surrounding constraint still needs to be supplemented by the map, which provides the remaining occupation and semantic constraints of road infrastructure and virtual rule limits.
The fusion process is carried out in polar coordinates with angular-scattered points. In this way, the constraints from objects and map limits are inspected in the view of a general ego-centered environment constraint, i.e., a fusion of occupation, semantics and dynamics.
In the first step, the extended states of the USEB points are initialized from the map limit. In the nuScenes dataset, the limits of map infrastructure and virtual rules are jointly represented (referred to as "road segment" or "road block") as a polygon, as shown in Fig. 4 (b). Thus, the initialization of USEB can be achieved by interpolating on this polygon:

p_θ = (r_mθ, θ, 0, 0, η_m)

where r_mθ denotes the map-limited distance at direction θ (as shown in Fig. 4 (b)) and η_m is the semantic label of the map limit. Second, the detected objects that lie inside the map limit update the boundary points corresponding to their angular directions. For any given object:

p_θ = (r_oθ, θ, v_o, θ_ov, η_o), θ ∈ Θ_o

where Θ_o denotes the angular range that the object occupies (as shown in Fig. 4 (a)) and p_θ denotes the boundary point at direction θ. r_oθ denotes the object box distance at direction θ, v_o and θ_ov are the speed and speed direction of the object, and η_o is the object type. This process is repeated to iterate over all the objects. Following the general logic of sensor fusion in space-oriented models, the angular-dense scattered boundary points provide the platform for fusing different individual perception inputs.
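The object-update step can be sketched as follows. `r_of_theta` is a hypothetical callable giving the object box distance r_oθ at each direction; in practice it would come from ray-box intersection, and all names here are illustrative:

```python
def update_with_object(points, r_of_theta, theta_range, v_o, theta_ov, eta_o):
    """Second step of method 1: overwrite every map-initialized boundary
    point whose direction lies inside the object's angular range and whose
    object distance is nearer than the current (map) limit."""
    lo, hi = theta_range
    for p in points:  # p = [r, theta, v_x, v_y, eta]
        r, theta = p[0], p[1]
        if lo <= theta <= hi and r_of_theta(theta) < r:
            p[:] = [r_of_theta(theta), theta, v_o, theta_ov, eta_o]
    return points
```

Iterating this over all detected objects yields the object-updated boundary before point merging.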
Third, redundant interpolated points are removed, so that one object bounding box is represented by its visible edges instead of many scattered points.
Since object detection and the map limit observe different parts of the environment, the final constraint in USEB is achieved by combining both parts. Thus, the information from object detection and the map limit, which are the two main sources of the element-oriented model, can be well represented by the USEB.

B. Method 2: Enhancing Method 1 With Reliable Occupation
As shown in Fig. 1, missed detection can occur in the element-oriented model due to untrained object types. Thus, if the USEB is only constructed with object detection and the map limit, the missed detection is inherited by the USEB. Due to its compatibility with different types of input, the USEB can involve LiDAR-detected, reliable space occupation to solve this problem. Fig. 5 shows the enhancement of method 1 with the LiDAR-generated occupation grid, taking the scenario in Fig. 1. The fusion of method 1 and the occupation grid is carried out in the ego-centered polar coordinates:

p_θ = (r_eθ, θ, 0, 0, η_e), if r_gθ > r_eθ − r_th
p_θ = (r_gθ, θ, 0, 0, 1), if r_gθ < r_eθ − r_th (17)

where r_eθ denotes the distance from method 1, and r_gθ refers to the distance of the nearest occupied grid at direction θ. r_th is a tolerance threshold for updating with the occupation grid. Note that η can be 'background' = 0, 'unknown object' = 1, 'vehicle' = 2, 'pedestrian' = 3 and 'cyclist' = 4. This look-up table can be custom defined; the definition here is for convenience in the following analysis. From Fig. 5, it can be seen that although the semantic type of the barrier is still unknown, i.e., η = 1, shown in black, the occupation constraint is revealed, thus avoiding potential collisions. The boundaries of the supplemented barriers are shown not as regular bounding boxes but as contour curves, indicating that the USEB is compatible with irregular objects. In a word, the USEB can apply the honest occupation grid from LiDAR scanning to improve reliability. Fig. 6 shows USEB construction by fusion of grids and the map limit. The semantic and dynamic grids are obtained by BEV grid detection with MotionNet [36], without object bounding boxes. The occupation grid map is obtained from reliable laser scanning of the LiDAR.
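The per-direction enhancement of (17) can be sketched as below. This is a simplified reading in which the method-1 point is kept unchanged unless the occupied grid reveals something clearly nearer; the function name and the 0.5 m default threshold are illustrative assumptions:

```python
def fuse_with_occupancy(p_e, r_g, r_th=0.5):
    """Eq. (17)-style enhancement of one method-1 boundary point
    p_e = (r_e, theta, v_x, v_y, eta) with the nearest occupied-grid
    distance r_g at the same direction. If the grid shows an obstacle
    clearly nearer than the method-1 boundary (a missed detection),
    replace the point with an 'unknown object' (eta = 1)."""
    r_e, theta = p_e[0], p_e[1]
    if r_g < r_e - r_th:
        return (r_g, theta, 0.0, 0.0, 1)  # unknown occupation revealed by LiDAR
    return p_e  # keep the method-1 point
```

Applied at every direction θ, this pulls the boundary in wherever the LiDAR sees occupation that object detection missed.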

C. Method 3: Fusion of Grids and Map Limit
As shown in Fig. 6 (d), missed detections in the S, D grids can be made up for by the occupation grid. Thus, the final grid covers untrained object types and reveals the contours of irregular objects.
As for semantics, MotionNet can only observe non-road elements, e.g., vehicles, pedestrians, barriers, etc. Thus, although road infrastructure can be revealed in the final grid, its semantic type is unknown. Also, some virtual rule limits have no laser-visible physical entities. With the map limit, this part of the occupation and semantics can be supplemented.
In the operation process, the final grid G_OSD is obtained by grid-level fusion of the MotionNet grids G_SD and the occupation grids G_O. After preprocessing, i.e., unifying the grid resolution and maximum range, G_O and G_SD are represented as:

G_O = {g_O(i, j)}, g_O = (x, y, η_O)
G_SD = {g_SD(i, j)}, g_SD = (x, y, v_x, v_y, η_SD), i, j ∈ {1, ..., n_g}

where n_g is the number of grids in one direction and Δg is the grid resolution. g_O and g_SD are the state vectors of one cell in the occupation grids G_O and the MotionNet grids G_SD, respectively. η_O is the 0-1 flag marking whether the cell is occupied, whereas η_SD reveals the element type in the cell. The final grid is generated as:

g_OSD = (x, y, v_x, v_y, η_SD), if η_SD ≠ 0
g_OSD = (x, y, 0, 0, 0), if η_SD = 0 ∧ η_O = 0
g_OSD = (x, y, 0, 0, 1), if η_SD = 0 ∧ η_O = 1 (21)

From the above fusion process and Fig. 6 (b), it can be seen that while the MotionNet grids recognize some static constraints as 'background', the occupation grids G_O supplement this part of the occupation as 'unknown'. The final grid covers occupation, semantics and dynamics and is thus a complete expression. However, the final grid G_OSD has a large data volume. Following the idea of converting the occupation grid into a drivable area boundary [38], [39], the final grid G_OSD can be simplified into the state-extended boundary form. For any boundary point at direction θ:

(i_near, j_near) = arg min {√(x_ij² + y_ij²) : η_OSD(i, j) ≠ 0, θ ∈ Θ_g(i, j)} (25)
p_θ = (r_near, θ, v_near, θ_v,near, η_near) (26)

where Θ_g = [θ_gmin, θ_gmax) denotes the angular range of grid (i, j); (25) finds the nearest occupied grid (i_near, j_near) in direction θ, and (26) converts this grid into polar coordinates and updates the boundary point of USEB at direction θ. Similar to USEB construction from object detection, the boundary types of the sections created by road infrastructure remain unknown, calling for fusion with the map.
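The grid-to-boundary simplification of (25)-(26) can be sketched as a nearest-occupied-cell search per direction. This is an illustrative implementation: cells carry only (x, y, η) for brevity, and the fixed angular window `d_theta` stands in for each cell's true angular range Θ_g:

```python
import math

def grid_to_boundary_point(occupied_cells, theta, d_theta=0.01):
    """Among occupied cells (x, y, eta) whose angular position falls within
    d_theta of direction theta, pick the nearest one and convert it to a
    polar boundary point, as in (25)-(26). Returns None if no occupied
    cell lies in that direction."""
    best = None
    for (x, y, eta) in occupied_cells:
        if abs(math.atan2(y, x) - theta) < d_theta:
            r = math.hypot(x, y)
            if best is None or r < best[0]:
                best = (r, theta, 0.0, 0.0, eta)
    return best
```

A full implementation would also carry each cell's (v_x, v_y) into the boundary point and handle the ±π angle wrap-around.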
The fusion of the grid-based boundary and the map limit boundary is achieved in polar coordinates, at direction θ:

p_θ = (r_mθ, θ, 0, 0, η_m), case 1: r_gθ > r_mθ ∨ (r_gθ ≈ r_mθ ∧ η_g = 1)
p_θ = (r_gθ, θ, v_gθ, θ_vgθ, η_g), case 2: r_gθ < r_mθ ∨ (r_gθ ≈ r_mθ ∧ η_g ≠ 1) (27)

where r_mθ denotes the distance of the map limit at direction θ, and η_m is the semantic flag of the map, which is set to 5 in USEB. Similarly, r_gθ denotes the distance of the boundary from the final grid defined in (26), and η_g is the corresponding semantic flag.
Eq. (27) shows the fusing strategy. If the map limit is nearer, it suggests a virtual rule limit or an undetected low boundary (e.g., an undetected sidewalk) that should be set as the map limit. If the grid boundary in (26) is nearer, it is recognized as a non-map obstacle (e.g., a vehicle). For grid boundary sections that coincide with the map limit within a given threshold, i.e., where '≈' holds, the map limit can supplement the 'unknown' semantic type of the grid boundary in (26).
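The case logic of (27) at one direction can be sketched compactly, taking η = 1 as 'unknown' and η_m = 5 as the map flag per the look-up table above; the function name and the '≈' tolerance value are assumptions:

```python
def fuse_grid_and_map(r_g, eta_g, r_m, eta_m=5, tol=0.5):
    """Eq. (27)-style fusing at one direction theta, reduced to (r, eta):
    - map limit nearer, or boundaries coincide with an 'unknown' grid type
      -> take the map point (case 1);
    - otherwise the grid boundary is a non-map obstacle -> take the grid
      point (case 2)."""
    if r_g > r_m + tol or (abs(r_g - r_m) <= tol and eta_g == 1):
        return (r_m, eta_m)   # case 1: map limit / map supplements unknown type
    return (r_g, eta_g)       # case 2: non-map obstacle
```

The dynamics of the grid boundary point would be carried along unchanged in case 2; they are omitted here for brevity.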
For the final USEB output, it is better for driving decision to recognize one obstacle as a whole, rather than directly outputting the angular-dense result. To achieve this, point merging is carried out as in method 1. Finally, the output of USEB constructed with grids and the map limit is shown in Fig. 6 (f). It shows that the USEB covers occupation, semantics and dynamics at the same time. Due to the support of reliable space detection in construction and expression, reliability is ensured. Fig. 7 shows the results of methods 2 and 3 in the scenario of Fig. 1. As shown in Fig. 7, after enhancement with honest LiDAR-based occupation detection to avoid the missed detection of barriers, methods 2 and 3 yield similar results. This further demonstrates that the USEB is compatible with different types of perception input.

IV. EXPERIMENT RESULTS
In this section, the proposed USEB is tested on the nuScenes dataset. The element-oriented model is taken as the benchmark for comparative analysis. Due to the compatibility with different types of perception input, the construction methods in Section III all take part in this analysis. Corresponding to the contributions of USEB, reliability is taken as the comparative criterion, examining whether the USEB can take advantage of space occupation input to improve reliability against missed detection and irregular objects.
The remainder of this section is composed of three parts. First, a statistical study is carried out on the validation partition of the nuScenes dataset. Second, several cases are taken to comparatively illustrate the reliability of USEB and the benchmark. Third, a set of continuous scenarios is examined.

A. Statistical Study
Here, 5126 scene frames in the nuScenes validation partition are taken to examine whether USEB can efficiently improve reliability compared to the element-oriented model based on object detection. The O, S, D grid map (i.e., the final grid in Fig. 6) is also taken as a reference.
Reliability is measured with recall, while precision is taken as a reference, with the general definitions:

recall = TP / (TP + FN), precision = TP / (TP + FP)

where TP, FN and FP denote true positives, false negatives and false positives, respectively. TP is measured by setting a geometric threshold on position, following the nuScenes true positive metric threshold of 2 m. Ground truth is taken from the nuScenes annotations. Table I shows the statistical results. Here, the directly visible constraints inside the map limit are considered in the reliability measurement. In addition, due to the detection range limit, the reliability measurement is carried out within a range of 35 m around the ego vehicle.
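The measurement with the 2 m matching threshold can be sketched as a greedy nearest match between predicted and ground-truth constraint positions. This is illustrative only; the actual evaluation follows the nuScenes matching protocol:

```python
import math

def match_and_score(pred, gt, dist_th=2.0):
    """Greedily match predicted constraint positions (x, y) to ground-truth
    positions within dist_th, then compute recall and precision from the
    resulting TP/FN/FP counts."""
    unmatched_gt = list(gt)
    tp = 0
    for (px, py) in pred:
        hit = next((g for g in unmatched_gt
                    if math.hypot(px - g[0], py - g[1]) <= dist_th), None)
        if hit is not None:
            tp += 1
            unmatched_gt.remove(hit)  # each ground truth matches at most once
    fp = len(pred) - tp
    fn = len(unmatched_gt)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision
```

A missed barrier appears here as an unmatched ground truth (FN), which is exactly the quantity that the occupation enhancement of methods 2 and 3 reduces.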
From Table I, it can be seen that with similar precision, USEB achieves higher recall than the widely applied element-oriented model. Due to the limits of training and dataset size, the missed detection of uncommon objects is unavoidable in object detection. With the reliable LiDAR scan, the surrounding objects can be revealed even though some semantic information is missing. With the enhancement by reliable space occupation detection, the USEB can represent objects missed by object detection, as well as irregular objects. This will be further illustrated and analyzed in the case study of Section IV(B).
In addition, as analyzed in Section III(C), with occupation enhancement on method 1 (i.e., construction with object detection and map limit), USEB construction method 2 has similar reliability performance to method 3. This shows that the USEB is compatible with different types of perception input.

B. Comparative Case Study
Here, 10 snapshot scenarios are taken to examine the performance of USEB on the nuScenes dataset. LiDAR-based object detection and grid map generation are applied as perception input, and "road_segment" and "road_block" are applied as vectorized map input, representing the drivable area as a polygon composed of multiple vertex nodes. CenterPoint [33], a high-ranking state-of-the-art method in the nuScenes object detection task, is taken for object detection. MotionNet [36] participates in the grid fusion of USEB construction method 3. Due to the close performance of USEB constructed with methods 2 and 3 (as analyzed in Section III(C)), only one of them is listed for each scenario.
(a)-(d) are junction scenarios. Missed detection occurs in the element-oriented model in (a) and (b). In (a), road barriers and cones are temporarily set at the junction, as shown by the highlighting red circle. Since the object detection network was only trained for vehicles, cyclists and pedestrians, these barriers and cones are missed.
The O, S, D grids can cover the driving requirements; however, it would be better if the data were more compact. Due to the compatibility of applying honest space occupation from LiDAR scanning, USEB successfully reveals the barriers and cones. The irregular road barrier, which does not fit a rectangular bounding box (as shown in the original LiDAR point cloud), is also shown in black. Similarly, in (b), the barriers in the red circle are shown in black in the O, S, D grids and the USEB, while missed in the element-oriented model. In summary, examples (a) and (b) show that the USEB is able to improve reliability with occupation input, and is compatible with irregular objects.
In scenarios (c) and (d), CenterPoint successfully detected all the obstacles, with occupation O, semantics S and dynamics D. It can be seen that the USEB combines this information with the map limit and generalizes it into direct constraints on the ego vehicle. This proves that, when there is no missed detection, the USEB can cover the driving-required O, S and D information in the form of constraints with object detection as input.
(e)-(h) are in-lane scenarios: Missed detection occurs in the element-oriented model in (e) and (f). The in-lane scenario (e) shows that the USEB can apply honest LiDAR-scanned occupation to improve reliability over object detection in a multi-lane scenario. The road barriers highlighted by the red circle are shown in black in the USEB constructed with method 3.
In scenario (f), a large object is placed at the roadside (shown by the red circle), which is missed by object detection because its label type was not trained. In short, as in junction scenarios (a) and (b), the USEB uses occupation information to improve reliability in multi-lane scenarios.
In scenarios (g) and (h), the ego vehicle is about to stop and wait for the traffic light before the junction. In the USEB, the leading and following vehicles in the ego lane and neighbor lanes are all revealed on the boundary. Different from the raw drivable area boundary, the semantic and dynamic information in the USEB can support lane-following and lane-changing decisions, as in the element-oriented model.
Two off-road scenarios (i) and (j) are also examined: These two scenarios are chosen in a parking lot. In (j), where there is neither missed detection nor incomplete map input, the element-oriented model, the O, S, D grids and the USEB all work well. However, in (i), it can be seen in the camera image that three shrubs form a part of the static boundary. Possibly because it is an off-road scenario, the nuScenes map did not mark them. Thus, although the surrounding vehicles are all detected by CenterPoint, the constraint of the element-oriented model is incomplete and might lead to collision. In the USEB, the LiDAR-scanned occupation reveals them as unknown static boundary sections, providing more reliable performance.
Finally, the above experimental results can be summarized. As its major contribution to reliability, the USEB shows statistically improved reliability without decreasing precision compared to the element-oriented model based on object detection. The case studies illustrate this more intuitively, showing that the reliability improvement is achieved through compatibility with reliable LiDAR-scanned occupation, filling the gap left by untrained object types.
The compatibility with LiDAR-scanned occupation is the key point of the USEB, owing to the characteristics it shares with space-oriented models. It should be noted that the element-oriented model cannot take a grid map as input directly. If clustering algorithms were used to extract type-unknown, arbitrary-shaped objects (including uncommon objects, long walls, etc.) and put them into the element list, such an expanded element-oriented model would be covered by the concept of the USEB: the rule limit, the regular bounding boxes of detected objects (e.g., vehicles, pedestrians) and the irregular unknown objects would form a continuous boundary constraint in all directions around the ego vehicle. Still, the expanded element-oriented model lacks the unified expression of the USEB. In other words, reforming the element-oriented model to cover reliable grid map input leads to the USEB in another way.
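The expanded element-oriented model described above hinges on extracting type-unknown, arbitrary-shaped elements from the occupation grid. A minimal sketch of that clustering step, using 4-connected component labeling over a binary occupancy grid, could look like the following; this is an illustrative stand-in, not the clustering method used in the paper.

```python
from collections import deque

def cluster_unknown_objects(grid):
    """Group occupied cells of a binary grid (list of lists of 0/1) into
    4-connected components, each representing one type-unknown element.
    Returns a list of clusters, each a list of (row, col) cells."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and not seen[r][c]:
                comp, queue = [], deque([(r, c)])
                seen[r][c] = True
                while queue:  # breadth-first flood fill of one component
                    cr, cc = queue.popleft()
                    comp.append((cr, cc))
                    for nr, nc in ((cr + 1, cc), (cr - 1, cc),
                                   (cr, cc + 1), (cr, cc - 1)):
                        if (0 <= nr < rows and 0 <= nc < cols
                                and grid[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
                clusters.append(comp)
    return clusters
```

Each resulting cluster could then be added to the element list as an irregular, type-unknown object, which is exactly the point where the expanded element-oriented model converges toward the USEB.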
Some minor conclusions can also be derived from the results. Compared to grid maps with O, S and D, the USEB provides a more compact constraint expression, serving the optimization problem of driving decision. Perception studies with grid-based environment representation typically use a grid resolution of 20 cm [7], [40], [41]. With the boundary representation of occupation, semantics and dynamics, the data amount can be significantly reduced. Last but not least, regarding perception compatibility, the USEB shows similar performance when constructed from methods 2 and 3, owing to the common essence of environment elements as constraints.
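The compactness claim can be made concrete with a back-of-the-envelope comparison. The region size (100 m x 100 m) and the angular resolution (360 bins) below are assumed, illustrative numbers, not figures from the paper; only the 20 cm grid resolution follows [7], [40], [41].

```python
# Illustrative data-volume comparison; region size and bin count are assumptions.
grid_resolution = 0.2                        # 20 cm, as in [7], [40], [41]
region = 100.0                               # assumed 100 m x 100 m BEV region
grid_cells = (region / grid_resolution) ** 2         # 250,000 cells
n_bins = 360                                 # assumed 1-degree angular bins
values_per_bin = 3                           # distance, semantic, dynamic state
boundary_values = n_bins * values_per_bin    # 1,080 values
print(f"grid: {grid_cells:.0f} cells, boundary: {boundary_values} values, "
      f"~{grid_cells / boundary_values:.0f}x reduction")
```

Even under these rough assumptions, the boundary form carries roughly two orders of magnitude less data than the full grid while retaining the O, S and D states along the constraint.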

C. Continuous Case Study
The above case studies are based on snapshots of traffic scenarios. To better illustrate the performance of USEB construction in continuous driving, a series of traffic scenarios is taken from the nuScenes dataset. Fig. 10 shows the constructed USEB and the performance curves over 7 continuous frames, with 4 key frames illustrated. Since method 1 relies on object detection, where missed detection occurs, its result is not listed in Fig. 10; the enhanced method 2 and method 3 are illustrated instead. In the curves, method 2 is shown as a red line marked with "+", and method 3 as a blue line marked with "o".
As shown in Fig. 10, the precision and recall of both construction methods remain above 90% over the continuous frames. This proves that the reliable occupation grid ensures high robustness of environment modeling by the USEB. Although the semantics (the boundary color, in row 2 from CenterPoint and row 3 from MotionNet) are not exactly the same, the shared reliable occupation grid ensures the robustness reflected by the recall. The two construction results are also similar in the illustration, which further proves the compatibility of the USEB with different perception frameworks in continuous driving.
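The precision and recall in such curves can be read as standard set-overlap metrics between predicted and ground-truth occupied boundary cells. A small hypothetical helper in that spirit (mirroring, not reproducing, the paper's evaluation) is:

```python
def precision_recall(pred, truth):
    """Precision and recall over sets of occupied boundary cells.
    pred, truth: sets of (row, col) cells; empty sets score 1.0 by convention."""
    tp = len(pred & truth)                       # true positives
    precision = tp / len(pred) if pred else 1.0  # correctness of predictions
    recall = tp / len(truth) if truth else 1.0   # coverage of the ground truth
    return precision, recall

# Toy example with hypothetical boundary cells: 2 of 3 predictions are correct.
p, r = precision_recall({(0, 1), (0, 2), (1, 2)}, {(0, 1), (0, 2), (2, 2)})
```

High recall here corresponds to the reliability argument of the paper: occupied ground-truth boundary sections are rarely missed when the occupation grid is fed into the USEB.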

V. CONCLUSION
In this paper, a novel environment model called the USEB is proposed, covering occupation O, semantics S and dynamics D in the form of boundary constraints. Due to the common essence of environment elements as driving constraints, the O, S and D of different types of elements can be generalized into the state-extended boundary of the USEB. The USEB is compatible with different types of input, including grid maps, object bounding box detection, map limits and so on. Thus, by absorbing honest LiDAR-scanned occupation information, the USEB helps solve the missed detection problem of the element-oriented model caused by untrained types and irregular objects. Experimental results show that the USEB has improved reliability and can be applied in different scenarios, while covering the additional semantic and dynamic information needed to support autonomous driving.
The USEB developed in this paper is based on heterogeneous information fusion in polar coordinates. All three geometric forms of surface-level, line-level and point-level information are covered in the 2D bird's-eye view. In this paper, a basic fusion strategy is applied to establish compatibility with heterogeneous input. Future work will focus on an improved sensor fusion system, considering data confidence and uncertainty, data matching and conflicting data processing. Furthermore, the fusion of heterogeneous sensors sheds light on the performance evaluation of the entire perception system in autonomous driving.

Xinyu Jiao received the bachelor's and Ph.D. degrees in automotive engineering from Tsinghua University, China, in 2017 and 2022, respectively. His research interests include environment modeling, driving decision-making, and risk assessment of autonomous vehicles.