Elevation Model Mosaicking From Coregistration, Adjustment, and Median of Stacks (CAMS)

A large and growing volume of repeat digital elevation models (DEMs) obtained from multiple spaceborne sensors enables high-resolution mapping of the Earth’s surface at continental scales. Mosaicking of individual DEMs to form a continuous surface is challenging due to variability data quality and positional accuracy, all of which can result in artifacts. Presented is a method for efficiently mosaicking sets of repeat, overlapping DEMs using their pairwise, translational offsets to remove poor quality DEMs and optimize their alignment prior to merging. The coregistration, adjustment, and median of stacks (CAMS) approach is tested by mosaicking a set of 2-m resolution DEMs created from WorldView (WV) stereoscopic imagery and comparing the result to light detection and ranging (LiDAR) data. CAMS produces a mosaic of substantially higher quality and accuracy than that obtained from the median of all overlapping DEMS, as commonly performed for mosaicking satellite-derived DEMs. The method requires no sensor-specific information or ground control, making it applicable for large-area mosaic production using multiple datasets.

Spaceborne remote sensing provides near global, repeat DEM coverage, enhancing measurement quality and enabling change detection over large areas.However, since these DEMs are collected as swaths along orbital passes, the usability of these data is greatly increased through the continuous coverage provided by mosaicking.
Global-scale mosaicking using satellite data, such as GDEM [1] or WorldDem [2], uses the mean or median height of repeat DEMs.This reduces noise and random vertical errors, but is sensitive to errors in alignment (i.e.registration error) between successive DEMs, with the resulting error increasing with terrain relief.This can be mitigated by registering each DEM to a common set of ground control, as is the case for all existing global DEM datasets.However, such control is not always available and may be degraded by changes in the surface through time.
Commercial satellite imagery offers the capability to produce repeat DEMs, created from stereophotogrammetry, at very high resolution (decimeter to meter scale) over large areas [3,4,5].Frequent repeats provided by these data potential enables ground measurements in areas of persistent clouds.Since these images are collected over relatively narrow swaths (< 20 km), the resulting individual DEMs tend to have small footprints and are therefore less likely to have ground control within their footprint.Each of these DEMs may have registration errors due to uncertainties in the sensor models used to estimate the map coordinates of each pixel that are up to several times larger than the pixel size.Further, hundreds or thousands of individual DEMs may be mosaicked to cover the region of interest.Often, there will be no reference dataset at comparable resolution available to reference and filter these DEMs.
Here we present a fully automated method for mosaicking repeat DEMs that minimize registration errors and noise for data available.The coregistration, Adjustment and Median of Stacks (CAMS) approach uses the three-dimensional offsets between each DEM and all other DEMs in the set of repeat DEMs (i.e., the stack) for adjustment and filtering, providing an optimized median height at each pixel in the mosaic.

A. Dataset Requirements
The dataset consists of single acquisition DEMs, i.e., single scenes collected along an orbital pass, that are to be mosaicked over a rectangular region of interest (i.e., the output mosaic grid).The algorithm as described assumes that all scenes are geo-registered and in the same cartesian coordinate system and with the same pixel resolution as the output mosaic grid.The standard deviation uncertainty in the registration of the scene DEMs must be known or assumed.

B. Coregistration
We first define an output mosaic grid by extent and resolution.All individual DEM scenes with footprints intersecting the mosaic grid, i.e., overlapping scenes, are identified and loaded into memory.If no scene DEMs overlap, the process is terminated.If only one scene DEM overlaps, it is either added directly to the grid, or coregistered to an existing, reference DEM, potentially with filtering, before being added to the grid.
We assume each scene has registration errors in the two horizontal, x, y, and one vertical, z, direction, and are, therefore, offset from each other by the sum of their individual errors.For N scenes, there will be N!/2!(N-2)!pairwise offsets,  = [x, y, z ] in each direction, x, y, z.For example, four overlapping DEM scenes labeled A, B, C, D will yield six unique pairwise offsets, i,j for i≠j = A,B,C,D (Fig. 1).! The offsets are determined through pairwise coregistration.Several algorithms exist for automated DEM coregistration, including Iterative Closest Point [6], surface shape matching with geometric constraints [7], or iterative regressions of elevation differences on aspect [8,9].A requirement is that the 1-sigma uncertainty of each offset, i,j, not just the coregistration residual, is provided to be used as weights in the adjustment.Spatially varying errors, such as low-frequency oscillations due to uncompensated satellite motion, or errors in the sensor model, are not accounted for by translational registration.These errors may be mitigated by breaking up the output grid into smaller sub-grids with an extent that is small relative to the spatial scale of these errors.Sub-grid processing is also effective for parallelization and reducing memory requirements.

C. Filtering
A benefit of the coregistration process is that it yields statistics that are effective in identifying and filtering outlier scenes.In a set of N overlapping scene DEMs, pairwise coregistration between each scene and all the other scenes results in N-1 set of coregistration statistics for each scene, including the mean, median and standard deviation of the surface height residuals between the coregistered scene DEMs.A poor-quality DEM will result in consistently large absolute values for these metrics, whereas pairs of higher quality DEMs will result in values closer to zero.Therefore, we can filter scenes for which the minimum value of the set of residual statistics is above some threshold value.In addition to scenes, individual pairwise offset with magnitudes and uncertainties larger than a threshold may be filtered, so that they are not used in the adjustment (Section 2.4).A tradeoff, therefore, may exist between higher minimum data quality, as enforced by lower thresholds, and data coverage.This is because stricter filtering of scenes may result in data gaps, while lower quality may be preferable to incomplete coverage.Thus, an adaptive approach may be used where filter thresholds are set initially to a minimum.If insufficient scenes remain to cover the grid, the thresholds are incrementally increased until either complete coverage is obtained, or a maximum number of allowable iterations is reached.This process ensures that only the best data, as defined by coregistration residuals, are used in the mosaicking process while enabling complete coverage of the DEM.If only one or two scenes exist, their quality can be ranked through comparison to a reference DEM.Since the reference DEM is only used to rank relative quality and not to filter DEMs based on a threshold condition, or to fill voids, a lower resolution DEM than the mosaic grid may be used.

D. Adjustment
A Our objective is to adjust each scene in three dimensions (i.e.translation) to minimize their offsets so that each pixel on the output grid is aligned with the corresponding pixels of each scene.If a reference DEM exists, all scenes may be coregistered to that reference, reducing the offsets from the reference, and each other, to zero.However, it is often the case where no existing DEM of high enough resolution or quality exists, or that the surface has changed.In this case we assume that each scene is offset from the true surface by a length  = [ x, y, z ] within its uncertainty, so that i=0 ∓i for each ith scene (Fig. 1).We may then express each pairwise offset as the difference in each scene offsets from true: i-j =i,j ∓i,j for i≠j.Continuing with the above example, we then have four scene offsets and six pairwise offsets: The optimal  for each DEM is estimated from the weighted least squares solution, where W is the weight matrix, consisting of each 1/i or 1/ij, where  is the standard deviation of the heights and height differences, respectively.Generally, in the absence of IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL.X, NO.X, MONTH YEAR precise ground control, ij < i , resulting in more weight on i,j .
The set of adjustment offsets, x, y, z, are determined independently in each direction.If  is equal for each scene, the adjustment will be the mean of the stack, whereas lower values of  will bias the adjustment toward those scenes.Thus, ground control from, e.g., high accuracy lidar or GPS can be included as a scene in the stack, potentially with only offsets in the vertical direction if 3-dimensional coregistration cannot be performed.If registration errors are random, with zero bias, the adjustment will converge on the actual height with increasing N.

E. Stack Filtering and Median
Following adjustment, the stack of aligned N DEM scenes provides N repeat measurements at each pixel.Assuming the true surface elevation has not changed, the deviation in the set of N height measurements at the pixel should be zero, with their residuals providing errors on which to filter.Many possible filters may be applied, with the effectiveness mostly dependent on N.
One approach, that is like the coregistration filter described above, locates clusters and outliers of pairwisedifferences in pixel heights in the stack.Differences between more accurate heights should be smaller than less accurate heights, with clusters of pairwise differences around zero.A density-based point cluster algorithm [10] is then used to isolate the clusters from the outliers, which are removed from the stack (i.e., set to no data).
Finally, the median height of each output grid point is calculated from the stack of adjusted heights to obtain the mosaicked DEM.Other potentially useful variables may be obtained from the stack, including the stack depth at each grid point, the median absolute value of the stack heights, which indicates noise, the minimum and maximum heights, or percentiles of heights, and scene acquisition time information.If mosaicking was performed on subtitles, those are then merged, minimizing the offsets between subtitle edges, and applying an edge-distance -weighted average over the buffers.

III. DEMONSTRATION
To demonstrate the basic application of CAMs, we construct a mosaic from a set of stereophotogrammetric DEMs created from MAXAR's constellation of three WorldView (WV) optical sensors and compare the result to a high-accuracy DEM obtained from airborne LiDAR.
We obtain DEMs that overlap a 240 km2 airborne LiDAR survey at Nome Creek, central Alaska [11].This area has relatively sparse boreal vegetation, mostly restricted to river valleys, enabling comparison between the WorldView-derived surface and the bare- ground terrain model provided by the LiDAR.The Nome Creek LiDAR DEM is distributed by the USGS in 1/9-arc second posting in geographic coordinates (NAD83 2011) and heights above the NAVD 88 datum.A one standard deviation uncertainty of +/-30 cm is reported for bare ground.For comparison to the WV DEMs, the LiDAR DEM was reprojected to the ArcticDEM Polar Stereographic (EPSG:3413) projection with heights above the ellipsoid and a posting of 2m (Fig. 2a).To register the WV DEMs to the Lidar DEMs we apply a land classification mask derived from the 10-m European Space Agency WorldCover product [12] , only using pixels classified as grassland, bare ground or moss and lichen.These comprise 79% of the LiDAR DEM area.
Twenty-seven WV DEMs were acquired between 30 April 2011 and 14 May 2020 over the Nome Creek LiDAR survey area (Table S1).Twenty-five of these were obtained from stereo image pairs acquired along the same orbital pass (i.e.in-track), while two were obtained from images acquired during different passes (i.e.crosstrack).The 2-m posting DEM's were created using SETSM software package with the standard ArcticDEM processing system [13,14].On average, each grid point in the output mosaic has 10 repeat DEMs, ranging from 7 to 14 repeats within the footprint of the LiDAR DEM (Fig. 2b).
As a first check of data quality, each WV DEM is differenced from the LiDAR DEM and the mean and standard deviations of the vertical differences over grassland and bare ground are obtained (Table S1).Six of the WV DEMs do not overlap with the LiDAR footprint.For the 21 WV DEMs with overlap, the mean vertical difference ranges from -4 to 55 m, averaging 2.42 m, while the standard deviation ranges from 0.70 to 152 m, averaging 13.56 m.WV DEMs #10, #13 and #24 appear as outliers with the largest mean and/or standard deviations of vertical differences.Examination of these WV DEMs reveals extensive artifacts due to clouds.
To assess the CAMS mosaic result, a mosaic is first constructed from only the median of all overlapping DEM pixels without adjustment or filtering (i.e., a median-only DEM).We coregister this mosaic to the LiDAR DEM using the method in [8,9] for finding the vertical, z, and horizontal (x and y) shifts that minimize the root-mean-square of vertical differences between the land class-masked WV and LiDAR DEMs.Shifting the WV medianonly DEM by -0.19 m in the vertical and -0.82 m and 3.81 m respectively in the x and y directions reduces the root-mean-square (RMS) of the vertical differences over grasslands and bare ground by 22%, from 68 to 53 cm.Differencing the registered medianonly mosaic from the LiDAR DEM shows the expected large positive differences over areas of dense vegetation along rivers and within drainage gullies, but also distinct artifacts at the edges of adjoining strips (Fig. 3).There is also an overall trend from positive to negative differences along y.Both this trend and large deviations are visible along the profile shown in Fig. 4.
A mosaic is then constructed using the CAMS approach by, first, coregistering each pair of overlapping WV DEMs to obtain the set of pairwise offsets, ij, and the statistics of the coregistration.The strips are then filtered based on their minimum values for the standard deviations and absolute means of their pairwise vertical differences following coregistration.These values are shown in Table S1.Threshold values of 4 m and 0.1 m, respectively, are applied.Based on these thresholds, strips #2, #10 and #13 are not used in the mosaic.While strip #2 is both of poor quality and only intersects a small portion of the study area, DEMs #10 and #13, which are identified as outliers above, have extensive artifacts due to cloud cover.Despite having a large mean vertical difference and standard deviation from the reference, scene #24 is not filtered because its pair-wise offsets do not reach the thresholds.Removal of DEMs #10 and #13 reduces the mean of the mean vertical differences from the LiDAR DEM from 2.42 m to 0.25 m and their standard deviation from 13.56 to 4.40 m.
The adjustments, , are then calculated from the weighted least squares of the matrix of offsets using (3).Additionally, pairwise offset values of greater than 50 m, or offset uncertainty values greater than 0.1 m, are not used in the adjustment.Of 183 pairwise offsets, none of the vertical offsets met this threshold, but 23 of the x-direction and 21 of the y-direction offsets met this threshold and were not included in the adjustment.Regression weights are the inverse of the one sigma error in each i,j and 4 m for every i.The resulting offset values are given in Table S1.Offsets in each direction ranged by approximately 10 meters, with standard deviations of offsets along x, y and z of 2.73 m, 3.17 m, and 2.01 m, respectively.Differencing the CAMS mosaic from the LiDAR DEM shows a substantial reduction in vertical differences and artifacts (Fig. 3).Registration reduces the RMS of the vertical differences from 58 cm to 27 cm, or from 15% less than the median-only mosaic to 49% less.Similarly, prior to coregistration, the LE68 and LE90 values of 57 and 91 cm are 7% and 17% respectively less than those of the median-only mosaic.Following registration, these are reduced to 24 and 42 cm, or by 47% and 52% of the median-only mosaic values, respectively.The elevation difference profile also shows a distinct decrease in the magnitude of deviation, as well as the removal of the trend (Fig. 4).Notably, the resulting difference between the CAMS mosaic and the LiDAR DEMs are within the expected uncertainty of the LiDAR DEM.

V. CONCLUSION
A robust, adaptive, and general method is presented for mosaicking repeat, overlapping Digital Elevation Models (DEMs) for the purpose of obtaining a continuous, high-quality surface over large area.The coregistration, Adjustment and Median of Stacks (CAMS) method minimizes the translational offsets between overlapping "stacks" of repeat DEMs using weighted least squares and uses the information provided by those offsets to filter the dataset prior to obtaining the median value at each grid point.A basic application is provided in which mosaics are created from a set of overlapping stereophotogrammetric WorldView (WV) DEMs using both a simple median and the CAMS approach.Both are compared to a LiDAR-derived DEM, revealing that CAMS is effective at reducing artifacts caused by bad DEMs and offsets between DEM scenes.Following coregistration, the root-mean-square (RMS) of the vertical differences between the CAMS DEM and IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL.X, NO.X, MONTH YEAR the LiDAR DEM is approximately half than those for the median-only mosaic, as are the 68th and 90th percentile linear errors, resulting in an uncertainty within that expected for the LiDAR DEM over areas of sparse vegetation.
Several improvements can be made to the basic CAMS approach outlined here.First, where available, high-accuracy ground control points, either from GPS surveys or from air or spaceborne altimetry, can be added to the coregistration and adjustment processes to produce a registered mosaic.Second, filters can be applied to the series of repeat measurements on a point-by-point basis to remove outliers and further reduce noise in the result.Third, by mosaicking small subset areas (e.g., 1 km by 1km) of the full region of interest and then merging the subset results, spatially varying biases in the DEMs can be reduced and mosaics over very large areas, using many repeats, can be produced through parallel computing.This is the approach taken for the continental-scale, high-resolution surface mapping projects ArcticDEM, EarthDEM and the Reference Elevation Model of Antarctica (REMA).
errain and surface maps provided by Digital Elevation Models (DEMs) are essential for a wide range of geoscience, navigation and engineering activities.The sources and volumes of DEMs have proliferated over the past decade, and include Light Detection and Ranging (LiDAR), stereophotogrammetry and Interferometric Synthetic Aperture Radar from both air and spaceborne sensors.

Fig. 1 .
Fig. 1.Schematic of four overlapping DEM scenes (A, B, C, D), the pairwise offsets between each scene,  and the adjustments .!
A-C = A,C ∓A,C A-D = A,D ∓A,D B-C = B,C ∓B,C C-D = C,D ∓C,D B-D = B,D ∓B,D.As a linear system in matrix form this is = X ,(2) where,  = [ 0 0 0 0 A,B A,C A,D B,C B,D C,D ] T ,

Fig. 2 .
Fig. 2. (a) Hill shade image of the Nome Creek LiDAR DEM [11].Terrain is exaggerated by a factor of four.(b) Color scale map of numbers of repeat WorldView DEM strips over the Nome Creek survey area.White lines are the footprint of the LiDAR DEM.Projection is polar stereographic EPSG 3413 with coordinates in kilometers.

Fig. 3 .
Fig. 3. (a) Color scale maps of the difference, in meters, between the (A) median-only and (b) CAMS DEM mosaics and the LiDAR DEM.White line in (a) is the profile in Fig. 4.

Fig. 4 .
Fig. 4. Difference from the LiDAR DEM for the (blue) medianonly and (red) CAMS mosaics along the profile P-P' shown in Fig. 3(a).
This work was supported by the U.S. National Science Foundation Office of Polar Programs under grants 1559691 and 1543501 and the U.S. National Aeronautics and Space Administration under grant 80NSSC18M0078.I. M. Howat is with the is with the Byrd Polar and Climate Research Center and the School of Earth Sciences, The Ohio State University, Columbus, OH 43210 USA (e-mail: ihowat@gmail.com).