Combining Feature Correspondence With Parametric Chamfer Alignment: Hybrid Two-Stage Registration for Ultra-Widefield Retinal Images

We propose a novel hybrid framework for registering retinal images in the presence of extreme geometric distortions that are commonly encountered in ultra-widefield (UWF) fluorescein angiography. Our approach consists of two stages: a feature-based global registration and a vessel-based local refinement. For the global registration, we introduce a modified RANSAC (random sample and consensus) that jointly identifies robust matches between feature keypoints in reference and target images and estimates a polynomial geometric transformation consistent with the identified correspondences. Our RANSAC modification particularly improves feature point matching and the registration in peripheral regions that are most severely impacted by the geometric distortions. The second local refinement stage is formulated in our framework as a parametric chamfer alignment for vessel maps obtained using a deep neural network. Because the complete vessel maps contribute to the chamfer alignment, this approach not only improves registration accuracy but also aligns with clinical practice, where vessels are typically a key focus of examinations. We validate the effectiveness of the proposed framework on a new UWF fluorescein angiography (FA) dataset and on the existing narrow-field FIRE (fundus image registration) dataset and demonstrate that it significantly outperforms prior retinal image registration methods in accuracy. The proposed approach enhances the utility of large sets of longitudinal UWF images by enabling: (a) automatic computation of vessel change metrics such as vessel density and caliber, and (b) standardized and co-registered examination that can better highlight changes of clinical interest to physicians.


narrow-field FIRE (fundus image registration) dataset and demonstrate that it significantly outperforms prior retinal image registration methods in accuracy. The proposed approach enhances the utility of large sets of longitudinal UWF images by enabling: (a) automatic computation of vessel change metrics such as vessel density and caliber, and (b) standardized and co-registered examination that can better highlight changes of clinical interest to physicians.
Index Terms-Image registration, vessel detection, fluorescein angiography, retinal image analysis, RANSAC.

I. INTRODUCTION
R ETINAL image registration is one of the crucial tasks in ophthalmological image analysis that facilitates several clinical applications. Registration of multiple narrow-field color fundus images captured during a clinical visit has been used to produce a montage with large field-of-view (FOV) [1]. Registration of retinal images captured over a series of longitudinal visits has also been used for quantitative and qualitative assessment of temporal changes in vasculature [2]- [4].
The goal of automated retinal image registration is to determine a geometric transformation between the spatial coordinates of images captured from different viewpoints so as to align their content. The problem has been studied extensively and existing works can be broadly grouped into three categories: intensitybased, keypoint-based, and segmentation-based. Intensity-based approaches estimate the registration transform by maximizing a similarity metric between the image pairs under the registration transform. Various metrics have been proposed, including mutual information [5], [6], mean squared difference [7], crosscorrelation [8], and phase correlation [9]. Keypoint-based methods estimate the registration transform from correspondences of feature keypoints extracted from the retinal images, where RANSAC [10] is typically used to establish correspondences robustly. Both hand-crafted feature detectors and deep-learning based feature detectors have been used for retinal image registration [11]- [14]. Methods in the third category, detect binary vessel maps from the retinal images, which are then used to estimate the registration transform via a number of alternative methodologies. Specifically, the problem has been formulated as deformable line-based registration for skeletonized vessel maps [15], as two-dimensional registration of vessel pixel point clouds [16], and as topological matching of the vessel connectivity structure represented as a graph [17], [18]. In addition to 2D transformations between image coordinates, a few methods have also formulated the registration problem as joint estimation of camera poses and 3D retinal structure [19], [20]. An interative framework has also been recently proposed [21], which leverages an available deep-learning based vessel detector for one imaging modality to jointly perform cross-modal registration and weakly-supervised vessel detection for another retinal imaging modality.
Most of the prior works are designed for narrow-field retinal images, which constitute the predominant type of retinal imagery. With advances in imaging techniques, ultra-widefield (UWF) retinal images, such as UWF fluorescein angiography (FA) and UWF fundus photography (FP), are also commonly being utilized for clinical assessment [22]- [24]. Compared with narrow-field retinal images that provide a FOV of 30 • to 50 • , UWF images capture a substantially larger FOV (up to 200 • ), typically covering the retinal surface from the posterior pole to the periphery in a single image [25]. Significant variation and geometric distortion depending on the direction of capture are inevitably encountered for the larger FOV. While the geometric distortions cannot be completely eliminated in mapping from the nearly spherical retina to a planar image, a standardized representation stereographic projection [26], [27] has been developed that also combines information across multiple images in a single montage. Fig. 1 illustrates the stereographic projection process and highlights the extreme geometric distortions seen in individual capture UWF images. Because generating montage images has challenges and can introduce artifacts, such as ghost (duplicated) vessels, currently the clinical analysis relies more on the original UWF images. However, individual UWF images are numerous and are not well aligned due to patient eye movement, which limits the physician's ability to examine multiple UWF images in detail. Therefore, automated registration of UWF image to the montage representation is desirable to standardize the geometry and to produce consistent and repeatable metric quantification.
In this paper, we address the problem of registering UWF retinal images with a stereographic projected montage image and make the following contributions. We propose a novel two-stage hybrid registration approach for UWF retinal images that uses the intensity image and the corresponding binary vessel segmentation maps for the first and second stages, respectively. The first stage coarsely aligns the grayscale UWF images using feature-matching based global registration, and the second stage provides fine registration using vessel-based local chamfer alignment. The methodology enables precise registration of UWF images despite the strong changes in imaging geometries typical for such images. The choice of vessel based chamfer alignment for the second stage makes is also particularly well matched with clinical scenarios requiring longitudinal comparisons between vessel structures. The proposed approach opens up a new resaearch direction where deep learning based vessel detection starts to aid operations such as grayscale image registration, whereas traditionally grayscale image registration is considered a preprocessing step that is performed prior to vessel detection. We also propose a modification to the well-known RANSAC algorithm that separates sample and consensus sets to provide better estimates of the global transformation parameters for our first stage, which significantly improves accuracy in the in the peripheral regions. To evaluate retinal image registration accuracy and to facilitate further research, we introduce a new dataset of UWF FA retinal images called FLoRI21, which presents more realistic challenges for image registration relevant in clinical applications. We present an extensive evaluation of the proposed method on the new FLoRI21 dataset and on the existing narrow-field FIRE dataset [28] and, for both datasets, demonstrate significant improvement over prior methods.
The rest of this paper is organized as follows. In Section II, we present the proposed hybrid retinal registration method. In Section III, we conduct a series of analyses of the proposed method and compare with existing works on retinal image registration. Section IV summarizes the concluding remarks.

II. HYBRID REGISTRATION APPROACH
The proposed hybrid retinal image registration technique, which is illustrated in Fig. 2, leverages the complementary strengths of the feature-based global registration and the vesselbased local registration. Used in conjunction with robust techniques, such as RANSAC, feature-based global registration avoids local minima for transformation parameter estimates. However, a global transformation cannot handle the extreme geometric distortion that is introduced from mapping the approximately spherical retinal surface to the image plane. Thus, invariably, the global transformation does not provide adequate registration accuracy. By limiting the scope to a restricted smaller spatial region, the local registration, however, can model the residual mismatch remaining after the global registration. Specifically, for the local registration, we utilize the anatomical segmentation of retinal vessels detected from input images and partition the coarsely aligned vessel pairs into overlapping patches. Unlike the global registration that relies on a set of sparse feature correspondences, the local registration step optimizes a dense vessel pixel agreement metric for each local patch via the parametric chamfer alignment.
Next, we describe details for the proposed feature-based global registration and the vessel-based local registration.

A. Global Feature-Based Registration
The global registration step aims to estimate a transformation T β between a reference image I r (fixed geometry) and a target image I t . Since the imaging geometry of the reference and the target images are different and unavailable, we adopt a parametric polynomial transformation that is flexible enough to capture non-rigid changes in geometry. In this step, feature points are first extracted from the reference image and the target image, shown as the cyan and the magenta points, respectively, in the global registration block in Fig. 2. We denote the set of the detected reference feature points by is the location of feature point, and N r is the number of detected reference feature points. Similarly, the set of target feature points is represented as Alternative feature descriptors can be used in the proposed method, including those that are hand-crafted (e.g. SIFT [29]) or learned (e.g. SuperPoints [30]).
From the detected features F r and F t , we establish two sets of (tentative) corresponding points. The first set S c is the consensus set containing the nearest feature matches and the second set S s is the sample set in which only distinctive features are included. Specifically, for each reference feature descriptor f (r) i , we first search its nearest neighbor f (t) j i,1 in the feature space from the target feature set F t , and add the corresponding location pair (p for all j = j i,1 for a suitably chosen threshold τ 1 < 1, we declare it a distinctive match and add it to the sample set S s . To make the feature matching robust, we perform the bidirectional feature matching by swapping the reference features and the target features.
The consensus and the sample sets have different characteristics. While the mismatch rate in the sample set S s is relatively low, the number of matched feature points is small, especially in the peripheral regions in which the registration is prone to error. On the other hand, the consensus set S c includes a large number of matched features, even though the mismatch rate is high as well. Using these two sets of tentative matched features, we propose a modification to the RANSAC algorithm to estimate parameters β for a polynomial registration transformation. The same overall methodology applies for different order polynomial transformations, though the number of parameters and the form of the transformation depends on the specific choice.
In the proposed modified RANSAC, we first randomly select N β matched points from the sample set S s . We determine the optimal transformation parameter β such that the transformed target coordinates closely approximate the corresponding reference points, i.e., p Detailed expressions for A and b are provided in Section S.II in the Supplementary Material for the second, third, and fourth order polynomial transformations that we used in our experiments.
We then determine the inlier points from the consensus set S c that lie within a distance threshold τ 2 under the estimated transformation parameters. The random selection is repeated for N trial times and the selection with the largest number of inliers is used to determine the optimal global transformation parameters.
The overall algorithm for the global transformation estimation is summarized in Algorithm 1. We note that, although simple, the proposed modification to the conventional RANSAC algorithm using separate sample and consensus sets provides significant gains in practice (as we demonstrate in our experimental results).
Once the global transformation parameters have been estimated, we apply the transformationT β to the target vessel map V t to obtain the warped version V t 0 . The resulting coarsely aligned vessel pair (V r , V t 0 ) is then fed to the local registration step.

B. Local Vessel-Based Parametric Chamfer Alignment
As indicated in Section I, a global transformation is usually insufficient to model the extreme geometric distortion in the UWF images. Therefore, in the second step, we refine the registration at the local patch level via parametric chamfer alignment of retinal vessels.
We first partition the coarsely aligned vessel maps V r and V t 0 into K pairs of overlapping patches (see the local registration block in Fig. 2). The number K is determined based on the patch size and the stride size, i.e., K = (H − P + S)(W − P + S)/S 2 , where H, W , P , and S are image height, image width, patch size, and stride size, respectively. The parametric chamfer alignment is performed for each pair of vessel maps separately. For each patch k, let Q r,k = {p is the pixel location in the local patch coordinate system and M r,k is the number of vessel pixels. Similarly, we have a set of vessel pixel locations in the coarsely aligned target vessel patch Conventional chamfer alignment [31] estimates the transformation parameters β k that minimizes the mean-squared chamfer distance from the target vessel pixels in Q t,k to the reference vessel pixels in Q r,k . For each target vessel pixel j, the squared chamfer distance d j (β k ) is calculated as the minimum squared distance between its transformed locationT β k (p (t,k) j ) and the nearest pixel location in Q r,k . This formulation, however, is not robust to outlier vessel pixels that only exist in Q t,k and do not have correspondence in Q r,k . To handle the outlier pixels, we adopt a probabilistic framework for the parametric chamfer alignment [32]. Specifically, we further associate each target vessel pixel j with a latent variable z k j ∈ {0, 1} to indicate if the pixel j has a correspondence in the reference vessel set (z k j = 1) or not (z k j = 0) and refer to the pixels in the two groups as inliers and outliers, respectively. We model the prior probability of p(z k j = 1) as a Bernoulli distribution with unknown parameter π k . Moreover, we assume that the registration error d j (β k ) for inlier vessel points should be close to 0 and thus model the conditional probability p(d j |z k j = 1) as an exponential distribution with unknown parameter λ k . For outlier vessel points, the registration error p(d j |z k j = 0) is modeled as a uniform distribution over [0, P 2 ].
We apply the Expectation Maximization (EM) algorithm [33] to obtain maximum-likelihood estimates of the unknown parameters π k , λ k , and β k . The EM algorithm iterates between two steps: the Expectation step (E-step) and the Maximization step (M-step). In the E-step, we calculate the posterior probabilities p k j = p(z k j = 1|π k , λ k , β k , d j ) using Bayes' rule. In the M-step, we maximize the expectation of the complete-data log likelihood and obtain the updated parameters π k = M t,k Fig. 3. Illustration of the displacement vector field interpolation. The green and the magenta pixels are the vessels in the reference and the target images, respectively. In the first two columns, the cyan point shows a target pixel in four patches and the yellow arrow is the local displacement vector estimated from each patch. The interploated displacement vector is shown in the third column. The rightmost image shows the result from the local registration, where the white pixels are the common vessels between the image pairs.
The minimization of (3) corresponds to a probabilistic chamfer alignment in which the registration error d j for each pixel is weighted by the posterior probability p j that it is an inlier pixel. Using the EM framework, the distance metric for probabilistic chamfer alignment concentrates on the inlier pixels and reduces the impact of outlier pixels, which results in robust parameter estimation. In addition, the computation of the objective function in (3) can be done efficiently using the distance transform [34]. We provide detailed derivations of the posterior probability and the parameter updates in Section S.III in the Supplementary Material.
To fuse the multiple transformation parameters β k estimated from the overlapping patches, we propose to perform displacement vector field interpolation, which is illustrated in Fig. 3. For each patch k, we determine a local displacement vector field φ k . The local displacement vector field at pixel location in the reference patch represents the vector offset from this location to the corresponding location in the target patch. The local displacement vector φ k (x k i ) for the pixel location x k i can be obtained from the estimated transformationT β k as whereT −1 β k is the inverse of the geometric transformationT β k . The interpolated displacement vector for pixel x i is obtained as where N i is the set of patches that contain the pixel x i . To obtain the warped target image I t→r that is aligned with the reference image I r , we transform the target image I t according the interpolated displacement vector field φ.

III. EXPERIMENTAL RESULTS
In this section, we first describe the datasets used for quantitative assessment and introduce a new UWF FA dataset, FLoRI21 [35], for retinal image registration. Next, we define the metrics for quantitative evaluation and list the existing retinal image registration methods for comparison. We then conduct extensive experiments to evaluate the proposed method for retinal image registration. We first study the effectiveness of the proposed two-stage method via a series of in depth ablation studies and analysis (Section III-C) and then perform a comparative evaluation of the proposed method against existing retinal image registration methods (Section III-D).

A. Dataset
We validate the proposed method for retinal image registration on a new ultra-widefield FA dataset, FLoRI21, and on the existing narrow-field FIRE (fundus image registration) dataset [28].
The FLoRI21 (Fluorescein-angiography Longitudinal Retinal Image 2021) dataset provides 15 pairs of UWF FA images that were collected from subjects enrolled in the RECOVERY study (ClinicalTrials.gov Identified: NCT02863354) [36]. For each subject, there is one montage FA image ( Fig. 1(a)) and a set of longitudinal raw FA images (Figs. 1(b) and (c)) that were taken nominally 24 weeks apart. The montage FA serves as the common reference image and the raw FAs are target images. All images are acquired using Optos California and 200Tx cameras (Optos plc, Dunfermline, United Kingdom) [37] and are stored in the TIFF format. The montage image, which was created by the Optos software via an equatorial stereographic projection of select individual images [26], [27], has a resolution of 4000 × 4000 whereas the individual FA images are 3900 × 3072. For quantitative evaluation, we manually select a set of corresponding points, which we will refer to as control points, for each image pair. The control points are chosen to be the landmarks on the images, such as bifurcation points and endpoints of the retinal vessels. In addition, the locations of control points are chosen to provide coverage of the entire overlapping region between the two images.
The FIRE dataset consists of 134 pairs of narrow-field color fundus images [28]. All images have the same resolution of 2912 × 2912 and the same of field-of-view of 45 • . The images are split into three categories with different characteristics. Category S contains 71 image pairs that carry high overlap and no anatomical changes; Category A contains 14 image pairs with large overlap and significant anatomical changes; Category P includes 49 pairs of images with small overlap. For each image pair, 10 control points are manually labeled as the ground truth for evaluation.

B. Evaluation Metrics and Benchmark
We use the Mean Registration Error (MRE) [28], the area under the success rate curve, and the chamfer distance for quantitative evaluation. For each image pair, MRE is calculated as the average residual Euclidean distance (in pixels) between the control points under the estimated transformation. Given an MRE threshold τ e , we define the success rate as the percentage of test image pairs in which the MRE is below the threshold τ e SuccessRate (τ e ) = 1 N N i 1 MRE I i r , I i t→r < τ e , (6) where N is the total number of image pairs in the test dataset and 1 is the indicator function. The success rate curve is plotted as a function of the MRE threshold τ e . We compute the area under the curve (AUC) using the trapezoidal method and report the normalized AUC value in the range of [0, 1].
In addition to the area under the success rate curve, we also report the residual chamfer distance (RCD) between the registered binary vessel maps, which provides a good proxy for the registration error. Since the chamfer distance is asymmetric, we compute the two-way chamfer distance by swapping the reference and the target images and report the median two-way chamfer distance computed over all test image pairs.
We consider the following methods as baselines for comparison: REMPE [20], AC-RegNet [38], Harris-PIIFD [39], GFEMR [11], RIR-BS [40], SURF-PIIFD-RPM [41], and GAIN [42]. We use the code provided by the authors and keep the default parameter settings as suggested in the corresponding papers. We also compare the proposed method with two recent deep learning-based techniques: VoxelMorph [43] and AC-RegNet [38]. These registration techniques cannot directly deal with the extreme geometric changes seen in UWF images. In order to compare them against our proposed approach, we first coarsely align these images using our SIFT-based global feature based registration and then utilize these methods for residual local registration (on top of the global registration transformation). Furthermore, we re-train these techniques for UWF imagery, which is quite different from the MRI and X-ray image modalities on which these methods were originally applied and reported in their respective publications.

C. Implementation, Ablation Study, and Analysis
We implement the proposed method using MATLAB and PyTorch [44]. We use the pre-trained neural network models [32] and [45] to detect vessels from UWF FA images and narrow-field fundus images, respectively. Based on empirical evaluation, the feature matching threshold τ 1 is set to 0.5 and the inlier distance threshold τ 2 is set as 25. We run 800 RANSAC trials N trial to find the best global transformation parameters. In the local registration stage, we convert the coarsely aligned images into 512 × 512 patches with a stride of 256 pixels. With this setting, the displacement field vector of each pixel location is estimated from 4 overlapping patches. Local registration cannot be reliably determined for patches with very few vessel pixels without using additional context. This situation is sometimes encountered for peripheral patches and addressed by combining patches having under 1% vessel pixels with the horizontally/vertically adjacent patch that has the larger number of vessel pixels.
To demonstrate the contribution of each stage in the proposed method and to verify the benefits of using vessel maps for the local registration, we conduct a series of ablation studies and algorithm analysis. We first provide an empirical justification for the orders of polynomial transformation used in the proposed hybrid registration framework. In Table I, we report the AUC and the ROC metrics for the second, third, and fourth order polynomial transformations in both global and local registration steps. For the global registration step, the third order polynomial transformation offers significant improvements over the second and the fourth order transformations. For the local registration, all the transformations achieve the similar performance with  I  EVALUATION OF DIFFERENT ORDERS OF THE POLYNOMIAL  TRANSFORMATION USED IN THE GLOBAL AND THE LOCAL REGISTRATION  STEPS ON THE FLORI21 DATASET  minor benefit from going to high-order polynomial transformation. Therefore, we use the third and second order polynomial transformations in the global and the local registration steps, respectively. Next, we compare the proposed modified RANSAC matching algorithm with the traditional version on the FLoRI21 dataset. Note that the FLoRI21 dataset is more challenging than the FIRE dataset for RANSAC matching because the UWF FA images in the FLoRI21 dataset have significant geometric distortion in periphery. We implement two versions of the traditional RANSAC. The first one, denoted as RANSAC S c , estimates the global transformation parameters from the consensus set S c that contains the nearest neighbor for each SIFT [29] feature point. The second version, denoted as RANSAC S s , uses the sample set S s that includes only the distinctive SIFT feature matches. In addition to the AUC and RCD metrics, we also report the median number of matched inlier points estimated by RANSAC. Table II summarizes the results on the FLoRI21 dataset. The RANSAC S c finds the least number of inlier points because the tentative feature correspondences in S c contain significant mismatches. Therefore, RANSAC does not perform well on S c . While the median number of inlier points from RANSAC S c is 220, these matched features are not widely distributed over the image, as shown in Fig. 4(c). The RANSAC S s method can determine the transformation parameters that coarsely align image pairs. However, the number of matched inlier points is significantly smaller than the proposed RANSAC method (407 v.s. 1418, respectively). As shown in Fig. 4(b), RANSAC S s performs poorly in the peripheral regions with large geometric distortion. The proposed RANSAC matching method is able to robustly determine the inlier matched points by combining both the sets S c and S s .  In the third experiment, we demonstrate the effectiveness of the local registration step by comparing the proposed hybrid registration framework with global registration methods. We experiment with both hand-crafted SIFT [29] features and the learning-based SuperPoint [30] features. For SIFT feature extraction, we use the publicly available VLFeat library [46]. For SuperPoint [30] features, we adopt the pre-trained network provided with the publication. 1 Table III   results on both the FLoRI21 and the FIRE datasets. Using the global registration as the initial step, the local registration step in the proposed hybrid framework, improves the registration accuracy significantly. On the FLoRI21 dataset, the proposed framework improves the AUC metric from 0.802 to 0.866 and reduces the RCD error from 8.859 to 5.127 pixels. A similar gain is also obtained on the FIRE dataset: the proposed framework achieves an AUC of 0.894 and the RCD of 1.730 pixels. We also note that the coarse alignment from the SuperPoint features, although slightly worse than that from the SIFT features, still provides good initialization for the local registration step.
In the fourth experiment, we justify the choices of the binary vessel map for the local registration by comparing the proposed method against alternative keypoint-based and intensity-based registration techniques for the second stage. To ensure the same experiment setting, we use the patch size of 512 × 512 pixels for all methods and perform the SIFT-based global registration to coarsely align the input image pairs. For the keypoint-based method, we select the matched SIFT features that lie inside each patch to estimate the local transformation using (2). For methods that take the intensity image as input, we consider the non-parametric deformable registration (diffeomorphic demons [47]), the optic flow using the TV-L1 solver algorithm [48], and the intensity-based registration that maximizes the mutual information (MI) between the image pair [5]. Quantitative results on the FLoRI21 dataset are listed in the second block of Table III. Compared with the global registration (AUC of 0.742 and RCD of 11.859), both feature-based and intensity-based local registration methods improves the registration accuracy (AUC of 0.813 and RCD of 8.313 for the best performing diffeomorphic demons method), which further reinforce the advantages of the two-stage hybrid registration framework. The proposed method, which uses binary vessel map in the local registration step, outperforms all alternative methods.
Lastly, we highlight that the local registration step in the proposed method benefits from the recent advances deep learning based vessel detection. Prior work in [4] adopts an unsupervised vessel detection method, MSMA [49], for the chamfer alignment. For comparison, we replace the pre-trained network model [32] in our implementation with MSMA and test the registration accuracy. Both methods are initialized using the feature-based global registration. Quantitative comparison on the FLoRI21 dataset, listed in the penultimate row in Table III, show that the registration using unsupervised vessel detection performs worse than that using deep neural network model. The unsupervised vessel detection method fails to detect many vessel branches from the FA image, especially in the low contrast region in the periphery. In contrast, the deep neural network can robustly detect both major and minor vessels, regardless of image contrast. An accurate vessel segmentation map is crucial in the local registration of the proposed method because the parametric chamfer alignment relies on binary vessel maps for anchoring. It is also worth noting that prior work in [4] directly applied the chamfer alignment on the input image, without the global registration step, which tends to be non-robust because the chamfer alignment is only locally convergent.  Table IV summarizes the corresponding area under the curve (AUC) and the RCD metrics. On both datasets, the proposed hybrid registration method shows significant improvement over existing methods. The proposed method achieves an AUC of 0.866 and a RCD of 5.127 pixels on the FLoRI21 dataset and 0.894 AUC and 1.730 pixels RCD on the FIRE dataset. We observe that the performance of existing methods on the FLoRI21 datasets is much worse than those on the FIRE dataset. The primary reason for the poor performance is that these methods do not account for the significant distortion in the peripheral regions. For instance, the REMPE method [20] assumes that retinal camera can be modeled as a pinhole camera model without lens distortion. This assumption, however, is not valid for UWF retinal images.

D. Results and Benchmarking
Native implementations of the recent learning based registration techniques AC-RegNet [38] and VoxelMorph [43] cannot directly handle the large geometric variations seen in the FLoRI21 dataset. When coarsely registered images obtained by the SIFT-based global registration from the proposed approach are provided as the input and these methods are retrained on UWF imagery, they can provide reasonably registered images. The results in Table IV indicate that, when augmented with our first stage coarse registration, these learning-based methods provide better registration accuracy than prior non learning-based methods. However, their performance is still worse than that of the proposed method.
Comparing the results presented in Table III, we notice that, on the FLoRI21 dataset, the proposed approach still performs better than existing methods even using alternative local registration techniques. The results further demonstrate the benefits of the proposed hybrid registration framework. Fig. 6 shows sample registered binary vessel maps obtained from different methods on the FLoRI21 dataset and the FIRE  dataset. For a better visualization, we only include local vessel patches in Fig. 6 and provide the entire registered vessel images in Section S.IV in the Supplementary Material. The proposed hybrid registration method can accurately register the retinal images. For existing methods, we can see clear misalignment between the reference vessel map (magenta) and the registered target vessel map (green).

IV. CONCLUSION AND DISCUSSION
The novel two-stage hybrid registration framework proposed in this paper provides a useful approach for registering UWF retinal images to standardized stereographic projections where extreme geometric distortions are commonly encountered. Experimental evaluations on two datasets, including a new FLoRI-21 UWF dataset [35], show that the proposed approach outperforms prior retinal registration methods by a significant margin. The FLoRI21 dataset is made publicly available to facilitate further research on retinal image registration.
The second stage in the proposed framework effectively leverages the advances made in vessel segmentation and focuses on the anatomical features of interest, which is well matched with clinical scenarios where the longitudinal changes in vessel structures are of interest. Although beyond the scope of the present work, the idea could also be generalized to other image modalities, for instances, leveraging anatomical segmentation to register longitudinal brain MRI images [43].