Protection of Sparse Retinal Templates Using Cohort-Based Dissimilarity Vectors

Retinal vasculature is a biometric characteristic that is highly accurate for recognition but for which no template protection scheme exists. We propose the first retinal template protection scheme, adapting an existing paradigm of cohort-based modelling to templates containing the node and edge data of retinal graphs. The template protection scheme results in at most a 2.3% reduction in accuracy compared to unprotected templates. A common concern with cohort-based systems is that the availability of distance scores can be exploited to reconstruct the biometric image or biometric template via an inversion attack. In contrast, we show that using our sparse templates in a cohort-based system results in less than a 0.3% success rate for an inverse biometric attack. In addition, rigorous unlinkability analysis shows that the template protection scheme has linkability scores as low as or lower than those of state-of-the-art template protection schemes.


I. INTRODUCTION
Biometric data is categorised as sensitive personal data by the European Parliament [1], [2]. Considering the limited number of biometric characteristics an individual has, and the sensitivity of personal information they convey, the ISO/IEC 24745 [3] International Standard provides guidelines and requirements for developing a biometric template protection (BTP) scheme. BTP schemes, in contrast with conventional biometric recognition systems, do not store or compare unprotected biometric templates during the recognition phases (enrolment and verification/identification), thus preserving users' privacy.
Digital Object Identifier 10.1109/TBIOM.2023.3239866

Because retinal vasculature has intrinsic liveness detection and is hard to spoof, it is a suitable candidate to be used in a BTP scheme.
In this work, we are interested in protecting the retina using the cohort-based paradigm. Cohort-based biometric systems compare or model the biometric samples/templates of a user against a cohort of samples/templates different from those captured from the user. The cohort could be application-specific, i.e., all users within a system are compared with the same cohort but different systems use different cohorts, or it can be user-specific, where each user in a system has a different cohort. Cohort-based modelling has been used in the literature mainly for enhancing recognition performance. Some biometric recognition systems in the literature have modelled the biometric templates of an individual based on the biometric templates of other users in the system (background or gallery). Based on how the set of imposter templates is selected from the background, different terms have been used for it (cohort, likelihood-ratio-based set, background set, identification set) [6], [7], [8].
Current literature that uses the cohort-based similarity/dissimilarity representation [6], [7], [8] employs a user-specific set of imposter samples for enhancing performance (not for protecting templates). Only Gyaourova and Ross [9] use an application-specific cohort, and their representation was not developed as a BTP scheme. They mention the possibility of selecting an imposter set from outside the dataset, but there is no trace of such a selection strategy in their paper. To the best of our knowledge, there are only two studies that use cohort-based modelling for template protection [10], [11]. These schemes were developed to protect signature and voice templates, respectively. Both works apply a fuzzy commitment scheme as a second layer of security and are therefore hybrid biometric schemes, and both use background modelling to model each user based on the data of the rest of the system's users. The existing works in cohort-based biometric recognition model users with Gaussian mixtures (e.g., for voice recognition), Hidden Markov models (e.g., for signature), or likelihood ratios (e.g., for face). Only Eskander et al. [12] applied a dissimilarity vector representation for designing a biocryptographic signature scheme; however, they only studied the performance of the signature recognition scheme, with no analysis of the irreversibility or unlinkability of the templates.
Cohort-based systems have been developed for the face, fingerprint, voice, and signature characteristics. There exists no work on protecting vascular templates, retina in particular, using the cohort method. Here, we extract spatial graph templates from retinal vascular patterns. The graph templates that we are protecting are sparser representations than images. We use a cohort of graph templates to be compared with the graph templates extracted from users' retinal samples. We develop a template protection scheme that generates a transformed version of the (unprotected) biometric template as a vector of its distances from a set of imposter biometric templates. The term "imposter" indicates that the templates are from the same biometric characteristic captured from different individuals or synthetically generated, so as to avoid leaking information about the templates being protected. This set of templates is called the cohort. We test the system's privacy protection under a worst-case attack scenario, where the attacker has access to the cohort templates and the comparison algorithm of the system. What sets our system apart from those in the literature is that the majority of those systems are designed to optimise or improve recognition performance [13], [14], [15], [16]. The difference between our work and the hybrid BTP schemes in the literature [10], [11] is that our proposed scheme applies a single non-invertible transformation which maps the biometric template of each user to a vector of dissimilarity scores between the user and the ordered cohort. We show that this transformation is sufficient; it does not need a second layer of protection to conceal the comparison scores.
It is important for the recognition performance on protected reference templates (also called pseudonymous identifiers, PI, in the Harmonised Biometric Vocabulary [17], [18]) to be comparable with the performance on unprotected templates. We discuss the recognition performance of the cohort-based dissimilarity vectors in Section V. The system should also satisfy a desired property called "renewability" of protected reference templates: if the biometric template of a user is compromised, this property guarantees that the compromised template can be revoked and a new, independent template can be generated for the user. Renewability can be achieved in this system by selecting a new cohort for the user. The only issue with this change of cohort is that, with an application-specific cohort, every user would have to be re-enrolled in the system even if just one template is compromised. That is why a user-specific cohort may be preferred.
In addition to performance and renewability, ISO/IEC 24745 considers two required properties for protecting biometric templates from potential security and privacy threats: "irreversibility" and "unlinkability" of protected reference templates. The standard definitions in ISO/IEC 24745 [3] are:
• Irreversibility: property of a transform that creates a biometric reference from a biometric sample(s) or features such that knowledge of the transformed biometric reference cannot be used to determine any information about the generating biometric sample(s) or features.
• Unlinkability: property of two or more biometric references that they cannot be linked to each other or to the subject(s) from which they were derived.
Previously, we applied the MSK reconstruction method [19], which uses MDS (multi-dimensional scaling) and an affine approximation to reconstruct biometric samples from comparison scores. We showed that this reconstruction technique is unsuccessful in reconstructing sparse point-pattern templates extracted from retinal vascular patterns [20], and from hand vascular patterns and retinal images [21], where point-pattern comparison scores/algorithms were applied. The same approach is applied in this paper to show that this reconstruction attack has a very low success rate (less than 0.3%) in reconstructing retinal images when the biometric graph comparison (BGC) algorithm [4] is used to compare the sparse graph templates extracted from retinal vasculature.
For evaluating linkability of the dissimilarity vectors, we applied the General Framework to Evaluate Unlinkability [22]. This framework requires (at least) 10 different applications with 10 different keys to evaluate the level of unlinkability for the scores and for the system. The framework is general and can be applied to different BTP schemes, which allows comparison of our proposed scheme with other BTP schemes in terms of their level of unlinkability. It requires a Lebesgue-integrable linkage function to evaluate the divergence of score distributions for mated and non-mated comparison trials among the 10 different applications of the system. In Section VII, we apply this method to evaluate the linkability of templates in our proposed system, D↔(s), as the local measure, and the linkability of the system, D↔^sys, as the global measure.

In the remainder of this paper, retina graph extraction and comparison are discussed in Section II. Section III introduces the cohort-based dissimilarity vector paradigm. Section IV describes how this paradigm can be applied to protect retinal templates. In Section V, the performance of the cohort-based retinal recognition system is evaluated. Section VI reviews irreversibility of dissimilarity vectors. The linkability level of the system is evaluated in Section VII. Section VIII concludes the paper.

A. Retina Datasets
We use three retina datasets in this research. The first one, collected by RMIT University, Australia, is called ESRID (ECG Synchronised Retinal Image dataset) [23]. ESRID has 414 images in total taken from 46 individuals. For each individual, images of their retina at 9 distinct cardiac points were taken. Therefore every individual had 9 samples of their retina [24].
The second dataset is the publicly available VARIA dataset [25], [26]. The VARIA dataset has 233 retina images from 139 individuals. The retinal images are optic disc centered [25].
The Messidor-2 dataset [27], [28] includes samples from 874 data subjects. Each data subject has one sample from each of their left and right eyes (1748 images). We used left eye images from Messidor-2 in our experiments. Since this dataset was collected from Diabetic Retinopathy examinations, it contains samples with degenerated optic discs or samples with

B. Retina Graph Extraction
The retina graph is a spatial graph representation of pertinent information in a retina image. The vertices represent the locations of vascular features (arterial and venous branchings and arteriovenous cross-overs), which are connected by a straight edge if they are connected by an 'uninterrupted' vessel in the image. Several image processing steps extract the spatial graph representation from a retina image. Similar approaches have been applied to palm vein, hand vein, and wrist vein images [29] and shown to improve accuracy [30].
Our study develops a fully automatic process of retinal image graph extraction. Features of interest are arterial and venous vessel branching, their crossing points and the optic disc center. To extract these features, the process includes blood vessel segmentation, skeleton extraction, graph extraction and optic disc identification. The overall image processing time depends on the image size, the complexity of blood vessels and computer processing capacity. The image sizes are between approximately 750 × 750 and 2300 × 1500 pixels in ESRID, 768 × 584 pixels in VARIA, and 2240 × 1488 in Messidor-2.
Blood vessels are segmented using trainable COSFIRE filters [31], [32]. The first and second columns in Fig. 1 show the original retinal image and binary image with segmented vessel tree.
Retinal graph extraction first requires that a skeleton be extracted from the binary image using morphological operations described in [29]. Graphs are extracted from the skeleton using the technique of Lajevardi et al. [4]. The third and fourth columns in Fig. 1 show the skeleton and the extracted graph features. Once the feature points and connecting edges are identified, the boundary and center of the optic disc are extracted. We use the Hough transform to identify a circle from the segmented optic disc edge. Fig. 2 shows the identified optic disc and centre point for example images in the ESRID, VARIA, and Messidor-2 datasets.

1 https://github.com/mahshidsa/Unlinkability-DissimilarityVector-Retina.git
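The circle-fitting step can be illustrated with a minimal Hough voting sketch in plain NumPy. This is not the implementation in our pipeline, which operates on the segmented optic disc edge at full image resolution; the synthetic edge points, candidate radii, and accumulator grid below are made up for the example:

```python
import numpy as np

def hough_circle_center(edge_points, radii, grid_shape):
    """Vote for circle centres: every edge point votes for all centres
    lying at distance r from it, for each candidate radius r.
    Returns the (row, col, radius) accumulator cell with most votes."""
    acc = np.zeros((len(radii),) + grid_shape, dtype=np.int32)
    thetas = np.linspace(0.0, 2 * np.pi, 180, endpoint=False)
    for ri, r in enumerate(radii):
        for (y, x) in edge_points:
            cy = np.round(y - r * np.sin(thetas)).astype(int)
            cx = np.round(x - r * np.cos(thetas)).astype(int)
            ok = (cy >= 0) & (cy < grid_shape[0]) & (cx >= 0) & (cx < grid_shape[1])
            np.add.at(acc[ri], (cy[ok], cx[ok]), 1)
    ri, cy, cx = np.unravel_index(np.argmax(acc), acc.shape)
    return cy, cx, radii[ri]

# Synthetic "optic disc edge": points on a circle of radius 20 at (50, 60).
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
pts = np.stack([50 + 20 * np.sin(t), 60 + 20 * np.cos(t)], axis=1)
cy, cx, r = hough_circle_center(pts, radii=[15, 20, 25], grid_shape=(100, 120))
```

Votes concentrate at the true centre only for the correct radius, which is why the accumulator maximum localises both the centre and the disc radius.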

C. Retina Graph Comparison
The Biometric Graph Comparison (BGC) algorithm is a technique to compare noisy spatial graphs and has been shown to perform well for vascular biometric graphs [4], [5], [29], [30], [33]. The BGC algorithm has two components, registration and inexact graph matching. For retina graph registration, the center of the optic disc is treated as the center of the graph coordinate system while the frame orientation is kept the same. The attributes of the vertices in both graphs are changed to reflect the new coordinate system. This registration approach relies only on the optic disc features extracted by the image processing algorithm.
Once registered, the comparison of biometric graphs involves inexact graph matching. Graphs from a pair of retinas, even from the same individual, will have structural differences, such as variation in the locations of vertices and in the presence or absence of vertices and edges. These errors are mainly due to changes in illumination and position when an individual makes presentations to the capture device, which affect the quality of the captured image and hence change the details of the graph extracted by the image processing. Riesen and Bunke [34] proposed a suboptimal inexact graph matching algorithm, based on the Hungarian algorithm, that finds the cheapest edit path mapping the source graph to the destination graph. BGC [4] is an adaptation of this algorithm for biometric graphs and is described in detail in [5].
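A sketch of the bipartite formulation underlying this family of algorithms is given below, assuming SciPy's Hungarian solver. It is a simplified, node-only version with a Euclidean substitution cost and a uniform insertion/deletion cost alpha; the full BGC algorithm additionally accounts for edge costs (beta) and is described in [5]:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def bipartite_ged(nodes_a, nodes_b, alpha):
    """Suboptimal graph edit distance in the Riesen-Bunke bipartite
    formulation: an (n+m) x (n+m) cost matrix covering substitutions,
    deletions, and insertions, solved with the Hungarian algorithm."""
    a, b = np.asarray(nodes_a, float), np.asarray(nodes_b, float)
    n, m = len(a), len(b)
    inf = 1e9  # forbids the off-diagonal deletion/insertion cells
    cost = np.zeros((n + m, n + m))
    # Substitution block: Euclidean distance between node coordinates.
    cost[:n, :m] = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # Deletion block (a-node -> epsilon): alpha on the diagonal.
    cost[:n, m:] = np.where(np.eye(n, dtype=bool), alpha, inf)
    # Insertion block (epsilon -> b-node): alpha on the diagonal.
    cost[n:, :m] = np.where(np.eye(m, dtype=bool), alpha, inf)
    # Epsilon -> epsilon block stays zero.
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].sum()

# Toy graphs: identical node sets, and one with an extra far-away node.
g1 = [[0.0, 0.0], [1.0, 0.0]]
g2 = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
d_same = bipartite_ged(g1, g1, alpha=2.0)   # 0: both nodes substituted at cost 0
d_extra = bipartite_ged(g1, g2, alpha=2.0)  # 2: one insertion at cost alpha
```

The assignment jointly decides, per node, whether substitution is cheaper than deleting it and inserting its counterpart, which is exactly the trade-off the alpha parameter controls.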

III. RETINA DISSIMILARITY VECTOR COMPARISON
We use cohort-based dissimilarity vectors to represent and compare retina graphs. A vector-based representation enhances matching performance by enabling the use of well-established classification techniques that are not available for graph comparison.
The retina graph that is represented as a vector of dissimilarities from a cohort is called the target graph. The cohort is an ordered set of retina graphs from which the dissimilarities to the target are computed. We use retina images from ESRID to form the target graphs and the retina images from VARIA and Messidor-2 to form the cohorts.
The graphs in ESRID, VARIA and Messidor-2 are captured at different resolutions. To create the dissimilarity vectors, retina graphs are first rescaled so that the coordinates of the graph features are all in the same scale, from 0 to 100 pixels.
Every ESRID target graph g is represented as a dissimilarity vector v = (d_1, d_2, ..., d_n), where each d_i is the distance between g and the i-th graph in the cohort set, computed by the BGC algorithm.
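The construction of the protected template, together with the rescaling described above, can be sketched as follows. The toy distance function is only a stand-in for the BGC score, and the three-node graph and two-member cohort are made up for the example:

```python
import numpy as np

def rescale(nodes, lo=0.0, hi=100.0):
    """Map node coordinates into a common [lo, hi] range so that graphs
    captured at different resolutions become comparable."""
    nodes = np.asarray(nodes, float)
    mn, mx = nodes.min(axis=0), nodes.max(axis=0)
    return lo + (nodes - mn) / (mx - mn) * (hi - lo)

def dissimilarity_vector(target, cohort, dist):
    """Protected template v = (d_1, ..., d_n): the distance from the
    target graph to each member of the ordered cohort."""
    return np.array([dist(target, c) for c in cohort])

def toy_dist(a, b):
    """Placeholder for the BGC score: mean nearest-node distance."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return d.min(axis=1).mean()

g = rescale([[0.0, 0.0], [2.0, 4.0], [5.0, 1.0]])
cohort = [g + 1.0, g * 0.9]   # two imposter graphs, made up for the example
v = dissimilarity_vector(g, cohort, toy_dist)
```

Only v is stored; the graph g itself is discarded, which is what makes the representation a candidate for template protection.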
The parameters in the BGC algorithm described in [5] were chosen to generate small distance values for mated comparison trials (genuine comparisons; samples from the same individual) and large distances for non-mated comparison trials (imposter comparisons; samples from different individuals). However, these parameters will not suit the calculation of distance values for a dissimilarity vector. This is because every comparison of a target graph with a cohort member will be an imposter comparison. As a result, vectors generated from different ESRID graphs will all have similarly large distance values to the set of cohorts and will not be discriminative across classes (enrolled individuals).
For the dissimilarity vectors from the cohort to be useful and discriminative, we 'force' imposter comparisons to match more consistently and find common structure. To do this, we implemented two changes to the BGC algorithm:
• Instead of using the 'Registration' component of BGC, we register the compared graphs using the center of the optic disc, as it is reliably extracted from the retinal image independently of the vascular pattern. Consequently, every graph in ESRID, VARIA, and Messidor-2 is brought to a common reference frame by translating the features so that the center of the optic disc forms the center of the coordinate system. The graphs are not rotated.
• We slacken the graph matching parameters so that the compared graphs match reasonably similar features. The graph matching parameters, α (cost of inserting/deleting a node) and β (cost of inserting/deleting an edge), are the cost matrix weights in the edit distance computation [5], [35]. Slack graph matching parameters gave a wider spread of dissimilarity scores across classes while keeping the dissimilarity scores within classes close to each other.
Using the above BGC settings, it is expected that in the absence of any noise, biometric graphs from the same class (enrolled user) will match the same set of vertices in a cohort graph. With the expected noise in capturing a biometric graph, we expect the matched set of vertices to be slightly different, but not so much that it would return a very different distance value. Forcing the ESRID graphs and the cohort graphs to find matching structure is based on the premise that some cohort graphs have structure similar to some of the ESRID classes considered; such a comparison will give a small dissimilarity value in the corresponding dimension of the vectors for all samples in that ESRID class. Along the same lines, some cohort graphs may be consistently and strongly dissimilar to all samples in a particular ESRID class.
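The first change amounts to a pure translation of node coordinates; a minimal sketch (the vascular pattern and optic disc centres are made up for the example):

```python
import numpy as np

def register_to_optic_disc(nodes, od_center):
    """Translate node coordinates so the optic disc centre becomes the
    origin; no rotation or scaling, matching the modified BGC setup."""
    return np.asarray(nodes, float) - np.asarray(od_center, float)

# The same vascular pattern captured with two different frame offsets
# registers to identical coordinates once the optic disc is the origin.
pattern = np.array([[10.0, 20.0], [30.0, 5.0]])
a = register_to_optic_disc(pattern + [7.0, 3.0], od_center=[7.0, 3.0])
b = register_to_optic_disc(pattern + [50.0, 60.0], od_center=[50.0, 60.0])
```

Because the translation depends only on the optic disc, no vascular information is consumed by registration, which keeps all of it available for the dissimilarity computation.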
IV. COHORT-BASED DISSIMILARITY VECTORS FOR RETINA PROTECTION

The BTP system we propose has two phases, enrolment and verification. During enrolment, a user presents the retina image at the Sensor and the Feature Extractor (FE) extracts a spatial graph from the presented biometric sample using the method described above. At the PIE module, the extracted spatial graph is compared with the cohort graphs using the BGC algorithm. The PIE module in Fig. 3 illustrates how the (unprotected) spatial graph is transformed into the (protected) dissimilarity vector. The cohort graphs are stored separately in the cohort storage. The set of dissimilarity measures between the biometric template(s) of the user and each cohort member is stored in the database as a vector, which serves as the biometric reference(s). Because at this step the extracted spatial graph is transformed into a dissimilarity vector, this step constitutes the feature transformation phase.
An SVM is trained as a binary classifier for each user (class) separately during enrolment. For training the SVM for each user, 5 genuine samples are acquired during enrolment to generate 5 pseudonymous identifiers. Thus, each user has 5 dissimilarity vectors stored in the database (we show only one v in Fig. 3 for the sake of simplicity). Then, the 5 genuine (mated) pseudonymous identifiers are used along with 20 (random) imposter (non-mated) pseudonymous identifiers collected from the other classes to train the SVM for this user.
At the verification phase, the probe biometric template is compared to the cohort to obtain its dissimilarity vector, v. This dissimilarity vector is fed to the trained SVM for the claimed class. The SVM (depicted by PIC to be consistent with ISO/IEC 24745 notation) calculates the probability (similarity score) of the probe sample being a member of the claimed class. This similarity score represents how similar the probe v is to the claimed class based on the trained SVM for that class. Fig. 3 illustrates how the proposed BTP scheme stores and compares the cohort-based dissimilarity vectors (protected reference templates).
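Enrolment and verification with a per-user binary SVM can be sketched with scikit-learn as follows. The vector length, the score distributions, and the random data are placeholders for real dissimilarity vectors, and the default `SVC` configuration stands in for whatever settings a deployed PIC would use:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical stand-ins for dissimilarity vectors (length 233 in the
# paper, shortened here): 5 mated PIs for the user, 20 non-mated PIs.
mated = rng.normal(0.0, 0.1, size=(5, 16))
non_mated = rng.normal(1.0, 0.3, size=(20, 16))

# Enrolment: one binary SVM per user, trained on 5 mated vs 20 non-mated PIs.
X = np.vstack([mated, non_mated])
y = np.array([1] * 5 + [0] * 20)
pic = SVC(probability=True).fit(X, y)  # PIC: pseudonymous identifier comparator

# Verification: the probe's dissimilarity vector is scored as the
# probability of belonging to the claimed class.
probe = rng.normal(0.0, 0.1, size=(1, 16))
score = pic.predict_proba(probe)[0, pic.classes_.tolist().index(1)]
```

The comparator only ever sees dissimilarity vectors, never graphs, so the comparison subsystem operates entirely in the protected domain.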
To meet the ISO/IEC 24745 privacy protection requirements, the cohort-based templates must be irreversible and unlinkable in addition to having comparable performance to the unprotected (graph) templates.

V. ACCURACY OF GRAPH COMPARISON VS DISSIMILARITY VECTOR COMPARISON
To ensure that the recognition performance for unprotected templates is comparable to recognition performance for protected templates, we first assess the performance of the system that compares the templates in the unprotected domain (graph domain). Then, we evaluate the accuracy of our proposed BTP scheme (which compares protected templates).
We tested the classification accuracy of the ESRID graphs (unprotected templates) by comparing them in the graph domain using the original BGC algorithm [4], then classifying the BGC scores using 5NN, LDA, and SVM classifiers.
Fig. 3. The architecture of the proposed cohort-based retinal system. The figure is drawn consistently with the ISO/IEC 24745 architecture for BTP schemes [3]. PIE, PIR, and PIC stand for Pseudonymous Identifier Encoder, Pseudonymous Identifier Recorder, and Pseudonymous Identifier Comparator, respectively. B_r, B_p, T_r, and T_p stand for reference biometric sample, probe biometric sample, reference template, and probe template, respectively. When the user presents his/her retina sample to the system, the feature extractor (FE) module extracts a spatial graph template from their retina. To ensure that the user's private data is protected, the system does not store this graph template in the database. Instead, the extracted graph template is compared with a cohort of other graph templates extracted from retina samples of individuals different and independent from the users (or synthetic retinal images) to generate the dissimilarity vector v. The graph template is then discarded. The Biometric Graph Comparison (BGC) module compares the extracted graph from the user with every cohort member, and their pair-wise distances are stored as a dissimilarity vector, v = (d_1, d_2, ..., d_n), in the database. This dissimilarity vector is the protected reference template. Cohort graph templates are represented by c_1, c_2, ..., c_n. The cohort graphs are stored in a separate storage. In a user-specific scenario, during enrolment, a different cohort will be associated with each user; during verification, the cohort associated with the claimed identity is sent to the PIR module. During enrolment, 5 samples are captured from each individual, so that each individual has 5 dissimilarity vectors (PIs) stored in the database (only one sample and one PI is shown in the figure). The SVM for the claimed class (user) is trained during enrolment using the 5 mated (genuine) PIs from the enrolled user and 20 non-mated (imposter) PIs from other classes (users). During verification, the SVM (depicted by PIC to be consistent with ISO/IEC 24745 notation) generates a probability score (similarity score) for the vector v extracted from the probe sample. This similarity score s represents how similar v is to the claimed class based on the trained SVM for this class. The retinal images used in this figure are selected from the ESRID dataset.

To test verification performance of the graph-based approach, we built a binary classifier for each class, taking 5 (genuine) scores of each class and 20 (imposter) scores at random from the other classes to train the classifier. This allowed the remaining 4 scores from the class and 385 scores from other classes to be used to test the classifier. The number of true positives and false positives was calculated. This was done for all 46 classes and the counts were combined to measure the True Positive Rate (TPR) and False Positive Rate (FPR) for one run. All C(9,5) = 126 possible training-testing combinations for a class were considered, giving 126 runs of the verification experiment. Three different binary classifiers were tested each time: SVM, LDA, and 5NN. The verification performance for each run was measured by generating a ROC curve and calculating the Area Under the Curve (AUC). Table I shows the mean AUC (with standard deviation) using each classifier in the unprotected domain.
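The run structure and the per-run AUC computation can be sketched as follows; the AUC is computed via the rank-sum (Mann-Whitney) identity rather than an explicit ROC curve, and the scores are illustrative:

```python
import numpy as np
from itertools import combinations

# 9 samples per class; every choice of 5 training samples leaves 4 for
# testing and defines one run of the verification experiment.
runs = list(combinations(range(9), 5))
n_runs = len(runs)  # C(9,5) = 126 runs per class

def auc(pos, neg):
    """AUC via the rank-sum identity: the probability that a randomly
    drawn genuine score exceeds a randomly drawn imposter score."""
    pos, neg = np.asarray(pos, float), np.asarray(neg, float)
    gt = (pos[:, None] > neg[None, :]).sum()
    eq = (pos[:, None] == neg[None, :]).sum()
    return (gt + 0.5 * eq) / (pos.size * neg.size)

# Toy scores for one run (illustrative, not real classifier outputs).
perfect = auc([0.9, 0.8, 0.7], [0.4, 0.3, 0.2])  # fully separated scores
chance = auc([0.5, 0.5], [0.5, 0.5])             # indistinguishable scores
```

Averaging the per-run AUCs over the 126 runs yields the mean and standard deviation reported in Table I.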
We next used the BGC algorithm to create dissimilarity vectors for all 414 ESRID graphs. All n = 233 graphs in VARIA were used as the cohort. For consistency in using cohort members selected from Messidor-2, we used 699 samples of the Messidor-2 dataset to create three independent cohorts of size n = 233. Rescaling and OD-based registration were done before the inexact graph comparison stage, where slack graph matching parameters of α = β = 2 and α = β = 0.3 were chosen for the VARIA and Messidor-2 cohorts, respectively. These parameters were selected by a grid search over the parameter space, identifying the minimum values of α and β that maximised the separation between inter-class and intra-class distances to a chosen cohort graph.
The process to build the classifiers for comparison in the protected domain was the same as the process used for graph templates. For the system that uses VARIA graphs as the cohort, three different binary classifiers were tested each time (SVM, LDA, and 5NN), with the verification performance for each run measured as before. Since the SVM showed superior performance in classifying the dissimilarity vectors for the VARIA cohort, we applied the SVM classifier to three different cohorts selected from the Messidor-2 dataset. Table III shows the recognition accuracy when SVM is used to classify the ESRID dissimilarity vectors using these 3 cohorts. The figures show that SVM has an accuracy of over 98% in classifying our cohort-based dissimilarity vectors.
For protected templates (dissimilarity vectors), using the SVM classifier we observed at most a 1-2% drop in accuracy compared with unprotected templates. Thus, the cohort-based dissimilarity vector scheme satisfies the first criterion for a template protection scheme: minimal degradation of performance compared to the unprotected approach.

VI. IRREVERSIBILITY OF DISSIMILARITY VECTORS
As mentioned in the introduction of the paper, we consider the worst-case scenario with the attacker having knowledge of cohort templates and comparison algorithm. This assumption makes the system potentially vulnerable to an MSK reconstruction attack [19] in case the dissimilarity vectors (PIs) are compromised. We now test the vulnerability of the system to such an attack assuming that the attacker i) knows the comparison algorithm, ii) has access to the dissimilarity vectors (PIs), and iii) has access to the cohort templates/images.
The cohort information can be used as a break-in set for this reconstruction attack. Applying image-to-image translation techniques such as the pix2pix GAN [36], it might be possible for an attacker to reconstruct retinal cohort images from cohort graphs; for instance, in [37], [38], the authors used pix2pix GAN to reconstruct finger vein and hand vein images from binarised skeleton templates. Thus, it is worth evaluating the threat of the MSK reconstruction attack on our proposed system. We previously showed that this attack fails to reconstruct retinal point-pattern templates (the nodes of our graph templates) and retinal images when point-pattern comparison algorithms are applied [20], [21]. Our results in [20], [21] show that the MSK reconstruction attack cannot reconstruct the nodes of our graph templates sufficiently similar to the original genuine point-pattern templates; the reconstructed templates show a distribution similar to the non-mated (imposter) templates and hence cannot reveal personal information about the users. Here, we show the attack also fails when retinal images are reconstructed and our graphs are then extracted from them, with BGC used as the comparison algorithm.
The experiments in this section are similar to the ones in [21], except for the comparison algorithm used in them.
In these experiments, we apply distance scores calculated by the BGC algorithm. In addition, when images are reconstructed, we extract graph templates (not point-pattern templates) from them using the method in [4]. That is, we try to reconstruct images and then extract the template of nodes and edges from those images. We used ESRID samples as our cohort members to simulate the scenario in which a potential attacker has access to the closest possible samples to the targets; thus, the cohort and references are selected from the same dataset (ESRID), but the members of the cohort are different from the users. If the attack is hard in this scenario, then mounting the inverse attack with cohort members from a different dataset, where image resolutions may differ, will be even harder. For calculating dissimilarity vectors, we used the edge-based distance in [5] with parameters α = β = 0.4.
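For intuition about the mechanics of the attack, the following sketch runs an MSK-style pipeline (classical MDS followed by an affine fit, in the spirit of [19]) on synthetic Euclidean data, where the leaked distances are faithful to a vector space and reconstruction succeeds almost exactly. With BGC graph distances, the embedding step no longer reflects the underlying images, which is what our low attack success rates indicate:

```python
import numpy as np

def classical_mds(D2, k):
    """Classical MDS: double-centre the squared-distance matrix and
    eigendecompose B = -0.5 * J D2 J = X X^T."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:k]
    X = V[:, top] * np.sqrt(np.maximum(w[top], 0.0))
    return X, B

def embed_new(X, B, a):
    """Out-of-sample embedding of an unseen point from its squared
    distances `a` to the break-in set (Gower's formula)."""
    c = 0.5 * (np.diag(B) - a + a.mean() - np.diag(B).mean())
    return np.linalg.pinv(X) @ c

rng = np.random.default_rng(1)
breakin = rng.normal(size=(40, 5))                 # known break-in templates
D2 = ((breakin[:, None] - breakin[None]) ** 2).sum(-1)
X, B = classical_mds(D2, k=5)

# Affine approximation: least-squares map from MDS space to template space.
A, *_ = np.linalg.lstsq(np.hstack([X, np.ones((40, 1))]), breakin, rcond=None)

target = rng.normal(size=5)                        # template under attack
a = ((breakin - target) ** 2).sum(-1)              # leaked squared distances
recon = np.hstack([embed_new(X, B, a), 1.0]) @ A   # reconstructed template
```

The attack succeeds here because Euclidean distances determine the configuration up to a rigid motion, which the affine fit undoes; graph edit distances between sparse retinal graphs offer no such guarantee.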
The experiments performed in this section are to reconstruct ESRID reference images using the BGC algorithm. We ran 46 tests each reconstructing 9 reference images of a data subject using 405 break-in images from every other data subject. The code for these experiments is available on our Github repository. 2 Some examples of reconstructed BR (biometric reference) images are collected in Fig. 4. The images on the top row represent the preprocessed reference images. The bottom row represents their corresponding reconstructed retinal images.
We extracted graph templates from reconstructed reference images. Then, we compared these extracted templates with their corresponding targeted reference templates using the BGC algorithm. The graph extraction algorithm did not extract any features from 44 of the reconstructed reference images. So, in total we obtained 370 scores for reconstructed BRs. The dashed blue curve in Fig. 5(a) represents the distribution of scores for reconstructed reference images. For an inverse attack to be successful, it is expected that the distribution of scores for reconstructed images will overlap with the distribution of scores for mated comparison trials (as occurred for face images in [21]). However, as can be seen in Fig. 5(a), the distribution of scores for reconstructed reference images overlaps with that of non-mated comparison trials, which means the comparison system considers the reconstructed reference images as imposters. This shows that the reconstruction attack was not successful.
The DET curves in Fig. 5(b) represent the error rates when classifying the scores for non-mated comparison trials of bona fide (original) templates from scores for mated comparison trials of bona fide templates (red curve), and from scores obtained when templates extracted from reconstructed reference images are presented as probes (dashed blue). The high error rates for the dashed blue curve confirm that the system will not recognise the reconstructed templates as matches. At FMR = 0.1%, the IAMR (Inverse Attack Match Rate) [39] is 0.24%. Only one (out of 414) reconstructed BR template can gain access to the system.
Overall, our results demonstrate that the MSK attack is unable to reconstruct retinal images that can penetrate a biometric system that compares sparse graph templates extracted from retina. This shows that our protected templates (dissimilarity vectors) do not reveal sufficient information to a potential attacker to be able to reverse-engineer biometric samples from the dissimilarity vector of a reference template.

A. Data Augmentation
To evaluate unlinkability based on the general framework in [22], we require at least 10 independent cohort sets. We selected 699 out of the 874 left eye samples in the Messidor-2 dataset to create three independent cohort sets. For creating the other seven independent cohort sets, we applied StyleGAN2-ADA (SG2-ADA) [40] to generate retinal images. Training was performed for 90 hours on Google Colaboratory using a 16GB Tesla P100-PCIE GPU. We trained SG2-ADA using 832 left retinal images from Messidor-2. We generated 5,000,000 fake retinal images using the default augmentation parameter p = 0.6. The FID (Fréchet Inception Distance) between the real images and the generated images was 10.3. We used the truncation trick with parameter 0.7 to generate high quality images and selected 1,631 generated samples to create the seven cohort sets of size 233. The generated dataset, GRETINA, that we used in our experiments [41] is shared on our Github repository. 3 The generated images are 1024 × 1024 pixels. We resized them to match the size of ESRID images (750 × 750 pixels) before extracting graphs from them. Fig. 6 shows four samples from GRETINA.

B. Experimental Setup
We have 10 applications of the cohort-based dissimilarity vectors, using 10 different cohorts selected from Messidor-2 and GRETINA. For each application, we have 414 dissimilarity vectors of length 233; the 414 vectors belong to 46 data subjects, since each data subject has 9 samples in the ESRID dataset. We applied three different linkage functions. The first was the PIC of our proposed system, i.e., the SVM. We also applied Euclidean distance and cosine similarity as linkage functions to further evaluate the linkability of the dissimilarity vectors.
To obtain scores from mated and non-mated comparisons with the first linkage function (PIC), we trained an SVM classifier on the dissimilarity vectors of each specific application (cohort) and used the remaining 9 applications as testing data. This results in 3,726 mated comparison trials and 167,670 non-mated comparison trials per application. In total, we have 37,260 scores from mated comparison trials and 1,676,700 scores from non-mated comparison trials across the 10 applications.
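These counts follow directly from the dataset layout. A minimal sketch, assuming (as the numbers imply) that each probe vector is scored against all 46 per-subject class models, exactly one of which is mated:

```python
# 46 subjects x 9 samples = 414 dissimilarity vectors per application.
subjects, samples, apps = 46, 9, 10
vectors = subjects * samples                 # 414

# SVM trained on one application; vectors of the other 9 serve as probes.
probes = vectors * (apps - 1)                # 3,726 test vectors
mated = probes                               # 1 mated class model per probe
non_mated = probes * (subjects - 1)          # 45 non-mated models per probe

print(mated, non_mated)                      # 3726 167670
print(mated * apps, non_mated * apps)        # 37260 1676700
```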
For the other two linkage functions, i.e., Euclidean distance and cosine similarity, no training is required: we can directly compare the dissimilarity vectors using the linkage functions. For each application (cohort) and each data subject, we have 729 mated comparison trials, because each of the subject's 9 samples in the application is compared with the 9 vectors obtained from the same subject in each of the 9 remaining applications. When calculating the linkability levels, we first used ω = 1, which assumes that the probabilities of the scores being from mated and non-mated comparison trials are equal. This assumption considers the worst-case scenario in unlinkability analysis, namely p(H_m) = p(H_nm), where H_m is the hypothesis that both templates belong to mated instances, and H_nm is the hypothesis that both templates belong to non-mated instances [22].
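A minimal sketch of the two training-free linkage functions and the trial counts they produce (the vectors here are toy stand-ins for real length-233 dissimilarity vectors):

```python
import math

def euclidean_distance(u, v):
    """Euclidean distance between two dissimilarity vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine similarity between two dissimilarity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy stand-ins for two length-233 dissimilarity vectors.
u = [float(i % 7) + 1 for i in range(233)]
v = [float(i % 7) + 2 for i in range(233)]
print(round(euclidean_distance(u, v), 3))
print(round(cosine_similarity(u, v), 3))

# Mated-trial count per subject per application: 9 samples, each compared
# with the subject's 9 vectors in each of the 9 remaining applications.
assert 9 * 9 * 9 == 729
assert 46 * 729 * 10 == 335340   # total mated scores across applications
```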

C. Results
After obtaining the scores generated by mated and non-mated comparison trials among the 10 different applications of our proposed system, we used the Python code released by the authors of [22] to analyse the unlinkability of our system. Fig. 7(a) shows the distribution of scores for mated comparison trials (solid green), non-mated comparison trials (dashed red), the score-wise (local) linkability measure (blue line), and the global linkability measure of 0.02 (indicated at the top of the figure) when the PIC (in our case, the SVM) is used to compare dissimilarity vectors. Overall, the local and global linkability levels are negligible when the PIC is used to link the templates.
The local and global linkability of the system for the linkage functions other than the PIC are represented in Fig. 7(b) and Fig. 7(c). The score-wise linkability measure is represented by blue lines, and the global measure is indicated at the top of each figure. The dashed vertical black lines show scores where p(s|H_m) = p(s|H_nm). In Fig. 7(b), we can observe that the global linkability is 0.15, which is considerable compared to the global linkability when the PIC was applied to link templates. This implies that Euclidean distance is better able to link the dissimilarity vectors. The figure also shows that for s < 0.625, it is more likely that the two compared templates come from a mated comparison trial. Fig. 7(c) shows that the global linkability of the system is D↔^sys = 0.07. This indicates that cosine similarity can extract more information for cross-matching dissimilarity vectors than the PIC (SVM), but less than the Euclidean distance function. For a small proportion of scores (44 out of 335,340 mated scores), where 0.99629 < s < 0.99676 or 0.99975 < s < 0.99977, we can link templates with near certainty because p(s|H_m) > 0 and p(s|H_nm) = 0. However, since p(s|H_m) takes low values there and the proportion of these scores is very small, the global linkability level of the system is only 0.07.
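For reference, the local and global measures of the framework in [22] can be sketched from score histograms as follows. The scores below are synthetic stand-ins, not our system's scores, and the helper names are illustrative:

```python
import random

random.seed(1)
# Synthetic stand-ins for mated / non-mated score distributions.
mated = [random.gauss(0.7, 0.1) for _ in range(5000)]
non_mated = [random.gauss(0.4, 0.1) for _ in range(5000)]

def histogram(scores, bins, lo, hi):
    """Empirical density p(s|H) on a fixed grid."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in scores:
        i = min(bins - 1, max(0, int((s - lo) / width)))
        counts[i] += 1
    return [c / (len(scores) * width) for c in counts]

def local_linkability(pm_s, pnm_s, omega):
    """D(s) = max(0, 2*p(Hm|s) - 1), where p(Hm|s) = w*LR / (1 + w*LR)."""
    if pm_s == 0.0:
        return 0.0
    if pnm_s == 0.0:
        return 1.0                      # region reachable only by mated pairs
    olr = omega * pm_s / pnm_s          # omega times the likelihood ratio
    return max(0.0, 2.0 * olr / (1.0 + olr) - 1.0)

bins, lo, hi = 100, 0.0, 1.2
width = (hi - lo) / bins
pm = histogram(mated, bins, lo, hi)
pnm = histogram(non_mated, bins, lo, hi)
omega = 1.0                             # worst case: p(Hm) = p(Hnm)

d_local = [local_linkability(pm[i], pnm[i], omega) for i in range(bins)]
# Global measure: expectation of D(s) under the mated score distribution.
d_global = sum(d_local[i] * pm[i] * width for i in range(bins))
print(f"global linkability: {d_global:.3f}")
```

With well-separated synthetic distributions as above, the global measure is high; for our system the mated and non-mated score distributions overlap heavily, which is why the reported values are near zero.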
Overall, the linkability levels obtained for our proposed system are as low as or lower than the linkability scores obtained for the Bloom filter-based face template protection scheme in [22], the face and iris template protection scheme in [42], and the scores reported in [43]. Table IV compares the global linkability score of our proposed system with the scores for the state-of-the-art BTP schemes when ω = 1.
Using the parameter ω = 1 assumes, unrealistically, that the system has only two enrolled users. In our case, the ESRID dataset contains samples from 46 individuals. This means that p(H_m) = 1/46 and p(H_nm) = 45/46, which results in ω = p(H_m)/p(H_nm) = 1/45 ≈ 0.02. Applying ω = 0.02 resulted in lower global linkability scores for our proposed system, i.e., 0, 0.0039, and 0.0003 for the SVM, Euclidean distance, and cosine similarity linkage functions, respectively. The score-wise linkability scores for ω = 0.02 are provided in Fig. 8. These scores suggest that our proposed system is highly unlinkable and suitable for protecting retinal templates.
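The value of ω follows directly from the gallery size:

```python
# omega = p(H_m) / p(H_nm) for a gallery of 46 enrolled subjects (ESRID).
n_subjects = 46
p_mated = 1 / n_subjects                  # probability a random pair is mated
p_non_mated = (n_subjects - 1) / n_subjects
omega = p_mated / p_non_mated
print(round(omega, 4))                    # 0.0222, reported as 0.02 in the text
```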

VIII. CONCLUSION
In this paper, we proposed the first cohort-based retinal template protection scheme. We first showed that verification performance degrades only minimally when the Pseudonymous Identifier Comparator (PIC) is applied to the dissimilarity vectors (protected reference templates).
We then investigated irreversibility. We applied the MSK attack to reconstruct retinal images using scores issued by the BGC algorithm. The graph templates extracted from these reconstructed retinal images were not similar enough to the target references to pass the system threshold as genuine. The distribution of the comparison scores of the reconstructed templates resembles the distribution of scores for non-mated comparison trials, which shows that the reconstructed samples are very different from their original target references. Hence, this inversion method can neither reconstruct our sparse graph templates nor reveal personal information about them.
We then evaluated the local and global linkability levels of our proposed scheme based on The General Framework to Evaluate Unlinkability of Biometric Template Protection Systems. We applied three different linkage functions and two different values for the parameter ω. Our results suggest that the linkability of our system is comparable to that of some state-of-the-art template protection schemes. The low linkability level of the system shows that our dissimilarity vectors are secure against cross-matching attacks and hence can be used as protected reference templates. Thus, we believe that our proposed system can be applied to protect retinal templates.
As future work, we would like to explore the possibility of applying this scheme to other biometric characteristics, vascular characteristics in particular. We are also interested in evaluating the threat of other types of security/privacy attacks, such as non-linear reconstruction attacks, presentation attacks, morphing attacks, or hill-climbing attacks, against our proposed system.

Arathi Arakala received the Ph.D. degree from RMIT University, Melbourne, in 2008. After a series of postdoctoral positions with RMIT University and Monash University, she is currently a Lecturer of Cybersecurity with the Discipline of Mathematics, RMIT University. Her research interests include physiological and behavioral biometric authentication, modeling of online user behaviour for cybersafety, and encryption algorithms that enable zero-trust computation.
Stephen A. Davis joined RMIT University as a Mathematical Biologist in 2009. He began his career with the CSIRO Division of Wildlife and Ecology, and then spent eight years abroad as a Postdoctoral Scientist with the University of Antwerp, Belgium, Utrecht University, The Netherlands, and the School of Population Health, Yale University, USA. He has first-author publications in Nature and Science. He especially enjoys cats and afternoon coffees.
Kathy J. Horadam received the B.Sc. (First Class Hons) degree in 1972 and the Ph.D. degree in pure mathematics from the Australian National University in 1977. She has been a Professor of Mathematics with RMIT University since 1995 (Emeritus Professor from 2017) and has worked in academia for 37 years, and for three years with the Cryptomathematics Research Group, Australia's Defence Science and Technology Organization. Her research interests include biometrics, applied algebra and combinatorics, and network science. She is a Fellow of the ICA and AustMS, a member of AMS, and a Life Member of CMSA.