Single-shot retinal image enhancement using untrained and pretrained neural networks priors integrated with analytical image priors.

Retinal images acquired using fundus cameras are often visually blurred due to imperfect imaging conditions, refractive medium turbidity, and motion blur. In addition, ocular diseases such as the presence of cataracts also result in blurred retinal images. The presence of blur in retinal fundus images reduces the effectiveness of the diagnosis process of an expert ophthalmologist or a computer-aided detection/diagnosis system. In this paper, we put forward a single-shot deep image prior (DIP)-based approach for retinal image enhancement. Unlike typical deep learning-based approaches, our method does not require any training data. Instead, our DIP-based method can learn the underlying image prior while using a single degraded image. To perform retinal image enhancement, we frame it as a layer decomposition problem and investigate the use of two well-known analytical priors, i.e., dark channel prior (DCP) and bright channel prior (BCP) for atmospheric light estimation. We show that both the untrained neural networks and the pretrained neural networks can be used to generate an enhanced image while using only a single degraded image. The proposed approach is time and memory-efficient, which makes the solution feasible for real-world resource-constrained environments. We evaluate our proposed framework quantitatively on five datasets using three widely used metrics and complement that with a subjective qualitative assessment of the enhancement by two expert ophthalmologists. For instance, our method has achieved significant performance for untrained CDIPs coupled with DCP in terms of average PSNR, SSIM, and BRISQUE values of 40.41, 0.97, and 34.2, respectively, and for untrained CDIPs coupled with BCP, it achieved average PSNR, SSIM, and BRISQUE values of 40.22, 0.98, and 36.38, respectively. Our extensive experimental comparison with several competitive baselines on public and non-public proprietary datasets validates the proposed ideas and framework.

Abstract-Retinal images acquired using fundus cameras are often visually blurred due to imperfect imaging conditions, refractive medium turbidity, and motion blur. In addition, ocular diseases such as cataracts also result in blurred retinal images. The presence of blur in retinal fundus images reduces the effectiveness of the diagnosis process of an expert ophthalmologist or a computer-aided detection/diagnosis system. In this paper, we put forward a single-shot deep image prior (DIP)-based approach for retinal image enhancement. Unlike typical deep learning-based approaches, our method does not require any training data. Instead, our DIP-based method can learn the underlying image prior while using a single degraded image. To perform retinal image enhancement, we frame it as a layer decomposition problem and investigate the use of two well-known analytical priors, i.e., dark channel prior (DCP) and bright channel prior (BCP), for atmospheric light estimation. We show that both untrained and pretrained neural networks can be used to generate an enhanced image while using only a single degraded image. We evaluate our proposed framework quantitatively on five datasets using three widely used metrics and complement that with a subjective qualitative assessment of the enhancement by two expert ophthalmologists. We compare our method with a recent state-of-the-art method, cofe-Net, using synthetically degraded retinal fundus images and show that our method outperforms it, providing a gain of 1.23 and 1.4 in average PSNR and SSIM, respectively. On the basis of the reported results, our method also outperforms other works in the literature that were evaluated on non-public proprietary datasets.
Index Terms-Retinal image enhancement, Retinal image generation, Single image analysis

I. INTRODUCTION
In ophthalmic clinical practice, retinal images are routinely acquired using fundus photography, which is used in the diagnosis and treatment of different retinal diseases such as diabetic retinopathy, hypertensive retinopathy, and age-related macular degeneration [1] [2]. In addition, these images have been used for developing different computer-aided detection/diagnosis systems for glaucoma detection [3], diabetic retinopathy classification [4], retinal arteries and veins classification [5], and blood vessel segmentation [6]. However, acquired retinal fundus images are often blurry for reasons such as dusty camera lenses, low-resolution cameras, imperfect illumination, refractive medium turbidity, and incorrect focus [7]. In addition, the presence of cataract (an eye disease in which the natural eye lens becomes foggy) also results in hazy and blurred retinal images [8], where the blur/haze intensity increases with the cataract severity [9]. An eye blink or occlusion by eyelashes can also result in a blurry retinal image. The presence of blur significantly affects the sensitivity of the diagnosis process (whether performed by a human expert or a computer-aided system), particularly for progressive ophthalmological diseases, as it is difficult to analyze vascular structure in visually blurred images [2]. A study focused on the automatic analysis of morphometric properties of vasculature in retinal images revealed that more than 25% of images were not suitable for automatic analysis due to their poor quality [10].
In the literature, different approaches have been proposed for retinal fundus image enhancement, ranging from traditional image processing methods such as image transformation [2], contrast adjustment [11], and normalization [12] to deep learning (DL)-based methods [13]. DL-based methods operate in a supervised learning fashion and are typically data-driven. These methods require a substantial amount of training data, which is scarcely available in practice, as obtaining high-quality representative medical data is very difficult, time-consuming, and expensive. In fact, it is documented that a major challenge in retinal fundus image enhancement research is the unavailability of an application-specific dataset in which the blurred/hazy images have a corresponding reference as ground truth (that can be used for evaluating the performance of enhancement methods) [14].
Various databases have been used in the retinal fundus image enhancement literature such as DRIMDB [15], HRF [16], and DR2 [17]. However, these datasets suffer from different issues, e.g., the images in these databases are assigned binary labels, i.e., either accept or reject. In a recent study [18], the authors attempted to address these challenges by manually annotating a sample of a large collection of retinal fundus images that had been acquired for diabetic retinopathy grading (i.e., the EyePACS dataset). The authors considered three-class annotations, i.e., good, usable, and reject. Furthermore, they used the labeled images to train a DL model to evaluate the quality of retinal fundus images. However, the aforementioned datasets do not contain paired images (i.e., the blurred image and its corresponding reference (clean) image). Furthermore, the acquisition of a good-quality reference image is another major challenge. An exactly pixel-to-pixel aligned reference image cannot be realistically acquired in clinical settings. This is because, to acquire reference (clean) images, we have to take the retinal fundus image of the patients after cataract surgery. However, this is quite challenging due to factors such as the varied refractive properties of the replaced lenses, a possibly changed field of view, and the position of the eye. We note that the datasets typically used for fundus image enhancement are the ones that have been developed for diabetic retinopathy or retinal blood vessel segmentation. These datasets lack ground truth reference images, which hinders their use for full-reference performance evaluation of image enhancement.
In this paper, unlike data-driven DL-based methods, we present a new approach for the dehazing of retinal fundus images that does not require any training data. Our approach uses only a single degraded image for recovering a true estimate of a clean image without requiring a reference image. Our approach is inspired by the recent successes of untrained neural network priors (UNNP) [19], [20], which use a single degraded image to solve different inverse imaging problems like denoising, deblurring, and inpainting [21]. We formulate the problem of retinal fundus image enhancement as a highly ill-posed inverse problem in which we aim to find a clean retinal image from a degraded one without having any prior knowledge of the clean image. In this paper, we build upon our previous work [22], in which we demonstrated that the structure of a convolutional neural network (CNN) can be used as a regularizer to solve such an inverse problem. This approach was shown to be quite effective because it does not require paired data (hazy and clean images) to train the DL models. Also, since our approach does not utilize any training data, it does not suffer from the problem of distribution shifts and provides good generalization. With reference to our previous work [22], the following are the specific extensions made in this paper:
1) We present the use of analytical image priors for atmospheric light estimation integrated with coupled deep image prior (CDIPs) networks for retinal fundus image enhancement. In contrast to our previous work [22], which used three DIPs with a separate DIP employed for atmospheric light estimation, our current framework utilizes only two DIP networks and is therefore computationally less expensive.
2) We show the effectiveness of using pretraining for our single-shot DL approach. We also demonstrate the effect of pretraining for cross-dataset analysis through ablation studies. Pretraining achieves comparable performance 3-5 times faster than using untrained CDIPs.
3) We have modified our experimental setup and have employed an early stopping strategy to avoid overfitting.
4) We conduct an extensive performance evaluation of the proposed framework on five different datasets (both quantitative and qualitative) and report promising results. Extending our previous work [22], we also use a no-reference metric for quantitative analysis (i.e., BRISQUE). Moreover, to highlight the clinical significance, we include a subjective evaluation performed by two expert ophthalmologists.
5) We perform a computational complexity analysis of our method and compare it with previous works.
Paper Organization: The rest of the paper is organized as follows. The related work is presented in Section II. The proposed methodology is presented in Section III. The experimental setup and results are described in Section IV. Limitations and promising directions for future work are identified in Section V. Finally, we conclude the paper in Section VI.

II. RELATED WORK
In recent years, different DL-based methods have been proposed for retinal fundus image enhancement. In [23], the authors proposed to integrate a shadow removal layer with a U-Net model for dehazing of retinal fundus images. The framework is trained in two steps, i.e., the U-Net is trained first, and then an Inception-v3 model is fine-tuned using the learned parameters of the U-Net. The U-Net model learns to estimate the transmission map that results in minimum classification error. The framework was evaluated for the diabetic retinopathy classification task; however, no quantitative metrics were used for the performance evaluation of retinal image enhancement.
Zhao et al. [13] proposed a generative adversarial network (GAN)-based retinal image enhancement method that works in a weakly supervised fashion, i.e., it uses unpaired clean and blurry retinal images. Two GANs were trained for deblurring of retinal fundus images using a training set of 949 images with 4× data augmentation, i.e., a total of 3796 images were used for training. In a similar study, You et al. [24] proposed to integrate a convolutional block attention module (CBAM) with a CycleGAN for enhancement of retinal fundus images. The method proposed in [24] was trained using unpaired clean and blurry retinal images, as it is quite challenging to acquire strictly paired clean and blurry retinal images for training a GAN. The method was evaluated qualitatively and quantitatively in terms of average peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). In [18], the authors first annotated a sample of a large collection of retinal images for image quality assessment purposes (they consider three classes, i.e., Good, Usable, and Reject). The data was originally annotated for the diabetic retinopathy classification task, and they then employed a data-driven DL model to automatically assess the quality of retinal fundus images. In [25], the authors first developed a synthetic model for introducing visual artifacts in retinal fundus images and then proposed a DL model to suppress these artifacts. In our previous work [22], we presented a single-shot unsupervised framework based on three DIPs and incorporated a DCP loss into the overall loss. A comprehensive survey focused on retinal fundus image quality enhancement is presented in [14].

III. METHODOLOGY
The key idea of the proposed method is image decomposition: we decompose the input degraded image into individual components and then use the image formation model to obtain the enhanced image. In our proposed framework, we employ multiple DIP networks to decompose the input image into its basic components. Unlike conventional DL, our method does not require any training data (paired or unpaired) and works by using only a single degraded image. This makes it quite useful for realistic medical applications, where paired data is usually not available and acquisition and annotation are costly. The proposed method is described below.

A. Image Decomposition using CDIPs
CDIPs leverage the well-known concept of image decomposition in computer vision, where the goal is to decompose an image into its basic layers. For instance, in image segmentation, the objective is to decompose the image into foreground and background layers, and in image dehazing, we are interested in decomposing the hazy image into a clear image and a haze map.
1) Illumination Compensation via CDIPs: In the literature, the illumination compensation problem is widely viewed as a haze removal problem. The model describing a hazy image I(x) is given by [26] as:

$$I(x) = J(x)\,t(x) + A\,\bigl(1 - t(x)\bigr) \tag{1}$$

where t(x), J(x), and A represent the transmission map (t-map), the restored clean (haze-free) image, and the atmospheric light, respectively. As described above, we formulate our problem as a layer decomposition problem in which we aim to decompose a blurred/hazy image I(x) into these three layers, i.e., the clean image ($o_1(x) = J(x)$), the transmission map ($o_2(x) = t(x)$), and the atmospheric light ($o_3 = A$), as shown in Fig. 1. Following our formulation, Eq. 1 can be expressed as:

$$\hat{I}(x) = o_1(x)\,o_2(x) + o_3\,\bigl(1 - o_2(x)\bigr) \tag{2}$$

The pipeline of the proposed CDIPs framework integrated with conventional priors is depicted in Fig. 1. As shown in the figure, the DIP 1 and DIP 2 networks each take a randomly sampled uniform noise vector z and attempt to generate the layers $o_1(x) = J(x)$ and $o_2(x) = t(x)$, respectively, from the input image I(x). As depicted in Fig. 1, the atmospheric light is estimated using either the dark channel prior (DCP) or the bright channel prior (BCP), both of which have been widely used in haze removal (the respective details are discussed in the next section). The outputs of the two DIP networks are mixed with the output of the analytical prior using Eq. 2. The loss function optimized by the proposed CDIPs framework is given as:

$$L_{overall} = L_{Rec} + L_{Reg} \tag{3}$$

where $L_{Rec}$ denotes the reconstruction loss, i.e., $\lVert I - \hat{I} \rVert_2$; $L_{Reg}$ is the regularization loss of DIP 2, defined as the norm of the Laplacian, which is minimized to enforce smoothness of the estimated mask ($o_2$); and $L_{overall}$ represents the overall loss of the CDIPs architecture.
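As a rough illustration of how the two generated layers and the atmospheric light could be mixed and scored, the following NumPy sketch composes a hazy estimate from a clean-image layer, a transmission map, and an atmospheric light value, and evaluates a reconstruction term plus a Laplacian smoothness term. It is a sketch under stated assumptions, not the implementation used in this work: the function names, the 4-neighbour Laplacian stencil with edge replication, the standard haze formation model I = J·t + A·(1 − t), and the `reg_weight` parameter are all illustrative choices, and plain arrays stand in for the DIP network outputs.

```python
import numpy as np

def compose_hazy(clean, t_map, airlight):
    """Mix the layers with the haze model: I_hat = J * t + A * (1 - t).

    clean:    (H, W, 3) array standing in for the DIP 1 output (o1).
    t_map:    (H, W) array standing in for the DIP 2 output (o2).
    airlight: scalar or (3,) array from an analytical prior (o3).
    """
    t = t_map[..., None]  # broadcast the t-map over the color channels
    return clean * t + airlight * (1.0 - t)

def laplacian(t_map):
    """4-neighbour discrete Laplacian (assumed stencil, edge-replicated)."""
    p = np.pad(t_map, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * t_map)

def overall_loss(hazy, clean, t_map, airlight, reg_weight=1.0):
    """Reconstruction loss plus smoothness regularizer on the t-map.

    reg_weight is a hypothetical knob; reg_weight=1.0 gives a plain sum.
    """
    l_rec = np.linalg.norm(hazy - compose_hazy(clean, t_map, airlight))
    l_reg = np.linalg.norm(laplacian(t_map))
    return l_rec + reg_weight * l_reg
```

In an actual DIP setting, `clean` and `t_map` would be produced by the two networks at every iteration and the loss would be backpropagated through them; here the arrays are fixed so that the arithmetic can be inspected in isolation.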

B. Atmospheric Light Estimation
Atmospheric light is another component of an image that needs to be estimated along with the transmission map t(x); it is used in the hazy image formation model to obtain the recovered/enhanced image (Eq. 1). In our previous work, we estimated non-uniform atmospheric light as a separate layer using a third DIP network. In this paper, we show that uniformly estimated atmospheric light provides comparable results while being computationally less expensive, which reduces time and memory usage. We assume that there is a uniform scattering of light that can be easily estimated using analytical priors such as the dark channel prior (DCP) and the bright channel prior (BCP). In the literature, most single-image dehazing methods are based on this assumption [26]-[29]. In [23], the authors assume A = 1 for retinal fundus image enhancement.
1) Atmospheric Light Estimation Using DCP: In contrast to our previous work [22], where we incorporated a DCP loss into the overall loss of the CDIPs framework, in this paper we use DCP to estimate the atmospheric light. In the literature, DCP has been widely used in single-image dehazing of natural images [26], [30]. It is based on the observation that in most local patches of haze-free color images, some pixels have very low intensity in at least one of the color channels, which is called the dark channel. The dark channel for an image I(x) can be computed as:

$$I^{dark}(x) = \min_{y \in \Omega(x)} \Bigl( \min_{C \in \{R, G, B\}} I^{C}(y) \Bigr)$$

where $I^{C}(y)$ is a color channel of the image I(x) and Ω(x) is a local patch centered at x. In our previous work [22], we incorporated a DCP loss into the optimization loss of the CDIPs framework. Computing this DCP loss is very expensive [31], [32], as it involves the computation of the patch-based DCP for the generated image at each iteration. To overcome this issue, we do not incorporate the DCP loss into the overall loss of the CDIPs framework (as we did in our previous work); instead, we use DCP directly for estimating the uniform atmospheric light. Uniform atmospheric light is estimated by selecting the top 0.1% darkest pixels from DCP as candidate pixels. The intensity of dark pixels is mainly contributed by atmospheric light in hazy images; therefore, these candidate pixels can directly provide an accurate estimate of haze transmission. Unlike natural outdoor images, which contain sky regions, retinal fundus images do not contain patches that are correlated with the estimated transmission map, which makes it feasible to recover haze-free retinal images.
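The dark channel computation and the candidate-pixel selection described above can be sketched in NumPy as follows. This is only an illustration under assumptions: the 15 × 15 patch size, the edge padding, the function names, and the final reduction (a per-channel maximum over the candidate pixels of the input image) are not specified in the text. The `select` parameter defaults to the darkest 0.1% of dark-channel pixels, following the wording above; the commonly cited DCP dehazing recipe for natural images instead takes the brightest 0.1%, which is available as `select="brightest"`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dark_channel(img, patch=15):
    """Minimum over color channels, then minimum over a patch x patch window.

    img is an (H, W, 3) float array; patch is assumed to be odd.
    """
    per_pixel_min = img.min(axis=2)
    r = patch // 2
    padded = np.pad(per_pixel_min, r, mode="edge")  # assumed border handling
    return sliding_window_view(padded, (patch, patch)).min(axis=(2, 3))

def airlight_from_dcp(img, patch=15, fraction=0.001, select="darkest"):
    """Uniform atmospheric light from dark-channel candidate pixels."""
    dark = dark_channel(img, patch).ravel()
    k = max(1, int(round(fraction * dark.size)))  # 0.1% of pixels by default
    order = np.argsort(dark)                      # ascending dark-channel value
    idx = order[:k] if select == "darkest" else order[-k:]
    candidates = img.reshape(-1, img.shape[2])[idx]
    # Assumed reduction: per-channel maximum over the candidate pixels.
    return candidates.max(axis=0)
```

Because the atmospheric light is computed once from the input image rather than re-estimated at every optimization step, the cost of the patch-wise minimum is paid a single time per image.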
2) Atmospheric Light Estimation using BCP: Unlike DCP, BCP is based on the observation that in most local patches of haze-free color images, some pixels have very high intensity in at least one of the color channels, which is called the bright channel. BCP has been widely used in the single-image dehazing literature [33]. The bright channel for an image I(x) is computed as:

$$I^{bright}(x) = \max_{y \in \Omega(x)} \Bigl( \max_{C \in \{R, G, B\}} I^{C}(y) \Bigr)$$

where $I^{C}(y)$ is a color channel of the image I(x) and Ω(x) is a local patch centered at x. Similar to DCP, uniform atmospheric light is estimated by selecting the top 0.1% brightest pixels from BCP as candidate pixels. The maximum intensity value among the candidate pixels in the hazy image is then taken as the atmospheric light [30]. BCP has been shown to be quite effective for atmospheric light estimation [34] in retinal image enhancement because the color intensities of retinal structures (such as vessels and the optic disc) are inherently different from the estimated atmospheric light (therefore, BCP prominently highlights such regions).
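A matching NumPy sketch for the BCP-based estimate is given below: it computes the bright channel, takes the brightest 0.1% of bright-channel pixels as candidates, and returns the maximum intensity of those candidates in the hazy image. The patch size, the edge padding, and the function names are assumptions made for illustration only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def bright_channel(img, patch=15):
    """Maximum over color channels, then maximum over a patch x patch window.

    img is an (H, W, 3) float array; patch is assumed to be odd.
    """
    per_pixel_max = img.max(axis=2)
    r = patch // 2
    padded = np.pad(per_pixel_max, r, mode="edge")  # assumed border handling
    return sliding_window_view(padded, (patch, patch)).max(axis=(2, 3))

def airlight_from_bcp(img, patch=15, fraction=0.001):
    """Uniform (scalar) atmospheric light from bright-channel candidates."""
    bright = bright_channel(img, patch).ravel()
    k = max(1, int(round(fraction * bright.size)))  # 0.1% of pixels by default
    idx = np.argpartition(bright, -k)[-k:]          # indices of the k largest
    candidates = img.reshape(-1, img.shape[2])[idx]
    # Maximum intensity among the candidate pixels of the hazy image.
    return float(candidates.max())
```

For a 512 × 512 input, the default fraction corresponds to roughly 262 candidate pixels.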

C. Untrained and Pretrained CDIPs
To capture useful information about images, the parameters of a DL model are tuned/learned from the training data. However, it has also been shown in the literature [35] that pretrained neural networks contain a significant amount of information about the task at hand, e.g., dehazing in our case. In this paper, we leverage the idea of pretraining and investigate its effect in the application of CDIPs to retinal image enhancement. We show that the knowledge learned from one retinal fundus image can be used to effectively model other retinal fundus images in comparatively less time while achieving comparable performance.
In Fig. 2, we demonstrate how pretraining works for our problem using CDIPs. Starting from the pretrained parameters θ_p0 loaded into the CDIPs framework and a randomly initialized input code vector z, we attempt to model the probability distribution p(x|x_0), where x is the unknown clean image and x_0 is the degraded (hazy) image. The output at the first iteration contains randomization due to the random code vector z, and the optimizer iteratively optimizes the neural network's parameters θ for the given input image using backpropagation. It has been shown in the literature [19], [36] that the choice of neural network architecture has a direct impact on the performance, e.g., we can design/handcraft a particular neural network architecture for modeling a specific image [21]. This serves as a solution space when modeling images using untrained neural network priors. We found that the optimization process tends to destabilize for a few images; this is evident from the high standard deviation (SD) in iterations reported in Table II, which indicates that the optimization was sometimes stopped after very few iterations. This phenomenon highlights that the pretrained parameters are not well suited to those particular images, whereas pretraining works well for most images. To overcome this issue, we rely on early stopping.
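The early stopping criterion is not spelled out in the text above, so the following pure-Python sketch shows one common, patience-based variant purely for illustration. The class name, the `patience` and `min_delta` parameters, and the choice of monitoring the loss value are all assumptions rather than details of the proposed framework.

```python
class EarlyStopper:
    """Stop when the monitored loss has not improved for `patience` steps."""

    def __init__(self, patience=100, min_delta=1e-4):
        self.patience = patience    # hypothetical number of tolerated stalls
        self.min_delta = min_delta  # hypothetical minimum improvement
        self.best = float("inf")
        self.wait = 0

    def step(self, loss):
        """Record one loss value; return True when optimization should stop."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience


def run_until_stopped(losses, stopper):
    """Return the index at which the stopper fires, or None if it never does."""
    for i, loss in enumerate(losses):
        if stopper.step(loss):
            return i
    return None
```

With a rule of this kind, an image for which the loaded parameters are a poor starting point can stall and be stopped after only a few iterations, which is consistent with the large spread in iteration counts noted above.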

IV. EXPERIMENTAL SETUP AND RESULTS
A. Setup and Dataset Description
We perform extensive experiments on five publicly available fundus image datasets that have different numbers of images and dimensions, namely, (i) DRIVE [37], (ii) STARE [38], (iii) Messidor [39], and (iv and v) DIARET DB calibration level 0 and 1 [40]. Unlike our previous work, we apply some pre-processing steps before feeding the input to the proposed framework. Firstly, all images from each dataset are center-cropped such that the field of view (FoV) is preserved. Secondly, each image is resized to a standard size of 512 × 512. We note here that DRIVE and STARE are benchmark databases for retinal blood vessel segmentation, while Messidor and DIARET DB (0 and 1) are benchmark databases for diabetic retinopathy classification. We quantitatively evaluate the proposed framework using three widely known metrics, i.e., PSNR, SSIM, and the Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE). PSNR is a widely used metric that measures the similarity between two images; a higher PSNR represents better image quality. SSIM is a perception-based image quality metric that is used to measure the perceived change in structural information between two images. SSIM exploits inter-dependencies among spatially close pixels, and a higher SSIM value is desirable. BRISQUE is a no-reference image quality metric that scores the perceptual quality of an input image based on its naturalness, where a lower BRISQUE value is desirable. Finally, to highlight the clinical significance, we also perform a subjective qualitative assessment of the proposed method with two expert ophthalmologists.
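For reference, PSNR follows directly from the mean squared error between two images. The snippet below is a minimal NumPy sketch of the standard definition, assuming float images scaled to [0, 1] (so the peak value `max_val` is 1.0); the function name and the scaling convention are illustrative assumptions.

```python
import numpy as np

def psnr(reference, estimate, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

As a quick sanity check, a uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01 and therefore a PSNR of 20 dB.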

B. Implementation Details
We use the U-Net as the generator network for generating the enhanced retinal fundus images [41]. U-Net is a deep CNN-based generative model that was specifically developed for biomedical image segmentation [41]. It has an hourglass-like architecture with skip connections between encoder layers and decoder layers. The same model architecture was used in our previous work [22]. Each DIP uses one U-Net, and the overall framework is optimized using the ADAM optimizer [42] with a learning rate of 0.004 and default values of β1 = 0.9, β2 = 0.999, and ε = 1e-8. We perform different ablation studies using untrained and pretrained U-Nets. In the untrained experiments, all images from each dataset are processed for 3500 iterations, and in the pretraining experiments, every image is processed for a maximum of 2500 iterations. Furthermore, we employ an early stopping strategy to avoid overfitting in all experiments. The number of trainable parameters in the proposed framework is approximately 1.15M.
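To make the optimizer settings concrete, the following NumPy sketch writes out one update of the standard ADAM rule with the hyperparameters listed above. It is a textbook illustration on a plain array, not the training code of this work; the function and state names are hypothetical, and in practice a deep learning framework's built-in optimizer would be applied to the U-Net parameters.

```python
import numpy as np

def init_adam_state(param):
    """Zero-initialized first/second moment estimates and step counter."""
    return {"t": 0, "m": np.zeros_like(param), "v": np.zeros_like(param)}

def adam_step(param, grad, state, lr=0.004, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update with bias-corrected moment estimates."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * grad ** 2
    m_hat = state["m"] / (1.0 - beta1 ** state["t"])
    v_hat = state["v"] / (1.0 - beta2 ** state["t"])
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)
```

On the first step the bias-corrected ratio has magnitude close to one, so each parameter moves by roughly the learning rate (0.004) in the direction opposite to its gradient, regardless of the gradient's scale.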

C. Using Untrained CDIPs with DCP and BCP
As described in the earlier sections, we used analytical priors for estimating uniform atmospheric light in the retinal fundus image. In this section, we present the results of using DCP and BCP for atmospheric light estimation. We also report the standard deviation in addition to average values to provide an exact idea about the statistical distribution of achieved performance across different datasets which contain a different number of images. The comparative results of using CDIPs (i.e., one for getting recovered image and the other for estimating transmission map) and using DCP and BCP for atmospheric light estimation are presented in Table I. It can be observed from the table that the average performance of using BCP with CDIPs is comparatively higher than using DCP with CDIPs. This indicates that most of the fundus images have brighter pixels that are efficiently leveraged by the BCP. This trend can be seen for almost every dataset in terms of all metrics used except for the DRIVE dataset where DCP performance is higher, probably due to low intensity in images of DRIVE. Similarly, it can be seen that the performance of CDIPs coupled with analytical priors (DCP and BCP) is relatively low for blood vessel segmentation databases (DRIVE and STARE) as compared to diabetic retinopathy classification datasets (Messidor, DIARET DB 0 and 1). This is due to the fact that the image quality of segmentation databases is relatively higher (as they are purposely developed for retinal blood vessels segmentation) than diabetic retinopathy classification datasets.

D. Using Pretrained CDIPs with DCP and BCP
In this section, we highlight the benefit of pretraining for our proposed CDIPs framework. Pretraining works in two steps: (i) we randomly choose one image from each dataset and fit it with our proposed CDIPs framework and, upon completion, store the learned/optimized parameters of both DIP networks as pretrained parameters (this process is performed once for each dataset); and then (ii) we load these saved parameters when fitting other images. The results for pretrained CDIPs coupled with DCP and BCP are presented in Table II, which highlights that pretraining provides comparable performance for all datasets when modeled with the BCP-based CDIPs framework while requiring fewer iterations than the untrained neural network-based approach. This is because, in pretraining, domain knowledge is incorporated into the network, which facilitates the optimization process. In contrast, when DCP is used with CDIPs, there is a certain difference in performance in terms of the different metrics, i.e., PSNR, SSIM, and BRISQUE. The key observation is the significantly reduced number of iterations required for optimizing the neural network parameters for a given input image. It is evident from Table II that pretraining provides performance comparable to random parameter initialization and converges in fewer iterations.
Fig. 3. Improvement in PSNR (a) and SSIM (b) over iterations t for a given input image using a randomly initialized network (no pretraining) vs. using a pretrained network (initialized with parameters of a network that generated a fundus image of the same dataset). It is clear from the zoomed-in version of the plots that pretraining allows the network to capture the unknown image statistics within very few iterations. Intermediate results generated by both methods at iteration t = 1, 10, 20, 50, 100, 200, and 300 are shown in (c). It can be seen that the pretrained network was able to learn retinal blood vessels after 200 iterations, while the untrained network was not able to capture such details.
The difference in the performance of pretrained CDIPs is expected, and it indicates that the pretrained neural network parameters being used are not optimal for certain images. For this reason, the optimization process for a few images stops after very few iterations, which results in low-quality images. This observation is supported by the standard deviation values, e.g., the high standard deviation in the average number of iterations used for optimization for certain datasets (e.g., STARE) indicates that a few images were stopped early, at relatively fewer iterations than others. This situation can be avoided by employing an optimal pretraining strategy. We further note that the relatively high average SSIM values (Tables I and II) indicate that our method is able to reconstruct retinal images with high similarity between the structure of local patterns of the original image and the enhanced image, which implies good preservation of the vascular structure.
E. Ablation Studies
1) Effect of Using Pretraining: The effect of using pretraining in our proposed CDIPs framework is depicted in Fig. 3. The figure shows the intermediate results generated by our method at different iterations, i.e., t = 1, 10, 20, 50, 100, 200, and 300, along with the corresponding PSNR and SSIM for both cases: (i) when the input image is modeled using a randomly initialized network (no pretraining); and (ii) when the input image is modeled using a pretrained network (initialized with parameters of a network that generated a fundus image of the same dataset). The difference in the performance of both cases is easily noticeable in the zoomed-in versions of the plots, as depicted in Fig. 3. Moreover, it can be seen that the pretrained network was able to capture/learn the low-level structural details of retinal blood vessels after 200 iterations, while the untrained network was not able to capture such details. The randomization seen in the output of the first iteration is due to the use of the random code vector z (please see Fig. 2). Note that only parameters of the encoder network in the U-Net model are fitted using a single degraded image, i.e., without data-driven training.
2) Pretraining for Cross Dataset Analysis: We have also investigated the effect of pretraining for cross-dataset analysis, i.e., when the pretrained parameters of an image from one dataset are used to fit the images of another dataset. The quantitative results for the cross-dataset analysis are summarized in Table III. We used three datasets for this analysis, one of which was developed for retinal blood vessel segmentation (i.e., DRIVE), while the other two are diabetic retinopathy databases. It is evident from the table that cross-dataset pretraining works well and provides better results when the images are of a similar nature. For instance, the performance on the DB1 dataset when pretrained with a DB0 image is notably better than when images from DRIVE are fitted using the pretrained parameters of a DB0 image. This is because both DB0 and DB1 are of the same category, i.e., diabetic retinopathy datasets. Pretraining using an image from the DRIVE database does not provide good performance in terms of average SSIM on images of the DB1 database, which is expected because the two databases differ in nature. As in the other pretraining experiments, in cross-dataset pretraining each image is processed for 2500 iterations, and we used early stopping to avoid overfitting. The average number of iterations and its standard deviation (SD) are also reported in Table III. It can be seen from the table that, compared to pretraining with DCP, pretraining with BCP was more prone to overfitting (evident from the higher SD in the number of iterations). This is because the domain knowledge (incorporated as pretrained parameters) increases the overall pigmentation of the fundus images enhanced using CDIPs integrated with BCP. Fig. 4 presents the results for an image from the STARE database that is fitted using the pretrained parameters of the DIARET DB1 database. It is clear from the figure that pretraining significantly boosts the performance of our proposed framework in recovering faithful estimates of the clean image. This is because the untrained neural network is provided with domain knowledge (i.e., what a fundus image looks like in this case).
Fig. 4. Improvement in PSNR and SSIM when an image from the STARE database is fitted using pretrained parameters of an image from the DIARET DB1 database. The proposed framework starts learning low-level image statistics, i.e., retinal blood vessels, at fewer iterations.
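The early stopping rule used in these experiments is not specified in this section. As a minimal sketch, assuming a patience-based rule applied to a per-iteration quality score (the function names and the `patience` and `min_delta` parameters are hypothetical and not taken from the paper), the stopping iteration per image and the mean and SD of those iterations could be computed as follows:

```python
from statistics import mean, stdev


def early_stop_iteration(scores, patience=50, min_delta=0.0, max_iters=2500):
    """Return (best_iteration, best_score) under a patience-based rule.

    `scores` is an iterable of per-iteration quality scores (higher is better).
    Iterations are 1-indexed. Fitting stops once the score has not improved by
    more than `min_delta` for `patience` consecutive iterations, or once
    `max_iters` iterations have been consumed. This rule is an assumption,
    not the criterion used in the paper.
    """
    best_score = float("-inf")
    best_iter = 0
    for t, score in enumerate(scores, start=1):
        if t > max_iters:
            break
        if score > best_score + min_delta:
            best_score, best_iter = score, t
        elif t - best_iter >= patience:
            break
    return best_iter, best_score


def iteration_stats(stop_iterations):
    """Mean and sample standard deviation of per-image stopping iterations."""
    return mean(stop_iterations), stdev(stop_iterations)
```

A larger SD returned by `iteration_stats` means that the stopping point varies more from image to image, which is the quantity the text uses as an indicator of overfitting. Whether a sample or population SD was reported is not stated, so the sample SD here is also an assumption.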

F. Subjective Evaluation of Enhancement
The subjective assessment of enhancement quality was performed by two expert ophthalmologists (one of whom is a co-author of the paper and has been actively involved in the project since the beginning). For this purpose, we carefully selected five images from each dataset so that the selected images are representative of our best, good, and average results, which ensures a fair evaluation. Note that we selected only five images per dataset in the interest of the experts' time and effort. We then asked the expert ophthalmologists to subjectively evaluate the quality of the images while keeping in mind the following important features of retinal images: (i) visibility of the optic disc (F1); (ii) visibility of the fovea (F2); (iii) visibility of retinal blood vessels, including venules, arterioles, and capillaries, along with their branching and termination (F3); (iv) overall general fundus (F4); and (v) overall assessment of the quality of the enhanced image as compared to the original (hazy) image (F5). The subjective evaluation scores were based on a comparative analysis of the original (without enhancement) and enhanced images: we showed both images to the expert ophthalmologists when they graded the images generated by our method (to ensure a fair comparison and evaluation).
To obtain the subjective assessment scores, we used a five-point grading scale: 5: Excellent Results; 4: Very Good Results; 3: Good Results; 2: Average Results; and 1: Bad Results. The average values of the subjective evaluation metrics, along with the respective standard deviations, are presented in Table IV. From the table, it is evident that the expert ophthalmologists rated the overall quality of the enhanced images as nearly excellent (nearly 5) for each dataset. These promising subjective assessment scores highlight the efficacy and clinical significance of our proposed method.
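As an illustration of how per-feature averages and standard deviations of this kind can be computed from five-point grades, the following is a minimal sketch. The function name is hypothetical, the grades in the usage note are made-up placeholders rather than the experts' actual scores, and the use of the sample standard deviation is an assumption:

```python
from statistics import mean, stdev


def summarize_grades(grades_by_feature):
    """Map each feature (e.g., "F1") to (mean, sample SD) of its 1-5 grades.

    `grades_by_feature` is a dict from feature label to a list of integer
    grades on the five-point scale. Values are rounded to two decimals.
    """
    summary = {}
    for feature, grades in grades_by_feature.items():
        if any(g < 1 or g > 5 for g in grades):
            raise ValueError(f"grades for {feature} must lie in 1..5")
        summary[feature] = (round(mean(grades), 2), round(stdev(grades), 2))
    return summary
```

For example, `summarize_grades({"F1": [5, 4, 5, 5, 4]})` summarizes five placeholder grades for the optic disc feature as a mean and SD pair.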
Below we describe the overall assessment of the expert ophthalmologists on the efficacy of our method. The ophthalmologists made the following specific observations about our proposed method.
1) It enhances the overall image clarity by ∼40% by enhancing contrast and defining the outlines of various fundal structures like the optic disc, vessels, and fovea.
2) In the enhanced images, it is very easy to diagnose and analyze retinal abnormalities that include fibrosis, lesions, and hard exudates.
3) Blood vessel termination and branching points become clearly visible in the enhanced images as compared to the original images, which makes the detection of abnormal vessels and leakage points very easy.
4) In pigmented fundus images, the solution clarifies various lesions like degenerations, fibrosis, thinning, etc.
5) In lightly pigmented fundus images, it improves the visibility of various dark lesions like hemorrhages, melanomas, naevi, etc.
6) The only noted limitation of the proposed method is a slight obscuration of the outline of fundal structures with less than normal pigmentation, such as seen in high myopes or albinism.

G. Visual Evaluation of Enhancement
A visual comparison of the enhanced images generated by the different methods proposed in this paper, for blurry images of differing quality, is shown in Fig. 5. These images have been selected from the subset of images that we used for the expert evaluation (we chose one image from each dataset), and we ensured that they are representative of our best, good, and average results. It can be clearly seen from Fig. 5 that the proposed method can recover good-quality images in which retinal blood vessels are prominently highlighted. Note that the clean image is recovered without any prior knowledge of the enhanced image. From Fig. 5, it can be seen that DCP favors some images and, similarly, BCP provides good performance on others (e.g., the choroidal tissue defects near the optic disc in the image from DIARET DB0 are prominently highlighted by BCP). This highlights that DCP favors images with a darker background and BCP favors images with a bright background, i.e., DCP effectively handles the images having pixels with low intensities, which

H. Evaluation on Synthetic Data
The acquisition of reference (ground-truth clean) images for the retinal fundus image enhancement problem is very challenging because it is difficult to obtain exactly pixel-to-pixel aligned images before and after cataract surgery. To fill this gap, Shen et al. [25] presented a fundus image degradation model that synthetically introduces visual artifacts and haze/blurriness into retinal fundus images. In Table V, we present a comparative analysis of our method in terms of average PSNR and SSIM on the synthetically degraded DRIVE dataset [25]. To ensure a fair comparison, we used an experimental setup similar to that of [25], in which the synthetic retinal artifacts were introduced into the images of the DRIVE dataset. The original images were regarded as the ground truth (reference images), and the synthetically degraded images were enhanced using our proposed CDIPs-based framework. It is evident from Table V that our method significantly outperforms existing methods despite being single-shot and unsupervised. Note that we did not use any pretraining for our analysis on the synthetic dataset. In addition, we used the method of Shen et al. [25] to synthetically generate different degraded images by introducing different artifacts into retinal fundus images, including uneven illumination (haze), blur, and artifacts due to a dusty fundus camera lens. The performance of the proposed framework in terms of average PSNR and average SSIM on the DRIVE dataset under the different degradation methods is summarized in Table VI.
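For reference, PSNR between a ground-truth image and an enhanced image follows the standard definition PSNR = 10 log10(MAX^2 / MSE). The following is a minimal sketch in plain Python, assuming single-channel 8-bit images stored as nested lists; the function name and the `max_value` default are illustrative choices and do not describe the evaluation code used in the paper:

```python
import math


def psnr(reference, enhanced, max_value=255.0):
    """Peak signal-to-noise ratio (in dB) between two equally sized images.

    `reference` and `enhanced` are 2-D nested lists of pixel intensities.
    Returns math.inf when the images are identical (zero mean squared error).
    """
    flat_ref = [v for row in reference for v in row]
    flat_enh = [v for row in enhanced for v in row]
    if len(flat_ref) != len(flat_enh) or not flat_ref:
        raise ValueError("images must be non-empty and of the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_enh)) / len(flat_ref)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

In the synthetic setting described above, `reference` would be the original DRIVE image and `enhanced` the output recovered from its synthetically degraded version. For color images, one common convention is to average the squared error over all channels.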

I. Complexity Analysis
As described above, in our previous work [22], we incorporated the DCP loss into the overall loss of three CDIPs networks. The DCP loss involves the computation of small patches from the generated image at every iteration, which inherently slows the overall framework. In this paper, we present an alternative approach in which we use only two DIP networks that are integrated with conventional image priors, i.e., DCP and BCP. These priors are used for estimating uniform atmospheric light, in contrast to our prior work [22], where we used a separate DIP network for non-uniform atmospheric light estimation. In this section, we perform a complexity analysis of both approaches in terms of the percentage improvement in average time per image, average memory utilized per image, and the number of parameters saved. Table VII shows improvements of 27.27%, 20.54%, and 19.86% in time, memory, and number of parameters, respectively. These scores are based on an image size of 512 × 512 and 3500 iterations used to recover the input images. The image size and the number of iterations have a direct impact on time and memory usage. Note that these scores are for the overall end-to-end framework, which involves all steps from image loading to saving. We further note that the optimization of the CDIPs alone takes only 0.1 sec per iteration for our current approach.

Table V. Average PSNR and SSIM of existing methods on the synthetically degraded DRIVE dataset.
Method | Description | PSNR | SSIM
[45] | Illumination map estimation. | 14.10 | 0.703
Cheng et al. [46] | Guided retinal image filtering. | 14.97 | 0.648
Tian et al. [47] | Global and local contrast adaptive enhancement. | 15.42 | 0.721
Fu et al. [48] | Weighted variational model. | 15.56 | 0.722
He et al. [26] | Dark channel prior (DCP). | 15.78 | 0.559
Zuiderveld et al. [49] | Contrast limited adaptive histogram equalization. | 15.93 | 0.740
Eilertsen et al. [50] | Deep convolutional neural network (CNN). | 19.01 | 0.755
Shen et al. [25] | Multiple GANs are used in the proposed Cofe-Net framework. | |
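The percentage improvements in time, memory, and parameters reported in Table VII are relative reductions with respect to the earlier approach. A minimal sketch of that arithmetic is given below; the function name is hypothetical, and the numbers in the usage note are placeholders rather than measurements from the paper:

```python
def percent_improvement(baseline, current):
    """Relative reduction of `current` with respect to `baseline`, in percent.

    A positive value means `current` uses less of the resource (time, memory,
    or parameters) than `baseline`; a negative value means it uses more.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) / baseline * 100.0
```

For example, with placeholder values, a resource cost that drops from 200.0 to 150.0 units gives `percent_improvement(200.0, 150.0)`, i.e., a 25% improvement.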

V. LIMITATIONS AND FUTURE WORK
The key limitation of the CDIPs-based approach is the time required to reconstruct a recovered image. We see that increasing the number of iterations results in a gradual increase in performance, e.g., in PSNR and SSIM. To address this time issue, we have also investigated the use of pretraining, which has provided promising results for the majority of images. However, in some cases, pretraining results in overfitting, which indicates that the pretrained parameters being used were not optimal for those specific images. To overcome this issue, in our future work we aim to develop an optimal pretraining strategy to fully uncover the potential of incorporating domain knowledge (i.e., through pretraining) into untrained neural networks. The development of optimal early stopping criteria could be another possible solution for reducing the time taken and avoiding overfitting. Another limitation of our approach is the slight obscuration of the outline of fundal structures, which was noted by the expert ophthalmologists while performing the subjective evaluation.

VI. CONCLUSIONS
In this paper, we have presented a unified single-shot deep learning (DL) framework for the enhancement of retinal fundus images. For this purpose, we have employed coupled untrained neural networks, known as deep image priors (DIP), which are integrated with conventional image priors, i.e., the dark channel prior (DCP) and the bright channel prior (BCP). The proposed method reconstructs the enhanced image from a single degraded image without requiring end-to-end data-driven training. We quantitatively evaluated the proposed approach on five different retinal fundus image datasets in terms of average peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) score. We also evaluated our method on a synthetic dataset (in which different image degradations were synthetically introduced into the original images prior to enhancement). We compared the performance of our proposed approach with existing similar methods, and our method outperforms them. To highlight the clinical significance, we included a subjective assessment of the enhancement performed by two expert ophthalmologists. In addition to using untrained neural networks, we have also investigated pretraining, which has provided promising results while reducing the computational cost and time.

ACKNOWLEDGMENT
We are very thankful to Dr. Kanwal Zareen Abbasi, Associate Professor of Eye at HBS Medical and Dental College, Islamabad, Pakistan, for the subjective qualitative assessment.