RAG-FW: A Hybrid Convolutional Framework for the Automated Extraction of Retinal Lesions and Lesion-Influenced Grading of Human Retinal Pathology

The identification of retinal lesions plays a vital role in accurately classifying and grading retinopathy. Many researchers have presented studies on optical coherence tomography (OCT) based retinal image analysis in the past. However, to the best of our knowledge, there is no framework yet available that can extract retinal lesions from multi-vendor OCT scans and utilize them for the intuitive severity grading of the human retina. To address this gap, we propose a deep retinal analysis and grading framework (RAG-FW). RAG-FW is a hybrid convolutional framework that extracts multiple retinal lesions from OCT scans and utilizes them for lesion-influenced grading of retinopathy as per the clinical standards. RAG-FW has been rigorously tested on 43,613 scans from five highly complex publicly available datasets containing multi-vendor scans, where it achieved a mean intersection-over-union score of 0.8055 for extracting retinal lesions and an accuracy of 98.70% for the correct severity grading of retinopathy.

which fluid accumulates within the macula of the retina; it is majorly caused by diabetes, cataract surgeries, uveitis, and blockage of the retinal veins or arteries [1]. CSR is a medical condition, mostly associated with young adults and linked to stress, that causes serous detachment underneath the retina [2]. AMD is another retinal syndrome, mostly found in the elderly, caused by the formation of drusen. With the progression of the disease, abnormal blood vessels from the choroid intercept the retina and produce chorioretinal abnormalities such as scars and choroidal neovascular membranes (CNVM), leading to choroidal neovascularization (CNV). Although AMD alone does not cause blindness, it can cause severe impairments if not treated timely. Many imaging techniques can reveal abnormal retinal symptoms, but the most widely used is optical coherence tomography (OCT), which can present retinal abnormalities at early stages [3].

II. RELATED WORK
Many studies have been presented in the past for the automated extraction of retinal layers and retinal lesions, and for retinopathy diagnosis, based on OCT imagery.

A. Retinal Layers and Lesions Segmentation
Extraction of retinal layers and lesions is one of the key steps in analyzing retinal pathology, and many researchers have presented automated methods in this context [4]-[7]. Rashno et al. [8] developed a framework based on neutrosophic sets and graph algorithms to automatically segment IRF from OCT scans of DME subjects. Lee et al. proposed a modification of the UNet for detecting retinal fluid from ME-affected OCT images [9]. Roy et al. [10] developed a retinal layers and fluid detection framework which they evaluated on the publicly available Duke dataset (named Duke-II in this paper). Schlegl et al. [11] developed an auto-encoder to detect and quantify SRF and IRF, achieving accuracies of 0.92 and 0.94, respectively. They evaluated their method on 1,200 OCT volumes of retinal vein occlusion (RVO), DME and AMD subjects, acquired through Cirrus and Spectralis machines [11]. Seebock et al. [12] exploited epistemic uncertainty based on a Bayesian UNet for detecting retinal anomalies in DME, AMD, dry geographic atrophy and RVO pathologies. Identification of IRF from AMD- and DME-affected scans was also performed in [13], where the authors extracted 312 distinct features and classified IRF through a linear discriminant classifier (LDC), support vector machines (SVM) and a Parzen window. Girish et al. [14] developed a framework based on a fully convolutional network for the automated segmentation of intra-retinal cysts from multi-vendor retinal OCT scans. Fang et al. [15] presented a lesion-aware CNN model that pays attention to lesions while classifying macular diseases. They developed a lesion detection network that generates a soft attention map, which is further utilized by the classification network to improve the performance of macular diagnosis based on OCT imagery. Deep learning methods have shown remarkable performance for extracting retinal information. However, they require large amounts of annotated data and considerable training time to yield good results.

B. Retinopathy Diagnosis
Apart from extracting retinal layers and retinal lesions, many researchers have developed computer-aided diagnostic (CAD) systems to screen different retinal pathologies [16]-[19]. Kermany et al. [20] proposed a deep retinal screening system that was tested on 1,000 OCT scans and achieved an accuracy of 96.6%. They also made their dataset publicly available (we name it the Zhang dataset in this article). Rong et al. [21] presented a surrogate-assisted, CNN-based classification of retinal OCT scans on their local as well as the Duke-III dataset, where they achieved an area under the curve (AUC) of 0.9856. Apart from this, many researchers have presented automated frameworks in the past [22]-[27] to extract retinal layers and retinal fluids as well as to classify retinal diseases using OCT images.
To the best of our knowledge, there is no framework that can extract retinal lesions such as IRF, SRF, HE, drusen and chorioretinal abnormalities (CA) like fibrotic scars and CNVM from multi-vendor OCT scans and use them for the accurate grading of ME, AMD and CSR pathologies as per the clinical standards. It should be noted here that classification and grading are not the same: grading is the step after classification that measures the severity (stage) of the disease, and it is based on a defined set of clinical standards. Retinal lesions play a significant role in accurately classifying retinal diseases [15]. In [11], retinal fluid recognition has been carried out for the AMD, DME and RVO pathologies. Moreover, the significance of extracting lesion biomarkers for retinal diagnosis is also highlighted in [12], [13], where [12] presented epistemic uncertainty of lesions without discriminating them and [13] identified IRF for AMD and DME cases. Similarly, intra-retinal cysts have been automatically segmented from retinal OCT images in [14]. However, none of them have utilized the extracted lesions for the severity grading of retinal diseases as per the clinical standards. Also, the challenge of properly extracting and recognizing lesions such as IRF, SRF, HE, drusen and CA from multi-vendor OCT scans has not yet been addressed, which is crucial for grading retinal pathologies, especially dry and wet AMD.

C. Contributions
We present a unique retinal analysis and grading framework (RAG-FW) that extracts retinal lesions to give lesion-influenced grading of various retinal pathologies. In addition, RAG-FW has the capacity to remain invariant to inter-scanner variations, scan artifacts and vendor annotations, producing highly discriminative features for both pixel-level lesion segmentation and scan-level retinopathy classification. Furthermore, RAG-FW uses a novel run-length strategy that generates an eye-level diagnosis by checking the longest inter-scan connectivity of the graded pathology. We rigorously tested RAG-FW on 43,613 multi-vendor OCT scans for extracting retinal lesions (such as IRF, SRF, HE, drusen and CA), and also for the classification and grading of retinopathy, where it outperformed state-of-the-art solutions by achieving a 14.15% improvement in extracting retinal fluids on Duke-II, and 2.02% and 1.24% improvements in classifying retinopathy on the Zhang and BIOMISA datasets, respectively. To the best of our knowledge, RAG-FW is the only framework to date that can robustly extract such a variety of retinal lesions in one go from OCT scans irrespective of inter-scanner variations and utilize them not only for the accurate classification of retinopathy but also for its severity grading. Moreover, the generalization capabilities of RAG-FW have been further tested through extensive cross-dataset validations on five highly complex datasets, where it achieved a remarkable performance as evident from Tables VII and IX.

Fig. 2. Block diagram of the proposed RAG framework: (A) preprocessing stage to extract the retina from the candidate scan; (B) proposed RAG framework that takes the preprocessed scan as input and uses RAG-Net for extracting retinal lesions and classifying retinopathy. Furthermore, RAG-FW uses the extracted lesion information for the lesion-influenced grading of retinopathy.
The rest of the paper is organized as follows: Section III presents the proposed implementation. Section IV describes the experimental setup followed by the results in Section V. Section VI discusses RAG-FW in detail and Section VII concludes the paper.

III. PROPOSED IMPLEMENTATION
RAG-FW employs a hybrid convolutional network (RAG-Net) which contains dedicated units for lesion segmentation and retinopathy classification. Since both of these tasks are interlinked and require similar features, a single CNN encoder is shared among them for feature extraction. Moreover, they have been trained jointly, in which the weights adjusted by the segmentation unit are utilized by the classification unit in a fine-tuning mode. The block diagram of the proposed framework is shown in Fig. 2. First of all, the input scan is preprocessed through structure tensors [28] to crop the retina from the candidate scan and remove the background information, vendor annotations, acquisition artifacts, etc. Afterwards, the scan is passed to the segmentation unit, which extracts the potential lesions from it. In parallel, the classification unit screens the preprocessed scan against retinopathy. The classified scan is then further graded based upon the extracted lesions following the defined set of clinical rules shown in Table I, derived from ETDRS [30], the Age-Related Eye Disease Study (AREDS) [31] and [32], but only considering findings from OCT imagery. It is emphasized here that patient history and findings from other modalities (such as FP or FFA) cannot be ignored. This process is repeated for all the scans within the OCT volume, and then the eye- or volume-level grading is generated by checking the longest scan connectivity. RAG-FW has been tested on 43,613 scans from different publicly available datasets acquired through different OCT machinery. The detailed discussion on each module is presented below:

A. Preprocessing
The OCT scan first goes into a preprocessing stage, whose prime purpose is to isolate the retina from the candidate scan. This cropping considerably improves the performance of CNN architectures in extracting lesions, especially from noisy scans. The preprocessing within the proposed framework is performed through structure tensors [28]. Structure tensors highlight the predominant orientations in the image gradients and indicate the degree to which they are coherent within the specified neighborhood of a point. Let $S_T$ be the pixel-level structure tensor derived from the image gradients $\nabla_X$ and $\nabla_Y$ as expressed below:

$$ S_T = \begin{bmatrix} \phi * (\nabla_X \nabla_X) & \phi * (\nabla_X \nabla_Y) \\ \phi * (\nabla_Y \nabla_X) & \phi * (\nabla_Y \nabla_Y) \end{bmatrix} \qquad (1) $$

where $\phi$ denotes the Gaussian filter of finite length $M$ that is convolved with the products of image gradients to remove noisy outliers. For each pixel within the candidate scan, we get a 2 × 2 symmetric structure tensor matrix as shown in Eq. (1), and repeating the same process for the whole image, we obtain four tensors representing image transitions at orthogonal orientations [28]. Afterwards, the tensor which represents the maximum retinal information is automatically selected by analyzing the maximum eigenvalue strength of each tensor [24]. Moreover, to isolate the retina, a retinal mask is generated using the inner limiting membrane (ILM) and the choroidal boundary (which are extracted from the selected tensor by iteratively searching for the first and last transition in each column of the scan). The retinal mask is then multiplied with the input scan to isolate the retina, as shown in Fig. 3. The extracted retinal scan is then passed to RAG-FW for further analysis.
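As an illustration, the per-pixel structure tensor of Eq. (1) and an eigenvalue-based coherence measure can be sketched in a few lines of NumPy/SciPy. The Gaussian width and the synthetic test scan below are illustrative assumptions, not the paper's actual parameters:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def structure_tensor(img, sigma=2.0):
    """Per-pixel 2x2 structure tensor in the spirit of Eq. (1).

    Returns the three distinct entries of the symmetric tensor:
    phi*(Gx*Gx), phi*(Gx*Gy), phi*(Gy*Gy), where phi is a Gaussian
    filter that suppresses noisy outliers in the gradient products.
    """
    gy, gx = np.gradient(img.astype(float))
    jxx = gaussian_filter(gx * gx, sigma)
    jxy = gaussian_filter(gx * gy, sigma)
    jyy = gaussian_filter(gy * gy, sigma)
    return jxx, jxy, jyy

def coherence(jxx, jxy, jyy):
    """Eigenvalue-based coherence: close to 1 along strong boundaries
    (such as the ILM or choroidal boundary), close to 0 in flat regions."""
    tr = jxx + jyy
    det = jxx * jyy - jxy ** 2
    disc = np.sqrt(np.maximum(tr ** 2 - 4.0 * det, 0.0))
    l1, l2 = (tr + disc) / 2.0, (tr - disc) / 2.0
    return np.where(tr > 1e-12, (l1 - l2) / (l1 + l2 + 1e-12), 0.0)

# Synthetic "retina": a bright horizontal band in an otherwise dark scan.
scan = np.zeros((64, 64))
scan[24:40, :] = 1.0
jxx, jxy, jyy = structure_tensor(scan)
coh = coherence(jxx, jxy, jyy)
```

On this toy scan, the coherence map peaks along the two horizontal band boundaries (the analogue of the first and last retinal transitions searched per column) and vanishes in the uniform background.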

B. Proposed RAG Framework
RAG-FW iteratively processes each scan within the OCT volume and uses RAG-Net for the extraction of retinal lesions and for the classification of retinopathy. The segmentation unit in RAG-Net is based on an encoder-decoder topology that extracts lesions from the candidate retina. The classification unit also uses the same encoder end for feature extraction and it contains additional layers to classify the candidate scan against retinopathy. The classified scan is further graded based upon the extracted lesion information using the defined set of clinical rules. After grading all the scans, RAG-FW generates the eye level grading by checking the longest connected run-length sequence within all the scans. The detailed description of RAG-Net architecture is presented below.
1) RAG-Net for Retinal Lesions Extraction: The RAG-Net segmentation unit is a convolutional encoder-decoder architecture that decomposes the candidate scan while preserving lesion areas based upon the trained kernel weights. Afterwards, the decomposed scan is up-sampled to the original scale, in which only lesions are retained. At the decoder side, un-pooling is achieved through transposed convolution instead of separate convolution and up-sampling. Furthermore, unlike conventional encoder-decoder architectures, the feature maps at each encoder depth (containing finer lesion features) are combined through convolution and addition layers, and they are also added to the respective decoder part. This operation requires less memory and gives a better spatial representation of retinal lesions compared to the conventional feature-map concatenation process. Also, rather than computing edge-based features, the segmentation unit in the proposed framework computes contour-based features to preserve the lesion areas, which greatly helps in retaining the geometrical shape of the lesion areas during scan decomposition.
2) RAG-Net Segmentation Unit Encoder: Each depth of the RAG segmentation unit contains convolution layers, rectified linear units (ReLU) and batch normalization layers. The lesion areas within the input scan are preserved because of the convolution weights which are adjusted during the training phase. Afterwards, the weights are normalized, and the negative values are truncated which ensures that only those pixels which correspond to retinal lesions are retained in the feature map through the ReLU activation function. In addition to this, the feature maps at each stage of the segmentation unit are reduced through max-pooling and convolutions. Moreover, skip connections are employed via addition at each depth of the encoder to preserve the best lesion features. These feature kernels are further added together to obtain a single best representation of retinal lesions which is then directly passed to the respective decoder end. At the end of the encoder, RAG-Net employs four average pooling layers (inspired by the PSPNet [29]) to preserve lesion's contextual information during the scan decomposition. The pooled results are then resized and are concatenated together. Afterwards, the concatenated features are passed to the decoder.
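The addition-based skip fusion and the PSPNet-inspired pyramid pooling described above can be illustrated with a small NumPy sketch: adding feature maps keeps the channel count fixed (roughly half the memory of a concatenation), and the pyramid pools at several scales before resizing and concatenating. The shapes and pooling scales here are illustrative assumptions, not the exact RAG-Net configuration:

```python
import numpy as np

# Two feature maps at one encoder depth, shaped (H, W, C).
f1 = np.random.rand(32, 32, 64)
f2 = np.random.rand(32, 32, 64)

# Addition-based skip fusion: the channel count stays at C, so the tensor
# passed to the decoder is half the size of a concatenation-based fusion.
fused_add = f1 + f2                       # (32, 32, 64)
fused_cat = np.concatenate([f1, f2], -1)  # (32, 32, 128) -- twice the memory

def avg_pool(x, k):
    """Non-overlapping k x k average pooling over the spatial dimensions."""
    h, w, c = x.shape
    return x[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k, c).mean((1, 3))

# Pyramid-style context pooling: pool at several scales, resize each pooled
# map back to the input resolution, then concatenate everything.
pyramid = []
for k in (1, 2, 4, 8):
    p = avg_pool(fused_add, k)
    p = p.repeat(k, axis=0).repeat(k, axis=1)  # nearest-neighbour resize
    pyramid.append(p)
context = np.concatenate([fused_add] + pyramid, axis=-1)
```

The coarser pyramid levels summarize progressively larger neighborhoods, which is what preserves the lesions' contextual information during scan decomposition.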
3) RAG-Net Segmentation Unit Decoder: After retaining lesions in the encoder block, the decoder block rescales them back to the original resolution. At each depth of the decoder, there is a transposed convolution layer which up-samples the convolution output by the factor of 2. Apart from this, the finer lesion features from the encoder end are also added with the respective decoder to effectively retain lesions geometrical shape. At the end of the decoder is the softmax layer that computes class probabilities for each pixel and assigns it to the class which has a maximum probability. The segmented lesion map is then binarized and post-processed to remove small blobs through morphological enhancements.
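The stride-2 transposed convolution that doubles the resolution at each decoder depth can be sketched in NumPy as follows. This is a toy single-channel implementation with a hypothetical 2 × 2 kernel; the actual decoder learns its kernels during training:

```python
import numpy as np

def transposed_conv2x(x, kernel):
    """Stride-2 transposed convolution with a 2x2 kernel: each input pixel
    'paints' a scaled copy of the kernel into a 2x-larger output map, which
    is how the decoder doubles the spatial resolution at every depth."""
    h, w = x.shape
    out = np.zeros((2 * h, 2 * w))
    for i in range(h):
        for j in range(w):
            out[2 * i:2 * i + 2, 2 * j:2 * j + 2] += x[i, j] * kernel
    return out

feat = np.arange(16.0).reshape(4, 4)           # a coarse 4x4 decoder feature map
up = transposed_conv2x(feat, np.ones((2, 2)))  # an all-ones kernel reduces to
                                               # nearest-neighbour upsampling
```

With a learned (non-constant) kernel the same operation interpolates rather than merely duplicates, which helps retain the lesions' geometrical shape during up-sampling.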

4) RAG-Net for Retinopathy Classification:
The proposed framework also employs a classification unit to classify candidate scans against different retinal pathologies using two additional layers, as shown in Table II.

TABLE II: Architectural details and hyper-parameters of RAG-Net (a detailed summary is also presented in the codebase package). Bold entries are only present in the segmentation unit while the underlined ones are only present in the classification unit.

The classification unit
is built using the same encoder end of RAG-Net that has been trained for lesion segmentation. The weights of the encoder end are transferred to the classification unit, where they are fine-tuned for classification purposes. It is worth noting that the weights updated for lesion segmentation converge well for retinopathy classification too, since both tasks are interlinked. In fact, the authors in [15] passed the lesion maps along with the retinal scans to guide the CNN model towards more accurate classification. However, here, instead of passing the lesion map separately or using two totally independent models for segmentation and classification, we merged them together and used a joint training strategy in which the segmentation unit is trained first, and the classification unit utilizes its weights and fine-tunes its layers accordingly. This scheme greatly reduces the overhead of managing two separate models and the related training and reproducibility issues. The architectural details and hyper-parameters of RAG-Net are depicted in Table II, where we can observe that the RAG-Net architecture has 62,352,188 parameters in total, of which 62,240,828 are learnable and 111,360 are non-learnable. It should be noted that the number of layers and their sizes have been finalized after rigorous experimentation on different publicly available datasets. Moreover, both units employ the cross-entropy loss $C_{eL}$, which measures the degree of dissimilarity between the actual output distribution and the predicted output distribution. $C_{eL}$ is computed through Eq. (2):

$$ C_{eL} = -\frac{1}{B_s} \sum_{i=1}^{B_s} \sum_{j=1}^{M} T_{i,j} \log(P_{i,j}) \qquad (2) $$

where $B_s$ is the number of samples, $M$ represents the number of classes, $T_{i,j}$ indicates whether sample $i$ belongs to class $j$ or not, and $P_{i,j}$ denotes the predicted probability of the $i$th sample for the $j$th class.
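The cross-entropy loss of Eq. (2) is straightforward to verify numerically. The following minimal NumPy sketch assumes batch-mean normalization over the $B_s$ samples, with one-hot indicators and a small epsilon for numerical stability:

```python
import numpy as np

def cross_entropy(T, P, eps=1e-12):
    """Eq. (2): mean over B_s samples of -sum_j T[i,j] * log(P[i,j]),
    where T is the one-hot class indicator and P the predicted distribution."""
    B_s = T.shape[0]
    return -np.sum(T * np.log(P + eps)) / B_s

# Two samples (B_s = 2), three classes (M = 3).
T = np.array([[1, 0, 0],
              [0, 1, 0]], dtype=float)
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])
loss = cross_entropy(T, P)  # = -(log 0.7 + log 0.8) / 2
```

Only the probability assigned to each sample's true class contributes, so confident correct predictions drive the loss towards zero.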

5) RAG-FW for Retinopathy Grading:
After extracting the retinal lesions and classifying the candidate scan, it is graded accordingly. The grading strategy is based on a set of defined clinical standards, as mentioned in Table I. According to ETDRS [30], ME is graded as clinically significant (CSME) if 1) retinal thickening is observed at or within 500 μm of the center of the macula, or 2) hard exudates associated with retinal thickening are observed at or within 500 μm of the center of the macula, or 3) retinal thickening zone(s) of one disc area (or more) are observed, of which at least one part is within one disc diameter of the center of the macula. However, we note that ETDRS developed this definition of CSME and non-CSME more than three decades ago (before the usage of OCT imagery for retinal examination) [33], while OCT images show even early-stage ME cases as clinically significant, when they need to be dealt with [33]. Therefore, instead of grading CSME and non-CSME, we have considered grading ME as DME or pseudophakic cystoid macular edema (P-CME) in our study, as it has more clinical relevance.
A study presented in [32] concluded that OCT imagery alone can classify DME and P-CME. Using their study as one of the guidelines, we also made RAG-FW automatically distinguish between DME and P-CME, as shown in Table I. This is, of course, one perspective on differentiating DME and P-CME cases, as the patient's history of diabetes and findings from other modalities cannot be fully ignored. Moreover, according to AREDS [31], AMD is graded as dry if only drusen and retinal pigment epithelium (RPE) atrophic profiles are observed, while it is graded as wet if retinal fluids or other chorioretinal abnormalities such as fibrotic scars or CNVM are observed. Since the presence of either CNVM or fibrotic scars indicates the wet form of AMD, rather than identifying them individually, we collectively recognize them as CA in the proposed study.
Apart from this, CSR is also clinically graded as acute, acute-persistent or chronic [34]. Acute CSR occurs due to the accumulation of SRF (typically dome-shaped). Acute CSR becomes persistent when the SRF levels do not decrease within three months of their appearance. Some cases resolve spontaneously, while others turn into long-lasting chronic CSR (within three to six months of SRF appearance), where RPE atrophy, fibrosis and a neovascular membrane (leading to CNV) can be observed [34]. It is worth noting that the proposed RAG-FW can differentiate between DME and P-CME cases. Also, it can further grade AMD as exudative (wet) or non-exudative (dry) and CSR as acute or chronic by checking the presence of retinal lesions in the classified pathology. For example, if a CSR-classified subject has CA along with SRF, it is graded as chronic CSR, and if it only has SRF, it is graded as acute CSR.
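The lesion-to-grade rules described above can be summarized in a small rule-based sketch. This is a simplified illustration of the Table I logic: the function name and lesion labels are illustrative, and the DME/P-CME branch (which follows [32]) is not reproduced here:

```python
def grade(pathology, lesions):
    """Simplified lesion-influenced grading rules distilled from the text.

    `pathology` is the scan-level class from the classification unit and
    `lesions` is the set of lesion labels found by the segmentation unit.
    """
    if pathology == "AMD":
        # Wet AMD: retinal fluids or chorioretinal abnormalities (CA);
        # dry AMD: only drusen / RPE atrophic profiles.
        if lesions & {"IRF", "SRF", "CA"}:
            return "wet AMD"
        return "dry AMD"
    if pathology == "CSR":
        # Chronic CSR shows CA alongside SRF; SRF alone is acute CSR.
        return "chronic CSR" if "CA" in lesions else "acute CSR"
    return pathology  # ME grading (DME vs. P-CME) follows [32]; not shown

print(grade("CSR", {"SRF"}))             # acute CSR
print(grade("AMD", {"drusen", "CA"}))    # wet AMD
```

Keeping the grading as an explicit rule table (rather than a learned mapping) keeps this final step auditable against the clinical standards.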
6) Eye-Level Grading: After grading all the OCT brightness scans (B-scans) within the OCT volume, the proposed framework uses a simple yet effective mechanism for generating the eye- or volume-level grading. Since disease-specific retinal abnormalities appear in a consecutive manner within the OCT volume, the eye-level grading is generated by checking the longest grading connectivity across all the scans of the OCT volume, as shown in Fig. 4. Since the longest connectivity in Fig. 4 is of acute CSR, the complete volume (or eye) will be graded as acute CSR positive.
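The longest-connectivity rule can be implemented in a few lines. This is a minimal sketch; the function name and the example grade sequence are illustrative, and ties are broken by the first longest run:

```python
from itertools import groupby

def eye_level_grade(scan_grades):
    """Volume-level diagnosis = the grade with the longest consecutive run
    across the ordered B-scans of the OCT volume (first longest run on ties)."""
    runs = [(g, len(list(r))) for g, r in groupby(scan_grades)]
    return max(runs, key=lambda gr: gr[1])[0]

grades = ["normal", "acute CSR", "acute CSR", "acute CSR", "normal", "normal"]
print(eye_level_grade(grades))  # acute CSR
```

Because pathological findings occupy consecutive B-scans, an isolated mis-graded scan cannot outvote a longer consistent run, which makes the volume-level decision robust to single-scan errors.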

IV. EXPERIMENTAL SETUP
We conducted a series of experiments to assess the performance of RAG-FW in 1) extracting and identifying retinal lesions, 2) classifying different retinal pathologies, and 3) grading retinopathy subjects as per clinical standards, on five publicly available datasets.

A. Data
All the datasets used for training and validating RAG-FW are presented in Table III. The Duke and Zhang datasets were acquired through the Spectralis (Heidelberg Engineering) machine, while the BIOMISA dataset was acquired through a Topcon 3D OCT 2000 series machine. The Duke datasets contain normal as well as DME- and AMD-affected scans, the BIOMISA dataset contains scans of healthy subjects and of AMD, ME and CSR pathologies, while the Zhang dataset contains scans of drusen (dry AMD), DME, CNV (wet AMD) and healthy subjects. Furthermore, all these datasets have been annotated by expert clinicians.

B. Training Details
The training of both the segmentation and classification units is conducted for 20 epochs using the adaptive learning rate method (ADADELTA) [35] as the optimizer, with a default initial learning rate of 1.0 and a decay factor of 0.95 [35]. Each epoch is composed of 512 iterations, and validation is performed after each epoch during the training phase. For validation, we took 1,000 scans from the training dataset (100 scans from Duke-I, 100 scans from Duke-II, 400 scans from Duke-III, 200 scans from BIOMISA and 200 scans from the Zhang dataset) along with their ground truths. Moreover, both units are implemented on the Anaconda platform with Python 3.7.4 and Keras APIs, on a machine with an Intel Core i5-8400 processor operating at 2.8 GHz, 16 GB RAM and an NVIDIA RTX 2080 GPU. The training performance is shown in Fig. 5.
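For reference, the reported optimizer settings correspond to the following illustrative Keras configuration fragment. The model and data-generator names are placeholders, not identifiers from the released codebase:

```python
# Illustrative Keras training configuration matching the reported settings:
# ADADELTA with an initial learning rate of 1.0 and decay factor rho = 0.95,
# trained for 20 epochs of 512 iterations each.
from tensorflow.keras.optimizers import Adadelta

optimizer = Adadelta(learning_rate=1.0, rho=0.95)
# segmentation_unit.compile(optimizer=optimizer,
#                           loss="categorical_crossentropy")
# segmentation_unit.fit(train_generator, steps_per_epoch=512, epochs=20,
#                       validation_data=val_generator)
```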

C. Evaluation Metrics
We used a variety of metrics to thoroughly evaluate RAG-FW and to compare its performance with state-of-the-art solutions. We started with intersection-over-union (IoU) and the dice coefficient ($D_{Coff}$) for measuring retinal lesion extraction performance. IoU is computed through:

$$ \mathrm{IoU} = \frac{T_P}{T_P + F_P + F_N} \qquad (3) $$

where $T_P$, $F_P$ and $F_N$ represent true positives, false positives and false negatives, respectively. Afterwards, the mean IoU is computed by taking the average of the IoU scores. Furthermore, using the IoU scores, we computed $D_{Coff}$ through:

$$ D_{Coff} = \frac{2 \times \mathrm{IoU}}{1 + \mathrm{IoU}} \qquad (4) $$

The mean $D_{Coff}$ is computed by taking the average of the $D_{Coff}$ scores. We also computed receiver operating characteristic (ROC) curves to further validate the performance of RAG-FW for extracting retinal lesions and for retinopathy classification. ROC curves are computed by varying the classification threshold from 0 to 1 in steps of 0.001. In addition to this, the performance of the proposed framework for diagnosing and grading retinopathy is assessed by generating confusion matrices and measuring accuracy ($A_C$), sensitivity ($T_{PR}$), specificity ($T_{NR}$), precision ($P_{PV}$) and the $F_1$ score through Eqs. (5)-(9):

$$ A_C = \frac{T_P + T_N}{T_P + T_N + F_P + F_N} \qquad (5) $$

$$ T_{PR} = \frac{T_P}{T_P + F_N} \qquad (6) $$

$$ T_{NR} = \frac{T_N}{T_N + F_P} \qquad (7) $$

$$ P_{PV} = \frac{T_P}{T_P + F_P} \qquad (8) $$

$$ F_1 = \frac{2 \times P_{PV} \times T_{PR}}{P_{PV} + T_{PR}} \qquad (9) $$

where $T_N$ denotes the true negatives. Apart from this, we also present different qualitative evaluations of RAG-FW to demonstrate its performance in extracting and recognizing retinal lesions as well as classifying and grading retinopathy.
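All of these metrics follow directly from the confusion-matrix counts, as in this minimal NumPy sketch (the function name and the toy masks are illustrative):

```python
import numpy as np

def lesion_metrics(pred, gt):
    """IoU/Dice (Eqs. 3-4) and confusion-matrix metrics (Eqs. 5-9)
    from binary prediction and ground-truth masks."""
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    tn = np.sum(~pred & ~gt)
    iou = tp / (tp + fp + fn)
    dice = 2 * iou / (1 + iou)            # Eq. (4): Dice derived from IoU
    acc = (tp + tn) / (tp + tn + fp + fn)
    sens = tp / (tp + fn)                 # T_PR (sensitivity)
    spec = tn / (tn + fp)                 # T_NR (specificity)
    prec = tp / (tp + fp)                 # P_PV (precision)
    f1 = 2 * prec * sens / (prec + sens)
    return dict(iou=iou, dice=dice, acc=acc, sens=sens,
                spec=spec, prec=prec, f1=f1)

# Toy 1x4 masks: one TP, one FP, one FN, one TN.
pred = np.array([[1, 1, 0, 0]], dtype=bool)
gt   = np.array([[1, 0, 1, 0]], dtype=bool)
m = lesion_metrics(pred, gt)  # IoU = 1/3, Dice = 0.5
```

Note that Dice always upper-bounds IoU for the same masks, which is why IoU is the stricter of the two overlap measures.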

V. RESULTS
RAG-FW has been validated on five publicly available datasets for lesion segmentation and lesion-influenced grading of retinopathy. In addition, its performance for lesion extraction has been thoroughly compared with popular frameworks such as PSPNet [29], SegNet [36], UNet [37] and FCN (8s and 32s) [38], where the preprocessing stage has been added to all the frameworks for a fair comparison. For readability, we first present the evaluations of RAG-FW with respect to each dataset separately and then on the combination of all the datasets, as shown below:

A. Evaluations on Zhang Dataset
The Zhang dataset is one of the largest OCT datasets to date. It was made publicly available by [20], primarily for retinopathy classification, and only contains ground truths to measure retinopathy diagnostic performance. For a more in-depth evaluation of RAG-FW in extracting retinal lesions, we had the Zhang dataset annotated for each retinal lesion by expert clinicians at the Armed Forces Institute of Ophthalmology (AFIO), Rawalpindi, Pakistan. First of all, we thoroughly compared RAG-FW with [20] on the Zhang dataset for classifying retinopathy.
The classification performance of RAG-FW and [20] is shown in Table IV, where it can be seen that RAG-FW achieved an accuracy of 98.6% (2.02% better than [20]) and an F1 score of 99.06% (0.46% better than [20]) for retinopathy classification. RAG-FW achieved better results than [20] because the weights used by the classification unit are bootstrapped by the segmentation unit, which pays attention to the lesion features. So, rather than relying solely on the image representation of each pathology, the classification unit screens retinopathy by considering the lesion regions as well. Furthermore, the performance of RAG-FW for retinal lesion extraction on the Zhang dataset is shown in Table V. Here, the evaluation metric is the mean IoU, and we can see that RAG-FW outperformed the other networks by achieving a mean IoU score of 0.7852, 4.58% better than PSPNet and 54.17% better than FCN-32. RAG-FW comes second for extracting IRF and SRF, with slight differences of 0.32% (behind PSPNet) and 1.62% (behind SegNet), respectively. However, if we compare the performance of RAG-FW with PSPNet for CA and HE, we can see that RAG-FW extracted these lesions with clear margins of 4.25% and 14.28%, respectively.
Moreover, the Zhang dataset contains many complex scans where retinal lesion extraction is quite challenging, as evident from Fig. 6(C), (D) and (E), in which multiple lesions such as IRF and CNVM can be noticed. However, RAG-FW not only extracted the retinal lesions proficiently but was also able to identify them correctly, unlike its competitors, even when different lesions have a similar texture in the scan. This is because RAG-FW preserves the scan's contextual information during lesion segmentation and adds the best features together during the encoding process. More scans depicting the lesion extraction performance of RAG-FW and its comparison with other segmentation networks are presented in the codebase package.

B. Evaluations on Duke-I Dataset
Duke-I contains a total of 38,400 scans depicting dry AMD and normal pathology. In order to evaluate the lesion extraction performance of RAG-FW on Duke-I, we had it annotated by AFIO, Rawalpindi, since it does not contain lesion annotations. The lesion extraction performance of RAG-FW on Duke-I can also be observed in Table V, where it can be seen that RAG-FW achieved a mean IoU score of 0.8193 for correctly extracting and identifying drusen (the only lesion present in Duke-I). Comparing RAG-FW with the other segmentation frameworks, we can see that it achieved 3.18% better performance than PSPNet and 69.02% better performance than FCN-32. We have also utilized scans from the Duke-I dataset, along with the other Duke datasets, for evaluating the performance of RAG-FW for retinopathy classification and grading. The detailed discussion about this is presented in Section V-F.

C. Evaluations on Duke-II Dataset
Duke-II is another publicly available dataset, from the Vision and Image Processing (VIP) lab, Duke University, USA, which we have considered in the evaluation of RAG-FW. The dataset was introduced in [6] and contains 610 OCT scans from 10 DME subjects. RAG-FW has been applied to Duke-II for the extraction and identification of retinal lesions as well as for the classification and grading of retinopathy. Table VI shows the comparison of the proposed framework with [6], [8] and [22] in terms of mean $D_{Coff}$ and mean IoU. For fairness, the comparison shown in Table VI is based on the original ground truths provided in the Duke-II dataset, and markings from both clinicians have been considered. It can be seen from Table VI that RAG-FW produced better fluid extraction results than [6], [8] and [22] on Duke-II, achieving mean $D_{Coff}$ and mean IoU scores of 0.664 and 0.497, respectively. Here, we want to mention that [10] also extracted retinal fluid from the Duke-II dataset and achieved a mean $D_{Coff}$ of 0.77. However, we have not included this in Table VI because [10] achieved that score by considering the markings of only one clinician, and there is significant variability between the markings of the two clinicians [6]; hence, its comparison with RAG-FW would have been unfair. Furthermore, we have compared the proposed framework with [7] and [12] as well for fluid extraction, but since those frameworks were tested on their local in-house datasets, the comparison with [7] and [12] is indirect. Also, it should be noted that IoU is a stricter measure than $D_{Coff}$, and from Table VI it is evident that RAG-FW achieved 2.56% better fluid extraction results than [22] (the second best) and 14.15% better results than [6] in terms of mean $D_{Coff}$ (or 3.82% better results than [22] and 19.91% better results than [6] in terms of mean IoU).
Apart from this, we have compared the performance of RAG-FW with the popular segmentation networks as well for the extraction of IRF, SRF and HE, as shown in Table V. Here, the ground truths for IRF, SRF and HE have been obtained through AFIO, Rawalpindi, and all the segmentation networks have been evaluated on these ground truths. It can be seen that RAG-FW achieved a mean IoU score of 0.7826, which is 4.44% better than PSPNet and 78.77% better than FCN-32.

D. Evaluations on Duke-III Dataset
Duke-III is the third publicly available dataset from the VIP lab which we used in this study. The dataset was introduced in [19] for the classification of AMD, DME and normal subjects, and contains OCT scans from 15 AMD, 15 DME and 15 healthy subjects. The classification performance of RAG-FW on the Duke-III dataset is presented in Table IV. It should be noted that, as per the dataset standard, performance for all methods is measured subject-wise rather than scan-wise; the proposed framework and [21] correctly classified all 45 subjects, while [19] classified 43 out of 45 subjects correctly. In addition, we have evaluated RAG-FW for the extraction of different retinal lesions, as shown in Table V. Here, the ground truths for retinal lesions were obtained through AFIO, Rawalpindi, Pakistan, as Duke-III does not originally contain annotations for retinal lesions. The results reported in Table V show that the best performance for extracting drusen and SRF in terms of IoU is achieved by PSPNet and SegNet, respectively. However, RAG-FW achieved the best mean IoU score of 0.7573 for extracting retinal lesions, which is 3.07% better than PSPNet and 64.75% better than FCN-32. Overall, RAG-FW achieved the best performance because it outperformed its competitors in extracting IRF by 2.85% and HE by 10.54%.

E. Evaluations on BIOMISA Dataset
The last dataset on which we evaluated RAG-FW is the BIOMISA dataset, in which the scans were acquired through a Topcon 3D OCT 2000 machine. It should be noted that the BIOMISA dataset contains only pseudo-colored OCT images, while the other datasets contain original grayscale scans.
To achieve better performance and maintain consistency with the other datasets, we used only the green channel of the BIOMISA OCT scans, as it contains the maximum amount of retinal information. The BIOMISA dataset has been made publicly available in [40]. In the results depicted in Table V, we can see that RAG-FW achieved a mean IoU score of 0.7918 on the BIOMISA dataset for extracting retinal lesions, which is 5.11% better than PSPNet. Furthermore, we report a qualitative comparison in Fig. 6(B), (H) and (K), where it can be observed that RAG-FW produced better results than the other networks. In Fig. 6(K), RAG-FW achieves better extraction of IRF and HE than its competitors without confusing the different lesion areas, as PSPNet, UNet, and FCN-8 do. We note, however, that the best performance in Fig. 6(H), showing normal pathology, is achieved by SegNet and FCN-8, where RAG-FW confused a tiny region of hyper-reflectivity with IRF. Moreover, Table VI presents the performance comparison of RAG-FW with [22] for retinal fluid extraction, where it can be observed that RAG-FW achieved a mean D Coff of 0.934 and a mean IoU of 0.876, which are 2.99% and 5.47% better than [22], respectively. Apart from this, Table IV shows the performance of RAG-FW for classifying retinopathy, where it can be seen that the proposed framework achieved an accuracy of 99.01% and an F1 score of 99.37%.
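The green-channel step for the pseudo-colored BIOMISA scans can be sketched as follows (a minimal illustration assuming images loaded as NumPy arrays in OpenCV-style BGR channel order; the green channel sits at index 1 in both BGR and RGB, so the reduction is the same either way):

```python
import numpy as np

def to_green_channel(scan):
    """Reduce a pseudo-colored OCT scan (H x W x 3) to its green
    channel, matching the grayscale appearance of the other datasets.
    Already-grayscale scans are passed through unchanged."""
    if scan.ndim == 2:          # grayscale: nothing to do
        return scan
    return scan[:, :, 1]        # channel index 1 is green in BGR and RGB

# toy example: a 2x2 pseudo-colored scan whose signal is in green
scan = np.zeros((2, 2, 3), dtype=np.uint8)
scan[..., 1] = 200
gray = to_green_channel(scan)   # -> 2x2 array of value 200
```

Handling the grayscale case in the same function lets one preprocessing path serve all five datasets, pseudo-colored or not.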
Comparing the classification performance of RAG-FW with its competitors in Table IV, we can see that it produces 1.24% better results than [24] in terms of accuracy, 2.66% better results than [23] in terms of specificity, 2.10% better results than [23] in terms of precision, and 1.02% better results than [24] in terms of F1 score. However, it produces the second-best performance in terms of sensitivity, where it is only 0.91% lower than [24].

F. Evaluations on Combined Datasets
In another series of experiments, we evaluated RAG-FW on the combination of all datasets for extracting retinal lesions and for classifying and grading retinopathy. The combined datasets contain 43,613 multi-vendor OCT scans for evaluation purposes. The performance of RAG-FW for extracting different retinal lesions on the combined datasets is presented in Table VII in terms of mean IoU, where we can see that the proposed framework achieved a mean IoU score of 0.8055 ± 0.1009. Comparing the performance of RAG-FW with the others, we can see that RAG-FW outperformed PSPNet by 3.62%. To further evaluate the performance of RAG-FW for retinal lesion extraction, we computed the ROC curves shown in Fig. 7(A). It can be observed from Fig. 7(A) that the minimum AUC score, 0.9758, is obtained for recognizing IRF vs SRF; however, considering that IRF and SRF have highly similar image features, the performance of RAG-FW in distinguishing these regions is quite impressive. Furthermore, RAG-FW efficiently discriminated between IRF/SRF and HE regions, achieving AUC scores of 0.9893 and 0.9833, respectively. The classification performance of RAG-FW on the combined datasets is reported in Table IV, where it can be observed that RAG-FW achieved an accuracy of 99.32% and an F1 score of 99.52% for classifying retinopathy. For further validation, we computed the ROC curves shown in Fig. 7(B). Here, RAG-FW achieved its minimum AUC score of 0.9323 for diagnosing ME, while the best performance, an AUC score of 0.9417, is achieved for distinguishing normal from abnormal retinal pathologies.
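Pairwise AUC scores such as those for IRF vs SRF can be computed directly from scores and labels via the Mann–Whitney U statistic, without tracing the full ROC curve. The sketch below is our generic illustration (the class probabilities shown are hypothetical, not taken from RAG-FW):

```python
import numpy as np

def auc_score(labels, scores):
    """AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive receives a higher score than a randomly
    chosen negative (ties counted as 0.5)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    # all pairwise comparisons: wins plus half-credit for ties
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# e.g. separating IRF from SRF pixels using a per-pixel IRF probability
labels = [1, 1, 1, 0, 0, 0]              # 1 = IRF, 0 = SRF
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]  # hypothetical IRF probabilities
# auc_score(labels, scores) -> 8/9, since 8 of the 9 pos/neg pairs
# are ranked correctly
```

An AUC of 0.9758 for IRF vs SRF thus means that, for roughly 49 out of every 50 random IRF/SRF pairs, the framework scores the IRF region higher.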
Apart from classifying retinopathy, RAG-FW can also grade it as per the clinical standards. The retinopathy grading performance of RAG-FW can be seen in Fig. 9(A), where it can be observed that RAG-FW achieved an accuracy of 98.70%. In addition, we performed another experiment in which we used only the classification unit for the direct grading of retinopathy, instead of classifying the scans first and grading them based upon the extracted lesions. The performance of RAG-FW for direct grading is reported in Fig. 9(B), where it can be observed that the classifier confused many DME and P-CME affected scans, as well as dry AMD and wet AMD affected scans. This result shows that the extraction of retinal lesions is crucial for accurate retinopathy grading. It also agrees with the relevance of retinal lesion extraction to retinopathy classification reported in [15].
In another series of experiments, we assessed the generalization capacity of the proposed framework through cross-dataset validation. For example, the Duke datasets do not contain wet AMD (CNV) scans, contrary to the Zhang dataset. However, we did not filter out the scans containing CA regions in this experiment, as CA is not present in the Duke datasets; had these scans been filtered, the performance of RAG-FW would have been even better. For the classification, we only excluded the evaluation for CSR, since the scans depicting this disease were only present in the BIOMISA dataset. In the results depicted in Table IX, we can notice that RAG-FW still preserves a high level of accuracy for the Duke and Zhang pairs, as they have been acquired with the same machine. There is, however, an exception in the F1 and PPV scores (which are underlined) for the ME class when the Duke datasets are used for testing. This is due to the imbalanced ME class in Duke-II (only 305 samples). For the other pairs of datasets, we notice a decrease in performance, which is expected due to the differences in the scanner specifications. Note that because of space limitations we could not report the confusion matrices of these validations, but they can be consulted in the RAG-FW code documentation.

TABLE IX: Classification performance when trained on one dataset and tested on another (training dataset → testing dataset). Bold indicates the best classification scores across all pathologies, and the low scores due to class imbalance are underlined.

In the last experiment, we assessed the ablative aspects of RAG-FW related to lesion extraction, namely the contour-based scheme versus edge-based methods. The results reported in Table X confirm the superiority of the contour-based approach in preserving the geometrical properties of the lesions.

VI. DISCUSSION
This paper presents a hybrid convolutional framework named RAG-FW. RAG-FW employs RAG-Net, which contains a segmentation unit and a classification unit for the extraction of retinal lesions and the lesion-influenced grading of retinal diseases. Retinal lesions play a vital role in analyzing and measuring the severity of retinopathy. To the best of our knowledge, this paper presents the first framework that not only recognizes retinal lesions but also uses them for the severity grading of the human retina from multi-vendor OCT scans. Apart from this, RAG-FW has been extensively tested on 43,613 retinal OCT scans from different publicly available datasets, and it has been thoroughly compared with existing state-of-the-art solutions against different metrics. The training time for RAG-FW is around 5 hours and 10 minutes, and it takes around 21 seconds on average to extract lesions from and grade a complete OCT volume against retinopathy. Lesions such as IRF and SRF have also been recognized by [11], [13] and [14], but these works did not utilize the biomarkers for grading the candidate retina (moreover, [13] and [14] identified IRF only). The authors in [15] designed a lesion-aware CNN model for the accurate classification of retinopathy, but they neither tested it on multi-vendor OCT scans nor performed lesion-influenced grading. We have extensively tested the proposed framework on each dataset separately as well as on their combination. Furthermore, we have compared it with existing state-of-the-art solutions for retinal lesion extraction and retinopathy classification, where the proposed framework significantly outperforms them in various metrics, as discussed in the results section. Moreover, the preprocessing stage significantly improves the lesion extraction results for both the proposed framework and the pre-trained segmentation models, as evident from Table XI.
Since the scans are grayscale in nature (except for the BIOMISA scans, which were intentionally processed as grayscale to make them compatible with the other datasets), some of the background regions, especially near vendor annotations, look quite similar to lesion areas such as fluid (especially in BIOMISA scans). Therefore, without the preprocessing stage, all the segmentation models can easily misclassify those regions. We also molded RAG-FW to grade retinopathy directly, instead of classifying it first and then grading it, as shown in Fig. 9(B). However, there were several misclassifications between P-CME and DME, dry AMD and wet AMD, and acute CSR and chronic CSR, as can be observed in the figure. This suggests that retinal lesions play a significant role in distinguishing between different disease stages. Although direct retinopathy grading removes the need for the segmentation unit, its grading performance is far inferior to that of the current approach. Grading based on the segmentation unit alone is also not feasible, because the presence of retinal lesions across pathologies is not mutually exclusive; e.g., retinal fluid and hard exudates can be observed in DME pathologies as well as in exudative AMD pathologies. So, if the candidate scan has not been diagnosed (classified) first, it cannot be properly graded.
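The classify-then-grade reasoning above can be made concrete with a toy rule: the same lesion profile maps to different grades depending on the prior diagnosis, so neither unit suffices alone. The mapping below is entirely hypothetical, for illustration only, and is not the paper's clinical grading rule:

```python
def grade_retinopathy(diagnosis, lesion_areas):
    """Toy lesion-influenced grading rule (hypothetical, NOT the
    paper's clinical criteria). Shows why the diagnosis must come
    first: fluid + hard exudates occur in both ME and AMD scans,
    yet lead to different grades under each diagnosis."""
    fluid = lesion_areas.get("IRF", 0) + lesion_areas.get("SRF", 0)
    exudates = lesion_areas.get("HE", 0)
    if diagnosis == "ME":
        return "P-CME" if fluid > 0 and exudates == 0 else "DME"
    if diagnosis == "AMD":
        return "wet AMD" if fluid > 0 else "dry AMD"
    return "normal"

lesions = {"IRF": 120, "SRF": 40, "HE": 15}   # extracted pixel areas
grade_retinopathy("ME", lesions)    # -> "DME"
grade_retinopathy("AMD", lesions)   # -> "wet AMD": same lesions, new grade
```

The two calls receive identical lesion evidence yet yield different grades, which is exactly why grading from the segmentation output alone is ambiguous.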

VII. CONCLUSION
This paper presents RAG-FW for the automated extraction of retinal lesions from multi-vendor OCT scans. The extracted lesions are then utilized for the in-depth and intuitive grading of retinopathy. RAG-FW is a generic framework that is invariant to scanner specifications and noisy artifacts. RAG-FW has been tested on 43,613 retinal OCT scans from five publicly available datasets, and it has been compared with many existing state-of-the-art solutions against various metrics. RAG-FW achieved a mean IoU score of 0.8055 for extracting retinal lesions and an F1 score of 99.52% for screening retinopathy. The proposed framework is currently limited to grading ME, CSR and AMD pathologies. However, it can be extended in the future to cover more pathologies, such as glaucoma, and their severities. Furthermore, incorporating findings from other retinal modalities, such as FP, can further strengthen the intuitive grading of retinopathy.