Detecting Glaucoma in the Ocular Hypertension Treatment Study Using Deep Learning: Implications for Clinical Trial Endpoints
This study investigated how accurately deep learning (DL) algorithms trained on fundus photographs from the Ocular Hypertension Treatment Study (OHTS) detect primary open-angle glaucoma (POAG). A ResNet-50 model was trained and tested on 66,715 photographs from 3,272 eyes. Its target was the OHTS Endpoint Committee's POAG determination, which was based on optic disc changes (n=287 eyes, 3,502 photographs), visual field changes (n=198 eyes, 2,300 visual fields), or both. Subjects were randomly assigned to the OHTS training, validation and testing sets using an 85/5/10% split. Three independent test sets were used to estimate the generalizability of the model: the UCSD Diagnostic Innovations in Glaucoma Study (DIGS, USA), ACRIMA (Spain) and Large-scale Attention-based Glaucoma (LAG, China). The DL model achieved an AUROC (95% CI) of 0.88 (0.82, 0.92) for the overall OHTS POAG endpoint. For the OHTS endpoints based on optic disc changes or visual field changes, AUROCs were 0.91 (0.88, 0.94) and 0.86 (0.76, 0.93), respectively. At 90% specificity, false-positive rates were higher in photographs of eyes that later developed POAG by disc or visual field criteria (19.1%) than in eyes that did not develop POAG (7.3%) during OHTS follow-up. The DL model developed on the OHTS optic disc endpoint was less accurate on the 3 independent datasets, with AUROCs ranging from 0.74 to 0.79. The high diagnostic accuracy of the current model suggests that DL can be used to automate the determination of POAG for clinical trials and management. In addition, the higher false-positive rate in early photographs of eyes that later developed POAG suggests that the DL models detected POAG in some eyes earlier than the OHTS Endpoint Committee did.
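The sketch below illustrates three evaluation steps the abstract mentions: the subject-level 85/5/10 split, the AUROC, and false-positive rates at a fixed 90% specificity. It is an illustration under stated assumptions, not the study's pipeline. The record layout (a `subject` key), the function names, the seed and the toy scores are all hypothetical. The AUROC is the standard Mann-Whitney estimate and omits the 95% confidence interval. Assigning whole subjects rather than individual photographs to a set presumably keeps repeated photographs of the same eyes from appearing in both training and testing, which would otherwise inflate test accuracy.

```python
import math
import random


def split_by_subject(records, fractions=(0.85, 0.05, 0.10), seed=0):
    """Randomly assign whole subjects to train/val/test so no subject spans two sets.

    Hypothetical layout: each record is a dict with a "subject" key; the abstract
    does not describe how the OHTS data are actually stored.
    """
    subjects = sorted({r["subject"] for r in records})
    random.Random(seed).shuffle(subjects)
    n_train = round(fractions[0] * len(subjects))
    n_val = round(fractions[1] * len(subjects))
    which = {}
    for i, subject in enumerate(subjects):
        if i < n_train:
            which[subject] = "train"
        elif i < n_train + n_val:
            which[subject] = "val"
        else:
            which[subject] = "test"  # the remaining ~10% of subjects
    splits = {"train": [], "val": [], "test": []}
    for r in records:
        splits[which[r["subject"]]].append(r)
    return splits


def auroc(pos_scores, neg_scores):
    """AUROC as the probability that a positive outscores a negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def threshold_at_specificity(neg_scores, specificity=0.90):
    """Return t so that at least `specificity` of negatives score <= t.

    A photograph is called positive when its score is strictly greater than t.
    The small epsilon guards against floating-point error in specificity * n.
    """
    ranked = sorted(neg_scores)
    k = math.ceil(specificity * len(ranked) - 1e-9)
    return ranked[k - 1]


def false_positive_rate(neg_scores, threshold):
    """Fraction of truly negative photographs flagged positive at `threshold`."""
    return sum(s > threshold for s in neg_scores) / len(neg_scores)


if __name__ == "__main__":
    # Toy, made-up scores for photographs of eyes without POAG at that visit,
    # split by whether the eye later converted during follow-up.
    never_converted = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.50, 0.60, 0.70, 0.75]
    later_converted = [0.45, 0.80, 0.90]
    t = threshold_at_specificity(never_converted + later_converted, 0.90)
    print("threshold:", t)
    print("FPR, later converted:", false_positive_rate(later_converted, t))
    print("FPR, never converted:", false_positive_rate(never_converted, t))
```

In this reading, the threshold is fixed once on all photographs of eyes without POAG at that visit. `false_positive_rate` is then evaluated separately for eyes that later converted and eyes that never did, which is how a 19.1% vs 7.3% contrast could arise. The abstract does not state exactly how the 90% specificity threshold was chosen, so treat this as one plausible interpretation.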