Detecting Glaucoma in the Ocular Hypertension Treatment Study Using Deep
Learning: Implications for clinical trial endpoints
Abstract
We investigated the diagnostic accuracy of deep learning (DL) algorithms,
trained on fundus photographs from the Ocular Hypertension Treatment
Study (OHTS), to detect primary open-angle glaucoma (POAG). A total of 66,715
photographs from 3,272 eyes were used to train and test a ResNet-50
model to detect the OHTS Endpoint Committee POAG determination based on
optic disc (n=287 eyes, 3,502 photographs) and/or visual field (n=198
eyes, 2,300 visual fields) changes. OHTS photographs were randomly
assigned by subject to training, validation and testing sets in an
85%/5%/10% split, so that no subject contributed photographs to more
than one set. Three independent test sets were used to estimate the
generalizability of the model: UCSD Diagnostic Innovations in Glaucoma
Study (DIGS, USA), ACRIMA (Spain) and Large-scale Attention-based
Glaucoma (LAG, China). The DL model achieved an AUROC (95% CI) of 0.88
(0.82, 0.92) for the overall OHTS POAG endpoint. For the OHTS endpoints
based on optic disc changes or visual field changes, AUROCs were 0.91
(0.88, 0.94) and 0.86 (0.76, 0.93), respectively. False-positive rates
(at 90% specificity) were higher in photographs of eyes that later
developed POAG by disc or visual field (19.1%) than in eyes that
did not develop POAG (7.3%) during their OHTS follow-up. When the DL
model developed on the OHTS optic disc endpoint was applied to the 3
independent datasets, diagnostic accuracy was lower, with AUROCs ranging
from 0.74 to 0.79. The high diagnostic accuracy of the current model suggests that DL
can be used to automate the determination of POAG for clinical trials
and management. In addition, the higher false-positive rate in early
photographs of eyes that later developed POAG suggests that DL models
detected POAG in some eyes earlier than the OHTS Endpoint Committee.
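
The subject-level 85-5-10 split described above can be illustrated with a minimal sketch. This is not the study's code: the function name `split_by_subject`, the `(subject_id, photo_id)` record layout, and the fixed random seed are assumptions made for illustration only. The point it shows is that subjects, not photographs, are shuffled and partitioned, so all photographs from one subject land in the same set.

```python
import random
from collections import defaultdict


def split_by_subject(records, fractions=(0.85, 0.05, 0.10), seed=0):
    """Split (subject_id, photo_id) records into train/val/test by subject.

    Hypothetical helper for illustration; not taken from the study.
    """
    # Group photographs under the subject they belong to.
    by_subject = defaultdict(list)
    for subject_id, photo_id in records:
        by_subject[subject_id].append(photo_id)

    # Shuffle subjects (not photographs) with a fixed seed for repeatability.
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)

    # Partition the shuffled subject list; the test set takes the remainder.
    n_train = round(fractions[0] * len(subjects))
    n_val = round(fractions[1] * len(subjects))
    groups = {
        "train": subjects[:n_train],
        "val": subjects[n_train:n_train + n_val],
        "test": subjects[n_train + n_val:],
    }

    # Expand each subject group back into its photograph records.
    return {
        name: [(s, p) for s in ids for p in by_subject[s]]
        for name, ids in groups.items()
    }
```

Because the partition is made on subjects, the share of photographs in each set only approximates 85/5/10 when subjects contribute different numbers of photographs.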