
MCSwin: A Self-Supervised Learning Framework Combining Mask Image Modeling with Contrastive Learning for Cervical OCT Image Classification
  • Qingbin Wang,
  • Yuxuan Xiong,
  • Hanfeng Zhu,
  • Xuefeng Mu,
  • Yan Zhang,
  • Yutao Ma

Corresponding Author: Yutao Ma ([email protected])



Cervical cancer poses a major health threat to women worldwide. Optical coherence tomography (OCT) imaging has recently shown promise for the non-invasive diagnosis of cervical lesions. However, a shortage of high-quality labeled cervical OCT images hampers the training of deep learning models for practical use. Inspired by self-supervised pre-training, we propose MCSwin, a novel framework that combines masked image modeling (MIM) with contrastive learning, built on the Swin-Transformer architecture, for high-resolution cervical OCT images. In this contrastive-MIM framework, mixed image encoding is combined with a latent contextual regressor to resolve the inconsistency between pre-training and fine-tuning and to decouple the encoder's feature-extraction task from the decoder's reconstruction task, allowing the encoder to extract better image representations. In addition, contrastive losses at the patch and image levels are carefully designed to leverage large amounts of unlabeled data. We demonstrated the superiority of MCSwin over state-of-the-art self-supervised learning approaches through five-fold cross-validation on an OCT image dataset of 1,452 patients from a multicenter clinical study in China, as well as on two external validation sets from top-ranked Chinese hospitals: West China (Huaxi) and Xiangya. A human-machine comparison experiment on the Huaxi and Xiangya datasets for volume-level binary classification also indicates that MCSwin can match or exceed the average performance of four skilled medical experts, especially in identifying high-risk cervical lesions. Additionally, MCSwin's integrated GradCAM module visualizes cervical lesions and makes the model's predictions interpretable, helping gynecologists diagnose cervical diseases efficiently. Our work therefore has great potential to assist gynecologists in interpreting cervical OCT images in clinical settings.
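The abstract mentions patch-level and image-level contrastive losses but gives no formulas. As a rough illustration only, the sketch below assumes the common InfoNCE objective with cosine similarity and a temperature. At the image level, two augmented views of the same image form the positive pair and other images in the batch are negatives; at the patch level, the same patch position in two views is the positive and the other patches of that image are negatives. The pairing scheme, the function names (`info_nce`, `image_level_loss`, `patch_level_loss`, `contrastive_loss`), the temperature of 0.2, and the `patch_weight` parameter are all hypothetical and are not taken from MCSwin itself. It uses plain Python lists so it runs without extra dependencies.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else [float(x) for x in v]


def _cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings."""
    ua, ub = _l2_normalize(a), _l2_normalize(b)
    return sum(x * y for x, y in zip(ua, ub))


def info_nce(queries: Sequence[Vector], keys: Sequence[Vector],
             temperature: float = 0.2) -> float:
    """Mean InfoNCE loss: queries[i] and keys[i] form the positive pair and
    every other key in the batch acts as a negative."""
    if not queries or len(queries) != len(keys):
        raise ValueError("queries and keys must be non-empty and equally long")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    total = 0.0
    for i, q in enumerate(queries):
        logits = [_cosine(q, k) / temperature for k in keys]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log(softmax(logits)[i])
    return total / len(queries)


def image_level_loss(global_a, global_b, temperature=0.2):
    """Hypothetical image-level term: global_a[i] and global_b[i] are pooled
    embeddings of two augmented views of image i."""
    return info_nce(global_a, global_b, temperature)


def patch_level_loss(patches_a, patches_b, temperature=0.2):
    """Hypothetical patch-level term: patches_a[i][p] and patches_b[i][p] embed
    patch position p of image i in two views; averaged over images."""
    per_image = [info_nce(pa, pb, temperature) for pa, pb in zip(patches_a, patches_b)]
    if not per_image:
        raise ValueError("need at least one image")
    return sum(per_image) / len(per_image)


def contrastive_loss(global_a, global_b, patches_a, patches_b,
                     patch_weight=1.0, temperature=0.2):
    """Hypothetical weighted sum of both terms; in a contrastive-MIM setup this
    would be added to a masked-modeling loss, which is not shown here."""
    return (image_level_loss(global_a, global_b, temperature)
            + patch_weight * patch_level_loss(patches_a, patches_b, temperature))


if __name__ == "__main__":
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(info_nce(aligned, aligned))  # small: each query matches its own key
    print(info_nce(aligned, swapped))  # large: positives point the wrong way
```

Averaging the patch term per image keeps negatives local to one image, which is one plausible way to make the loss sensitive to fine spatial detail; a real implementation would more likely compute these terms on batched tensors with a cross-entropy routine.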
26 Jan 2024: Submitted to TechRxiv
29 Jan 2024: Published in TechRxiv