Counterfactual Causal-Effect Intervention for Interpretable Medical Visual Question Answering
  • Linqin Cai,
  • Haodu Fang,
  • Nuoying Xu,
  • Bo Ren
Corresponding author: Linqin Cai ([email protected])


Abstract

Medical Visual Question Answering (VQA-Med) is a challenging task that involves answering clinical questions about medical images. However, most current VQA-Med methods ignore the causal correlation between specific lesion or abnormality features and answers, and also fail to provide accurate explanations for their decisions. Moreover, VQA-Med methods suffer from the language-bias problem common in generic VQA. To address the interpretability and language-bias issues in VQA-Med, this paper proposes a novel CCIS-MVQA model based on a counterfactual causal-effect intervention strategy. The model consists of a modified ResNet for image feature extraction, a GloVe-based encoder for question feature extraction, a bilinear attention network for vision-language feature fusion, and an interpretability generator that produces both explanations and prediction results. The proposed CCIS-MVQA introduces a layer-wise relevance propagation method to automatically generate counterfactual samples, improving interpretability and alleviating language bias. Additionally, CCIS-MVQA applies counterfactual causal reasoning throughout the training phase to enhance interpretability and generalization. Extensive experiments on three benchmark datasets show that CCIS-MVQA outperforms state-of-the-art methods, and extensive visualization results are provided to analyze its interpretability and debiasing performance.
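As a rough illustration of the vision-language fusion step mentioned above, the sketch below computes a low-rank bilinear attention between image-region features (standing in for the modified-ResNet output) and question-word embeddings (standing in for GloVe vectors). All dimensions, weights, and names here are hypothetical placeholders, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper):
d_v, d_q, d_h = 2048, 300, 512   # image, question, and joint feature sizes
n_regions, n_words = 36, 12      # image regions and question tokens

# Stand-ins for modified-ResNet region features and GloVe word vectors.
V = rng.standard_normal((n_regions, d_v))
Q = rng.standard_normal((n_words, d_q))

# Low-rank bilinear attention: project both modalities into a shared
# space, score every (region, word) pair, then pool a fused feature.
W_v = rng.standard_normal((d_v, d_h)) * 0.01
W_q = rng.standard_normal((d_q, d_h)) * 0.01

Vp = V @ W_v                 # (n_regions, d_h)
Qp = Q @ W_q                 # (n_words, d_h)
logits = Vp @ Qp.T           # (n_regions, n_words) pairwise scores

# Softmax over all region-word pairs.
att = np.exp(logits - logits.max())
att /= att.sum()

# Attention-weighted joint feature: sum over pairs of elementwise products.
fused = np.einsum('ij,id,jd->d', att, Vp, Qp)

print(fused.shape)  # (512,)
```

The fused vector would then feed a downstream answer classifier; the actual model stacks several such attention glimpses, but one suffices to show the bilinear form.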
Submitted to TechRxiv: 22 Apr 2024
Published in TechRxiv: 29 Apr 2024