An explainable XGBoost–based approach towards assessing the risk of cardiovascular disease in patients with Type 2 Diabetes Mellitus
Cardiovascular Disease (CVD) is an important cause of disability and death among individuals with Diabetes Mellitus (DM). International clinical guidelines for the management of Type 2 DM (T2DM) are founded on primary and secondary prevention and favor the evaluation of CVD-related risk factors to guide appropriate treatment initiation. CVD risk prediction models can provide valuable tools for optimizing the frequency of medical visits and performing timely preventive and therapeutic interventions against CVD events. Integrating explainability methods into these models can improve human understanding of their reasoning process, maximize transparency, and build the trust needed for their adoption in clinical practice. The aim of the present study is to develop and evaluate an explainable, personalized risk prediction model for the incidence of fatal or non-fatal CVD in individuals with T2DM. An explainable approach based on eXtreme Gradient Boosting (XGBoost) and the Tree SHAP (SHapley Additive exPlanations) method is deployed to calculate the 5-year CVD risk and to generate individual explanations of the model’s decisions. Data from the 5-year follow-up of 560 patients with T2DM are used for development and evaluation purposes. The obtained results (AUC = 71.13%) indicate the potential of the proposed approach to handle the imbalanced nature of the dataset while providing clinically meaningful insights into the ensemble model’s decision process.
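To illustrate what an individual explanation of an ensemble's decision looks like, the following minimal sketch computes exact Shapley values for a toy tree ensemble by brute-force enumeration of feature coalitions. It is not the study's model and not the Tree SHAP algorithm itself, which is designed to compute such attributions efficiently on real tree ensembles. The feature names (`hba1c`, `age`, `sbp`), the thresholds, the leaf values, and the background patients are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import exp, factorial

# Hypothetical feature names, used only for this illustration.
FEATURES = ["hba1c", "age", "sbp"]

def ensemble_margin(x):
    """Toy boosted ensemble: the sum of three hand-written trees (log-odds scale)."""
    margin = 0.0
    margin += 0.8 if x["hba1c"] > 7.5 else -0.4
    margin += 0.6 if x["age"] > 60 else -0.3
    # A tree with an interaction between blood pressure and age.
    margin += 0.5 if (x["sbp"] > 140 and x["age"] > 60) else -0.1
    return margin

# Hypothetical background patients that define the "average" prediction.
BACKGROUND = [
    {"hba1c": 6.8, "age": 52, "sbp": 128},
    {"hba1c": 7.2, "age": 64, "sbp": 135},
    {"hba1c": 8.9, "age": 58, "sbp": 145},
    {"hba1c": 7.0, "age": 71, "sbp": 150},
]

def coalition_value(x, known):
    """Mean margin when the features in `known` take the patient's values
    and the remaining features are taken from each background patient."""
    total = 0.0
    for b in BACKGROUND:
        mixed = {f: (x[f] if f in known else b[f]) for f in FEATURES}
        total += ensemble_margin(mixed)
    return total / len(BACKGROUND)

def shapley_values(x):
    """Exact Shapley value of every feature, by enumerating all coalitions."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        contribution = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                gain = coalition_value(x, s | {f}) - coalition_value(x, s)
                contribution += weight * gain
        phi[f] = contribution
    return phi

patient = {"hba1c": 8.2, "age": 67, "sbp": 150}
base_value = coalition_value(patient, set())   # mean margin over the background
phi = shapley_values(patient)
margin = ensemble_margin(patient)
risk = 1.0 / (1.0 + exp(-margin))              # logistic link: margin to probability

for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
print(f"base value {base_value:+.3f}, margin {margin:+.3f}, risk {risk:.3f}")
```

The per-feature contributions add up, together with the base value, to the patient's log-odds prediction, which is the additivity property that makes this kind of explanation readable at the level of a single patient. Enumeration scales exponentially with the number of features, so on a trained model one would normally rely on a dedicated implementation such as the `shap` package's `TreeExplainer` rather than this brute-force version.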