Healthcare Cost Patterns and Prediction: Investigating Personal Datasets using Data Analytics
Md Aminul Islam
Oxford Brookes University

Corresponding Author: [email protected]

Anindya Nag
Pretam Chandra
SM Firoz Ahmed Fahim
Md Mozammel Hoque


This study introduces a health insurance prediction system based on machine learning. Health insurance has become a markedly more prominent research topic since the pandemic, and efforts to address it have increased accordingly. The dataset used in this research comprises 1,338 observations and 7 columns describing individual medical expenditures in the United States, and is available on Kaggle. Its variables, which are used to predict insurance prices, include age, gender, BMI, smoking status, and number of children. We used machine learning approaches, including neural networks, explainable AI (XAI), and auto modeling, to determine the relationship between price and these attributes. The dataset was split 80-20 into training and evaluation sets. Gradient Boosting initially achieved an accuracy of 97%, which we corrected to 92% with a Gradient Boosting Regressor after encoding and hyperparameter tuning. Among the predictive machine learning models, Random Forest had the best accuracy, at 83.44%.
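The workflow summarized in the abstract (encode the categorical columns, split the data 80-20, fit a gradient boosting regressor, score it on the held-out part) can be illustrated with a minimal, dependency-free sketch. Everything specific here is an assumption and not the authors' implementation: the rows are synthetic, the column names `gender`, `smoker`, and `charges` and all helper names are hypothetical, the booster uses one-split trees (stumps) with squared-error loss, and R² stands in for the "accuracy" that the abstract reports without defining.

```python
import random

def make_synthetic_rows(n=1338, seed=42):
    """Generate made-up records; the charge formula is purely illustrative."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {"age": rng.randint(18, 64),
               "gender": rng.choice(["female", "male"]),
               "bmi": round(rng.uniform(16.0, 50.0), 1),
               "smoker": rng.choice(["yes", "no"]),
               "children": rng.randint(0, 5)}
        row["charges"] = (250.0 * row["age"] + 20000.0 * (row["smoker"] == "yes")
                          + rng.gauss(0.0, 2000.0))
        rows.append(row)
    return rows

def encode(row):
    """Turn one record into a numeric vector (binary-encode the categoricals)."""
    return [float(row["age"]),
            1.0 if row["gender"] == "male" else 0.0,
            float(row["bmi"]),
            1.0 if row["smoker"] == "yes" else 0.0,
            float(row["children"])]

def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and return (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_stump(X, residuals):
    """Find the single-feature threshold split with the lowest squared error.
    Assumes at least one feature takes more than one distinct value."""
    n, total, best = len(X), sum(residuals), None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # no threshold fits between equal values
            right_sum = total - left_sum
            # Maximizing this score is equivalent to minimizing the split's SSE.
            score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if best is None or score > best[0]:
                best = (score, j, (lo + hi) / 2.0, left_sum / k, right_sum / (n - k))
    return best[1:]

def stump_predict(stump, x):
    j, threshold, left_value, right_value = stump
    return left_value if x[j] <= threshold else right_value

def fit_gradient_boosting(X, y, n_rounds=100, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(y) / len(y)
    preds, stumps = [base] * len(y), []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x) for p, x in zip(preds, X)]
    return base, stumps, learning_rate

def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

rows = make_synthetic_rows()
train_rows, test_rows = split_rows(rows)  # 80-20 split
model = fit_gradient_boosting([encode(r) for r in train_rows],
                              [r["charges"] for r in train_rows])
r2 = r2_score([r["charges"] for r in test_rows],
              [predict(model, encode(r)) for r in test_rows])
print(f"train={len(train_rows)} test={len(test_rows)} held-out R^2={r2:.3f}")
```

Scoring only on the held-out 20% is the point of the split: a score computed on the training rows would overstate how well the model generalizes. In practice a library implementation with tuned tree depth, learning rate, and number of rounds would replace the hand-written booster.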