Utilization of Encoding, Early Stopping, Hyper Parameter Tuning, and Machine Learning Models for Bank Fraud Detection
  • Md Aminul Islam
  • Saumik Chowdhury
  • Anindya Nag
  • Nahi Mumtaj
Md Aminul Islam
Oxford Brookes University

Corresponding Author: [email protected]

Abstract

An effective fraud detection system is essential to a secure banking system that protects millions of clients, and machine learning and AI can help build one. In this article, we apply four supervised machine learning models, k-nearest neighbors (KNN), random forest (RF), decision tree, and logistic regression (LR), to detect bank fraud in a synthetic dataset with 1,00,000 rows and 32 columns. Adequate preprocessing, decoding, rigorous feature engineering, validation, performance evaluation, and explanation allow readers to follow the whole study. Although the algorithms achieve similar accuracy, logistic regression shows the highest, 0.98921, with label encoding, which is not prescribed. Still, a significant AUC of 95% has been achieved with XGBoost and LGBM. This study can be further applied to real-life cases in banks, insurance companies, and financial institutions.
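
To make the encoding step concrete, the following is a minimal sketch of label encoding next to one-hot encoding. It is not the authors' code: the feature name `payment_type`, its values, and the helper functions `label_encode` and `one_hot_encode` are hypothetical and chosen only for illustration. The abstract does not say why label encoding is "not prescribed"; one commonly cited concern is that integer labels impose an arbitrary order on nominal categories, which one-hot encoding avoids.

```python
# Illustrative sketch only: the values below are made up and are not
# taken from the dataset used in the study.

def label_encode(values):
    """Map each distinct category to an integer, in sorted order."""
    mapping = {cat: idx for idx, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Map each value to a 0/1 indicator vector, one position per category."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


# Hypothetical categorical feature.
payment_type = ["card", "transfer", "cash", "card"]

labels, mapping = label_encode(payment_type)
one_hot, categories = one_hot_encode(payment_type)

print(mapping)     # integer assigned to each category
print(labels)      # one integer per row
print(categories)  # column order of the one-hot vectors
print(one_hot)     # one indicator vector per row
```

With label encoding, "transfer" receives a larger integer than "card" purely because of alphabetical order, so a linear model such as logistic regression may treat that order as meaningful. One-hot encoding keeps the categories unordered at the cost of one column per category.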