Unsupervised-based Distributed Machine Learning for Efficient Data Clustering and Prediction

Vishnu Baligodugula; Fathi Amsaad; Vincent Schmidt; Noor Zaman Jhanjhi

doi:10.36227/techrxiv.170630760.07077903/v1

Abstract

Unsupervised machine learning (ML) approaches have emerged to drive critical decisions about training data samples and to help solve challenges in many life-critical applications. This paper proposes parallel and distributed unsupervised ML techniques to reduce the execution time of different ML algorithms. Various unsupervised ML models are developed, implemented, and tested to compare the efficiency, in terms of execution time and accuracy, of serial methods with that of their parallelized counterparts. We developed sequential, parallel, and distributed cloud-based unsupervised ML models and determined the most efficient one through comparative analysis. As a case study, sequential, parallel, and distributed versions of Simple K-Means, Mini-Batch K-Means, and Fuzzy C-Means are investigated, using country datasets for multiple organizations to train and test the developed models. The parallel and distributed models are developed on a cloud computing architecture, namely Amazon SageMaker, to study their execution time and model accuracy. The results show that the proposed parallel and distributed Fuzzy C-Means implementations outperform the other two clustering methods in execution time, at 0.932 ms and 0.623 ms, with minimal impact on the accuracy of the developed models.
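
For readers unfamiliar with Fuzzy C-Means, the following is a minimal sequential sketch in pure Python of the standard algorithm, with a simple wall-clock timing of one run. It is not the authors' implementation and does not reproduce the parallel or SageMaker-based versions. The function names, the fuzzifier `m = 2.0`, the deterministic initialisation, and the toy data are all assumptions made for illustration.

```python
import math
import time


def _memberships(points, centers, m):
    """Membership of every point in every cluster; each row sums to 1."""
    rows = []
    for p in points:
        # Clamp distances so a point sitting exactly on a center is safe.
        dists = [max(math.dist(p, c), 1e-12) for c in centers]
        # Standard FCM rule: u_ik is proportional to d_ik^(-2 / (m - 1)).
        weights = [d ** (-2.0 / (m - 1.0)) for d in dists]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=50, tol=1e-6):
    """Plain sequential Fuzzy C-Means on a list of equal-length tuples."""
    dim = len(points[0])
    # Assumption: deterministic start, taking evenly spaced input points.
    step = max(1, len(points) // n_clusters)
    centers = [list(points[(k * step) % len(points)]) for k in range(n_clusters)]
    for _ in range(n_iter):
        u = _memberships(points, centers, m)
        new_centers = []
        for k in range(n_clusters):
            # Center update: mean of the points weighted by u_ik^m.
            powered = [row[k] ** m for row in u]
            denom = sum(powered)
            new_centers.append([
                sum(w * p[j] for w, p in zip(powered, points)) / denom
                for j in range(dim)
            ])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, _memberships(points, centers, m)


# Toy data (hypothetical): two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]

start = time.perf_counter()
centers, memberships = fuzzy_c_means(points, n_clusters=2)
elapsed_ms = (time.perf_counter() - start) * 1000.0
print(f"centers={centers}, elapsed={elapsed_ms:.3f} ms")
```

Timings from a sketch like this depend on the machine and data size and are not comparable to the figures reported in the abstract.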

24 Jan 2024: Submitted to TechRxiv

26 Jan 2024: Published in TechRxiv
