Unsupervised-based Distributed Machine Learning for Efficient Data Clustering and Prediction
  • Vishnu Baligodugula,
  • Fathi Amsaad,
  • Vincent Schmidt,
  • Noor Zaman Jhanjhi

Corresponding Author: [email protected]


Unsupervised ML-based approaches have emerged for driving critical decisions about training data samples, helping to solve challenges in many life-critical applications. This paper proposes parallel and distributed unsupervised ML techniques to reduce the execution time of different ML algorithms. Various unsupervised ML models are developed, implemented, and tested to compare the efficiency, in terms of execution time and accuracy, of the serial methods with that of the parallelized ones. We developed sequential, parallel, and distributed cloud-computing-based unsupervised ML models and determined the most efficient one through comparative analysis. As a case study, sequential, parallel, and distributed versions of Simple K-Means, Minibatch K-Means, and Fuzzy C-Means are investigated, using country datasets for multiple organizations to train and test the developed models. The parallel and distributed models are built on a cloud computing architecture, namely Amazon SageMaker, to study their execution time and model accuracy. The results show that the proposed parallel and distributed Fuzzy C-Means outperforms the other two clustering methods in execution time, at 0.932 ms and 0.623 ms, with minimal impact on the accuracy of the developed models.
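For readers unfamiliar with Fuzzy C-Means, the following is a minimal sequential sketch in plain Python, together with the kind of wall-clock timing the abstract refers to. It is an illustration only and not the authors' implementation: the function name `fuzzy_c_means`, the fuzzifier `m=2.0`, the iteration count, and the synthetic two-blob data are all assumptions, and the measured time will not match the figures reported above.

```python
import random
import time


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, seed=0):
    """Plain sequential Fuzzy C-Means on a list of equal-length tuples.

    Hypothetical helper for illustration. Returns (centers, memberships),
    where memberships[i][k] is the degree to which point i belongs to
    cluster k; each row sums to 1.
    """
    rng = random.Random(seed)
    dim = len(points[0])

    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    power = 2.0 / (m - 1.0)
    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(n_clusters):
            weights = [row[k] ** m for row in u]
            wsum = sum(weights)
            centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / wsum
                for d in range(dim)
            ))
        # Membership update: u_ik is proportional to d_ik ** (-2 / (m - 1)).
        for i, p in enumerate(points):
            dists = [
                max(sum((p[d] - c[d]) ** 2 for d in range(dim)) ** 0.5, 1e-12)
                for c in centers
            ]
            inv = [dist ** -power for dist in dists]
            total = sum(inv)
            u[i] = [v / total for v in inv]
    return centers, u


# Synthetic stand-in data (assumption): two Gaussian blobs near (0, 0) and (5, 5).
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(100)]
data += [(data_rng.gauss(5, 0.5), data_rng.gauss(5, 0.5)) for _ in range(100)]

# Time one sequential run in milliseconds.
start = time.perf_counter()
centers, memberships = fuzzy_c_means(data, n_clusters=2)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"sequential Fuzzy C-Means: {elapsed_ms:.3f} ms")
print("centers:", sorted(centers))
```

The membership update loops over points independently of one another, which is the part a parallel or distributed variant would most naturally split across workers; how the paper actually partitions the work is not stated in the abstract.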
24 Jan 2024: Submitted to TechRxiv
26 Jan 2024: Published in TechRxiv