Predicting COVID-19 Spread Level using Socio-Economic Indicators and Machine Learning Techniques
The virus responsible for COVID-19 has been found to be highly transmissible across the globe. In this study, we propose a novel approach for estimating the spread level of the virus in each country on three different dates between April and May 2020. Unlike previous studies, this investigation does not process any historical spread data but relies instead on the socio-economic indicators of each country. More than 1000 socio-economic indicators and more than 190 countries were processed. Data preprocessing techniques and feature selection approaches were applied to extract the indicators relevant to the classification process. Each country was assigned to one of 4 classes of spread. To predict the class of each country, several classifiers were evaluated, based mainly on Support Vector Machines (SVM), Multi-Layer Perceptrons (MLP) and Random Forests (RF). The results show the relevance of our approach: many classifiers succeeded in capturing the spread level, especially the RF classifier, which reached an F-measure of 93.85% for April 15th, 2020. Moreover, a feature importance study was conducted to identify the best indicators for building robust spread level classifiers. However, as pointed out in the discussion, the classifiers may face difficulties for later dates because of the large increase in cases and the lack of other relevant factors that affect the spread.
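To make the reported metric concrete, the following is a minimal sketch of how an F-measure can be computed for a 4-class problem such as the spread levels above. It assumes macro averaging (the unweighted mean of the per-class F1 scores); the abstract does not state which averaging scheme was used, and the function name `macro_f_measure` and the label values are hypothetical, chosen only for illustration.

```python
from collections import Counter


def macro_f_measure(y_true, y_pred, labels):
    """Macro-averaged F-measure: unweighted mean of per-class F1 scores.

    Assumption: macro averaging. The study may have used a different scheme.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[pred] += 1       # predicted class gains a false positive
            fn[truth] += 1      # true class gains a false negative

    scores = []
    for c in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Hypothetical spread classes 0..3 for eight countries (not data from the study)
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 0]
score = macro_f_measure(y_true, y_pred, labels=[0, 1, 2, 3])
print(f"macro F-measure: {score:.4f}")
```

Macro averaging weights each spread class equally regardless of how many countries fall into it; a support-weighted average would instead favor the larger classes, so the two can differ noticeably when classes are imbalanced.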