Predicting COVID-19 Spread Level using Socio-Economic Indicators and
Machine Learning Techniques
Abstract
The virus responsible for COVID-19 has proved to be highly
transmissible across the globe. In this study, we propose a novel
approach for estimating the spread level of the virus in each country
on three different dates between April and May 2020. Unlike previous
studies, this investigation does not use any historical spread data
but relies instead on the socio-economic indicators of each
country. In total, more than 1000 socio-economic indicators and more
than 190 countries were processed in this study. Data
preprocessing techniques and feature selection approaches were applied
to extract relevant indicators for the classification process. Countries
around the globe were assigned to four classes of spread. To predict the
class of each country, several classifiers were proposed, based mainly
on Support Vector Machines (SVM), Multi-Layer Perceptrons (MLP) and
Random Forests (RF). The results show the relevance of our approach
since many classifiers succeeded in capturing the spread level,
especially the RF classifier, which reached an F-measure of 93.85% for
April 15th, 2020. Moreover, a feature importance study was conducted to
identify the best indicators for building robust spread level classifiers.
However, as pointed out in the discussion, classifiers may face some
difficulties for future dates because of the large increase in cases and
the lack of other relevant factors affecting the spread.
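
As an illustration of the evaluation metric mentioned above, the following
minimal sketch computes a per-class and a macro-averaged F-measure for a
four-class problem. The averaging scheme (macro), the function names
per_class_f1 and macro_f1, and the toy labels are assumptions made for
illustration only: the abstract does not state how the reported F-measure
was averaged, and the labels below are not data from the study.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, labels):
    """Return {label: F1}, treating each class one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted but is wrong
            fn[t] += 1  # true class t was missed
    scores = {}
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); set to 0 if the class never occurs
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Hypothetical spread classes 0..3 for eight countries (not study data).
labels = [0, 1, 2, 3]
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 3]

print(per_class_f1(y_true, y_pred, labels))
print(round(macro_f1(y_true, y_pred, labels), 4))
```

Macro averaging gives each of the four spread classes equal weight
regardless of how many countries fall into it; a support-weighted average
would be an equally plausible choice and would give different values when
classes are imbalanced.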