Deep Learning of COVID-19 Chest X-Rays: New Models or Fine Tuning?

Chest X-rays have been found to be very promising for assessing COVID-19 patients, especially for relieving overcapacity in emergency departments and urgent-care centers. Deep-learning (DL) methods in artificial intelligence (AI) play a dominant role as high-performance classifiers in the detection of the disease using chest X-rays. While many new DL models have been developed for this purpose, this study aimed to investigate the fine tuning of pretrained convolutional neural networks (CNNs) for the classification of COVID-19 using chest X-rays. Three pretrained CNNs, namely AlexNet, GoogLeNet, and SqueezeNet, were selected and fine-tuned without data augmentation to carry out 2-class and 3-class classification tasks using 3 public chest X-ray databases. In comparison with other recently developed DL models, the 3 pretrained CNNs achieved very high classification results in terms of accuracy, sensitivity, specificity, precision, F1 score, and area under the receiver-operating-characteristic curve. AlexNet, GoogLeNet, and SqueezeNet require the least training time among pretrained DL models, yet with a suitable selection of training parameters, these networks can achieve excellent classification results without data augmentation. The findings contribute to the urgent need to bring the pandemic under control by facilitating the deployment of AI tools that are fully automated and readily available in the public domain for rapid implementation.


Introduction
COVID-19 (coronavirus disease 2019) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which is a strain of coronavirus. The disease was officially declared a pandemic by the World Health Organisation (WHO) on 11 March 2020. Given spikes in new COVID-19 cases and the re-opening of daily activities around the world, the need to curb the pandemic deserves greater emphasis.
Medical images and artificial intelligence (AI) have been found useful for the rapid assessment and treatment of patients infected with COVID-19. Therefore, the design and deployment of AI tools for image classification of COVID-19 in a short period of time with limited data have been an urgent need for fighting the current pandemic. Radiologists have recently found that deep learning (DL) developed in AI, which was able to detect tuberculosis in chest X-rays, could be useful for identifying lung abnormalities related to COVID-19 and help clinicians in deciding the order of treatment of high-risk COVID-19 patients [1]. The role of medical imaging has also been confirmed by others as important information to enable the fast diagnosis of COVID-19 [2], and the coupling of AI and chest imaging can help explain the complications of COVID-19 [3].
Regarding the image analysis of COVID-19, chest X-ray is an imaging method adopted by hospitals to diagnose COVID-19 infection, and was notably the first image-based approach used in Spain [4]. The protocol is that if a clinical suspicion about the infection remains after the examination of a patient, a sample of nasopharyngeal exudate is obtained for reverse-transcription polymerase chain reaction (RT-PCR) testing, and a chest X-ray film is then taken. Because the results of the PCR test may take several hours to become available, information revealed from the chest X-ray plays an important role in a rapid clinical assessment, as follows. If the clinical condition and the chest X-ray are normal, the patient is sent home while awaiting the results of the etiological test. But if the X-ray shows pathological findings, the suspected patient will be admitted to the hospital for close monitoring. In general, the absence or presence of pathological findings on the chest X-ray is the basis for making a clinical decision on sending the patient home or keeping the patient in the hospital for further observation.
While radiography in medical examinations can be performed quickly and is widely available given the prevalence of chest radiology imaging systems in healthcare systems, the interpretation of radiography images by radiologists is limited by the human capacity to detect the subtle visual features present in the images. Because AI can discover patterns in chest X-rays that normally would not be recognized by radiologists [5,6,7,8], there have been many reports in the literature about new developments of DL models using convolutional neural networks (CNNs) for differentiating COVID-19 from non-COVID-19 using public databases of chest X-rays (related works are presented in the next section).
This study investigated the potential of parameter adjustments in the transfer learning of three popular pretrained CNNs: AlexNet, GoogLeNet, and SqueezeNet, which are known to have the shortest prediction and training-iteration times among the pretrained CNNs reported from the ImageNet Large-Scale Visual Recognition Challenge [9]. If these fine-tuned networks, suitably configured, could achieve the desired performance in the classification of COVID-19 chest X-ray images, then the contribution of the findings to coronavirus pandemic relief would be significant. This is because such a result would meet the urgent need for rapidly deploying AI tools that assist clinicians in making optimal clinical decisions, saving the time, resources, and technical effort otherwise spent on developing new models that may deliver the same or lower performance.

Related Works
Peer-reviewed works that are related to the study presented herein are described as follows.
The Bayes-SqueezeNet [10] was introduced for detecting COVID-19 using chest X-rays. The proposed net consists of the offline augmentation of the raw dataset and model training using Bayesian optimization. The Bayes-SqueezeNet was applied for classifying X-ray images labeled in 3 classes as normal, viral pneumonia, and COVID-19. With data augmentation, the net was claimed to overcome the problem of imbalanced data obtained from the public databases.
As another CNN, the CoroNet [11] was developed for detecting COVID-19 infection from chest X-ray images. This model is based on the pretrained CNN known as the Xception [12]. CoroNet adopts the Xception as its base model, with a dropout layer and two fully connected layers added at the end. As a result, CoroNet has 33,969,964 parameters in total, of which 33,915,436 are trainable and 54,528 are non-trainable. The net was applied for 3-class classification (COVID-19, pneumonia, and normal) as well as 4-class classification (COVID-19, pneumonia bacterial, pneumonia viral, and normal).
The CovidGAN [13] was proposed as an auxiliary classifier generative adversarial network based on GAN (generative adversarial network) [15] for the detection of COVID-19. The architecture of the CovidGAN is built on the pretrained VGG-16 [14], which is connected to four custom layers at the end, including a global average pooling layer followed by a 64-unit dense layer and a dropout layer with 0.5 probability. The net further utilizes the GAN approach for generating synthetic chest X-ray images to improve the classification performance.
The DarkCovidNet [16], which was built on the DarkNet model [17], was another CNN model proposed for COVID-19 detection using chest X-rays. The DarkCovidNet has fewer layers and filters than the original DarkNet, with the number of filters gradually increased through the network. This model was tested for a 2-class classification (COVID-19 and no-findings) and a 3-class classification (COVID-19, no-findings, and pneumonia).

Pretrained CNNs and Training Parameters for Transfer Learning
The architectures and specification of training parameters for transfer learning of AlexNet [21], GoogLeNet [22], and SqueezeNet [23] are described as follows.
First, the layer graph was extracted from the pretrained network. If the network was a SeriesNetwork object, such as AlexNet, then the list of layers was converted to a layer graph. In most pretrained networks, the last layer with learnable weights is a fully connected layer. This fully connected layer was replaced with a new fully connected layer with the number of outputs equal to the number of classes in the new data set, which is 2 or 3 in this study. In the pretrained SqueezeNet, the last learnable layer is a 1-by-1 convolutional layer instead. In this case, the convolutional layer was replaced with a new convolutional layer with the number of filters equal to the number of classes.
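The head-replacement step described above can be sketched in a framework-agnostic way. The `Layer` record, the `kind` labels, and the two abbreviated layer lists below are hypothetical stand-ins chosen for illustration; they are not the data structures of any particular deep-learning toolbox, and the layer names and sizes are only indicative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    """Hypothetical minimal description of one network layer."""
    name: str
    kind: str         # e.g. "conv", "fc", "relu", "pool", "softmax"
    outputs: int = 0  # output units (fc) or filters (conv); 0 if not learnable


LEARNABLE_KINDS = {"conv", "fc"}


def replace_last_learnable(layers: List[Layer], num_classes: int) -> List[Layer]:
    """Return a copy of `layers` whose last learnable layer has `num_classes` outputs."""
    for i in range(len(layers) - 1, -1, -1):
        old = layers[i]
        if old.kind in LEARNABLE_KINDS:
            # Same layer type as before (fc or conv), resized to the new class count
            new = Layer(name="new_" + old.name, kind=old.kind, outputs=num_classes)
            return layers[:i] + [new] + layers[i + 1:]
    raise ValueError("network has no learnable layer to replace")


# Abbreviated toy networks: one ends in a fully connected head,
# the other in a 1-by-1 convolutional head.
alexnet_like = [Layer("conv1", "conv", 96), Layer("relu1", "relu"),
                Layer("fc8", "fc", 1000), Layer("prob", "softmax")]
squeezenet_like = [Layer("conv1", "conv", 64), Layer("conv10", "conv", 1000),
                   Layer("relu10", "relu"), Layer("pool10", "pool"),
                   Layer("prob", "softmax")]

two_class = replace_last_learnable(alexnet_like, num_classes=2)
three_class = replace_last_learnable(squeezenet_like, num_classes=3)
print(two_class[2])
print(three_class[1])
```

Searching backwards for the last learnable layer, rather than assuming a fixed position, lets one routine cover both the fully connected and the convolutional case.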
The original chest X-ray images were converted into RGB images and resized to fit the input image size of each pretrained CNN. For the training options, the stochastic gradient descent with momentum optimizer was used, with momentum = 0.9; gradient threshold method = L2 norm; mini-batch size = 10; maximum number of epochs = 10; initial learning rate = 0.0003, held constant throughout training; and L2 regularization factor (weight decay) = 0.0001. The training data were shuffled before each training epoch, and the validation data were shuffled before each network validation.
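To make the role of these training options concrete, the following is a minimal sketch of a single update of stochastic gradient descent with momentum, using the momentum, learning rate, and L2 regularization factor quoted above. The function name `sgdm_step` is hypothetical. The gradient threshold value is not stated in the text, so it defaults to infinity here (no clipping), and applying the threshold after adding the regularization term is an assumption of this sketch rather than a description of the software used in the study.

```python
import math


def sgdm_step(weights, grads, velocity,
              lr=0.0003, momentum=0.9, l2=0.0001, grad_threshold=math.inf):
    """One SGD-with-momentum update on flat lists of floats.

    Returns the updated (weights, velocity).
    """
    # L2 regularization (weight decay) adds l2 * w to each gradient component
    g = [gi + l2 * wi for gi, wi in zip(grads, weights)]

    # L2-norm gradient thresholding: rescale the gradient if its norm is too large
    norm = math.sqrt(sum(x * x for x in g))
    if norm > grad_threshold:
        scale = grad_threshold / norm
        g = [x * scale for x in g]

    # Momentum update: the previous step contributes to the current one
    new_velocity = [momentum * v - lr * gi for v, gi in zip(velocity, g)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy example with a single weight and made-up gradient values
w, v = [1.0], [0.0]
for step in range(3):
    w, v = sgdm_step(w, grads=[0.5], velocity=v)
    print(step, w, v)
```

With a constant learning rate, as used in the study, `lr` stays fixed across all epochs; a schedule would simply pass a different `lr` at each step.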
The COVID-19 Radiography Database consists of chest X-rays of 219 COVID-19 positive images, 1341 normal images, and 1345 viral pneumonia images. The COVID-19 Chest X-Ray Dataset Initiative has 55 COVID-19 positive images. The IEEE8023/Covid Chest X-Ray Dataset is part of the COVID-19 Image Data Collection of chest X-ray and CT images of patients who are positive or suspected of COVID-19 or other viral and bacterial pneumonias, in which 706 images are chest X-rays. The numbers of images in these databases, which are expected to increase over time as more data become available, are those reported on the date of access. Figure 1 shows some chest X-ray images of COVID-19, viral pneumonia, and normal subjects provided by the COVID-19 Radiography Database. Figure 2 and Figure 3 show some chest X-ray images of COVID-19 obtained from the COVID-19 Chest X-Ray Dataset Initiative and IEEE8023/Covid Chest X-Ray Dataset, respectively.

Design of Chest X-ray Subsets
Six subsets of chest X-ray data were constructed out of the COVID-19 Radiography Database (Kaggle), COVID-19 Chest X-Ray Dataset Initiative, and IEEE8023/Covid Chest X-Ray Dataset to test and compare the performance of the pretrained CNNs. These 6 subsets are described as follows.

Dataset 1
This dataset includes 403 chest X-rays of COVID-19 and 721 chest X-rays of healthy subjects. All images of the healthy subjects were taken from the COVID-19 Radiography Database. This dataset was designed for a two-class classification to compare with the study reported in [13].

Dataset 2
This chest X-ray dataset has 438 images of COVID-19 and 438 images of healthy subjects. All images of the healthy subjects were taken from the COVID-19 Radiography Database. This balanced dataset was designed for a two-class classification with more COVID-19 images.

Dataset 3
This chest X-ray dataset has 438 images of COVID-19 and 876 images of healthy and viral pneumonia subjects (438 healthy and 438 viral pneumonia). All images of the healthy and viral pneumonia subjects were taken from the COVID-19 Radiography Database. This dataset was designed for a two-class classification.

Dataset 4
To carry out a three-class classification, this chest X-ray dataset has 438 images of COVID-19, 438 images of viral pneumonia, and 438 images of healthy subjects. All images of the healthy and viral pneumonia subjects were taken from the COVID-19 Radiography Database.

Dataset 5
This two-class dataset consists of all COVID-19 images (class 1) and all healthy and viral pneumonia images (class 2) of the COVID-19 Radiography Database.

Dataset 6
This three-class dataset consists of all COVID-19 images (class 1), all viral pneumonia images (class 2), and all images of healthy subjects (class 3) of the COVID-19 Radiography Database.
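As a quick check of class balance, the composition of the six subsets can be tabulated as in the sketch below. The counts for Datasets 1 to 4 are those given in the subset descriptions. The counts for Datasets 5 and 6 are an assumption: they take "all images" to mean the 219, 1341, and 1345 images reported for the COVID-19 Radiography Database on the date of access. The names `SUBSETS` and `covid_fraction` are introduced here for illustration only.

```python
# Class counts per subset (see the assumption about Datasets 5 and 6 above)
SUBSETS = {
    "Dataset 1": {"COVID-19": 403, "healthy": 721},
    "Dataset 2": {"COVID-19": 438, "healthy": 438},
    "Dataset 3": {"COVID-19": 438, "healthy + viral pneumonia": 876},
    "Dataset 4": {"COVID-19": 438, "viral pneumonia": 438, "healthy": 438},
    "Dataset 5": {"COVID-19": 219, "healthy + viral pneumonia": 1341 + 1345},
    "Dataset 6": {"COVID-19": 219, "viral pneumonia": 1345, "healthy": 1341},
}


def covid_fraction(counts):
    """Share of COVID-19 images among all images of a subset."""
    return counts["COVID-19"] / sum(counts.values())


for name, counts in SUBSETS.items():
    total = sum(counts.values())
    print(f"{name}: {len(counts)} classes, {total} images, "
          f"COVID-19 share {covid_fraction(counts):.1%}")
```

Under this assumption, Datasets 5 and 6 are far more imbalanced than Datasets 2 and 4, which is relevant when reading sensitivity alongside accuracy.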

Performance Metrics
Six metrics used for evaluating the performance of the CNNs are accuracy, sensitivity, specificity, precision, F1 score, and area under a receiver operating characteristic curve (AUC).
The sensitivity (SEN) is defined as the percentage of COVID-19 patients who are correctly identified as having the infection, and expressed as

$$\mathrm{SEN} = \frac{TP}{P} \times 100 = \frac{TP}{TP + FN} \times 100,$$

where TP is called true positive, denoting the number of COVID-19 patients who are correctly identified as having the infection, FN false negative, denoting the number of COVID-19 patients who are misclassified as having no infection of COVID-19, and P the total number of COVID-19 patients.

The specificity (SPE) is defined as the percentage of non-COVID-19 subjects who are correctly classified as having no infection of COVID-19:

$$\mathrm{SPE} = \frac{TN}{N} \times 100 = \frac{TN}{TN + FP} \times 100,$$

where TN is called true negative and denotes the number of non-COVID-19 subjects who are correctly identified as having no infection of COVID-19, FP false positive, denoting the number of non-COVID-19 subjects who are misclassified as having the infection, and N the total number of non-COVID-19 subjects.

The percent accuracy (ACC) of the classification is defined as

$$\mathrm{ACC} = \frac{TP + TN}{P + N} \times 100.$$

The precision (PRE) is also known as the percentage of positive predictive value and defined as

$$\mathrm{PRE} = \frac{TP}{TP + FP} \times 100.$$

The F1 score is defined as the harmonic mean of precision and sensitivity, with both expressed as fractions rather than percentages, which gives a value between 0 and 1:

$$F_1 = \frac{2\,TP}{2\,TP + FP + FN}.$$

The receiver operating characteristic (ROC) is a probability curve created by plotting the TP rate against the FP rate at various threshold settings, and the AUC represents the measure of performance of a classifier. The higher the AUC is, the better the model is at distinguishing between COVID-19 and non-COVID-19 cases. For a perfect classifier, AUC = 1, and an AUC = 0.5 indicates a classifier that randomly assigns observations to classes. The AUC was calculated using trapezoidal integration to estimate the area under the ROC curve.
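The metric definitions above can be illustrated with a short sketch that computes them from confusion-matrix counts and estimates the AUC by trapezoidal integration of the ROC curve. The function names and the toy counts and scores are hypothetical and serve only as an example; they are not results from this study. Sensitivity, specificity, accuracy, and precision are returned as percentages and the F1 score as a fraction, matching the way these quantities are reported in the Results.

```python
def classification_metrics(tp, fn, tn, fp):
    """Metrics from confusion counts; COVID-19 is the positive class."""
    p = tp + fn  # total number of COVID-19 cases
    n = tn + fp  # total number of non-COVID-19 cases
    return {
        "SEN": 100.0 * tp / p,
        "SPE": 100.0 * tn / n,
        "ACC": 100.0 * (tp + tn) / (p + n),
        "PRE": 100.0 * tp / (tp + fp),
        "F1": 2.0 * tp / (2.0 * tp + fp + fn),
    }


def roc_auc_trapezoid(scores, labels):
    """AUC via trapezoidal integration; labels are 1 (COVID-19) or 0 (other)."""
    p = sum(labels)
    n = len(labels) - p
    # Lower the decision threshold step by step, from the highest score down
    pairs = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with tied scores cross the threshold together
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr.append(tp / p)
        fpr.append(fp / n)
    # Sum the trapezoid areas between consecutive ROC points
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
               for k in range(1, len(fpr)))


# Toy example with made-up counts and scores
print(classification_metrics(tp=98, fn=2, tn=195, fp=5))
print(roc_auc_trapezoid([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

Both functions assume that each class has at least one sample; otherwise the divisions by P or N are undefined.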

Results
All results are reported as the averages and standard deviations over 3 executions, each with a random split of the data into training and testing sets at the stated ratio. Table 1 shows the classification results obtained from the transfer learning of AlexNet, GoogLeNet, and SqueezeNet, using Dataset 1 with two different training and testing data ratios. The 3 pretrained CNNs achieved very high accuracy, sensitivity, specificity, precision, F1 score, and AUC in all cases. In particular, GoogLeNet and SqueezeNet had almost 100% accuracy with 80% training and 20% testing data. The AUCs are almost perfect in all cases for all three CNNs. Figure 4 shows the training processes of the transfer learning of the three CNNs, and Figure 5 shows the features at the fully connected layers extracted from transfer learning of the three CNNs, all using Dataset 1 with 80% training and 20% testing.
Tables 2 and 3 show the classification results obtained from AlexNet, GoogLeNet, and SqueezeNet for the 2-class classification of COVID-19 versus normal cases (Dataset 2) and of COVID-19 versus both normal and viral pneumonia cases (Dataset 3), respectively, each with 50% of the data for training and the other 50% for testing. For Dataset 2, all classifiers achieved accuracy, sensitivity, specificity, and precision > 99%, F1 score > 0.990, and AUC of almost 1. For Dataset 3, all achieved > 98% in accuracy, > 97% in sensitivity, > 98% in specificity and precision, > 0.975 for F1 score, and almost 1 for AUC.
Table 4 shows the 3-class classification (COVID-19, viral pneumonia, and normal) results obtained from the transfer learning of the three pretrained CNNs using Dataset 4 with 50% of the data for training and the other 50% for testing. All three CNNs achieved accuracy > 96%, sensitivity > 97%, specificity > 95%, precision > 96%, F1 score ≥ 0.970, and AUC = 0.998.
Table 5 shows the results obtained from the three CNNs using Dataset 5, of which accuracies (> 99%), AUCs (= 0.999), and specificity (> 99%) are similar for both cases of 1) 90% training and 10% testing data, and 2) 50% training and 50% testing data. The AlexNet achieved the best average sensitivity (98.48%) using 90% training and 10% testing data, and the SqueezeNet achieved the best average sensitivity (98.47%) for 50% training and 50% testing data.
For the 3-class classification using Dataset 6, the results shown in Table 6 are still very high but slightly lower than those obtained using Dataset 5 for the 2-class classification. For both cases of 1) 90% training and 10% testing data, and 2) 50% training and 50% testing data, all the accuracies are ≥ 96%, specificity > 96%, and AUC > 0.998. The SqueezeNet has the highest sensitivity (98.48%) for 90% training and 10% testing data, while the GoogLeNet achieved the highest sensitivity (95.23%) for 50% training and 50% testing data. The precision (98.48%) is highest for the GoogLeNet using 90% training and 10% testing data, and highest (96.75%) for the SqueezeNet using 50% training and 50% testing data. The GoogLeNet achieved the highest F1 scores of 0.977 and 0.952 for 90% training and 10% testing data, and 50% training and 50% testing data, respectively.
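The reporting protocol used throughout this section, namely averages and standard deviations over 3 random splits at a fixed training and testing ratio, can be sketched as follows. The helper names are hypothetical, the split is a plain (non-stratified) shuffle, and the sample standard deviation is used; the text does not state which of these variants the study applied, so all three are assumptions. The accuracy values in the example are made up for illustration and are not results from the tables.

```python
import random
import statistics


def random_split(items, train_fraction, rng):
    """Shuffle a copy of `items` and cut it into training and testing parts."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def summarize(values):
    """Mean and sample standard deviation of a metric over repeated runs."""
    return statistics.mean(values), statistics.stdev(values)


rng = random.Random(0)            # fixed seed only to make the sketch repeatable
image_ids = list(range(1124))     # Dataset 1 size: 403 + 721 images

# Three independent random splits with 80% training and 20% testing data
splits = [random_split(image_ids, 0.8, rng) for _ in range(3)]
for train_ids, test_ids in splits:
    print(len(train_ids), len(test_ids))

# Made-up accuracies of three runs, summarized the way the tables report them
run_accuracies = [99.1, 99.5, 99.3]
mean_acc, std_acc = summarize(run_accuracies)
print(f"accuracy = {mean_acc:.2f} ± {std_acc:.2f}")
```

In an actual experiment, each split would be followed by fine-tuning on the training part and computing the metrics on the testing part before summarizing.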

Comparisons with Related Works
The CovidGAN [13] aimed to generate synthetic chest X-ray images using the principle of GAN for the classification. Using the combination of three databases (COVID-19 Radiography Database, COVID-19 Chest X-Ray Dataset Initiative, and IEEE8023/Covid Chest X-Ray Dataset) with about 80% training and 20% testing data, this network achieved 95% in accuracy, 90% in sensitivity, and 97% in specificity. Using the same database combination with 80% training and 20% testing data without data augmentation, all three fine-tuned CNNs reported in the present study (Table 1) achieved accuracy > 99%, sensitivity from 98% (AlexNet) to 100% (GoogLeNet and SqueezeNet), and specificity > 99%.

Discussions
The results obtained from the transfer learning of the fine-tuned AlexNet, GoogLeNet, and SqueezeNet illustrate the high performance of the pretrained models for the classification of COVID-19. Because the databases are updated over time and other data collections are publicly available, it is impossible to carry out exact comparisons between the results reported in the present and other works. However, the comparisons with the related works strongly suggest that the fine-tuned pretrained networks achieved equivalent or better performance than several other works in terms of accuracy and simplicity.
Both AlexNet and SqueezeNet take the least training and prediction time among many other pretrained CNNs. In this study, data augmentation was not applied to the transfer learning of the three networks. Nevertheless, very high classification results could be obtained by using suitable parameters for the transfer learning on new data. This finding emphasizes the role of fine tuning pretrained CNNs for handling new data before adding more complex architectures to the networks. The finding in this study can be useful for the rapid deployment of available AI models for the fast, reliable, and cost-effective detection of COVID-19 infection.

Conclusion
The transfer learning of three popular pretrained CNNs for the classification of chest X-ray images of COVID-19, viral pneumonia, and normal conditions using several subsets of three publicly available databases has been presented and discussed. The performance metrics obtained from different settings of training and testing data have demonstrated the effectiveness of these three networks. The present results suggest that the fine tuning of the network learning parameters is important, as it can avoid the effort of developing more complex models when existing ones can deliver the same or better performance. Hospitals and institutions across continents have been trying to rapidly develop AI-based solutions for solving the time-sensitive COVID-19 crisis. The findings reported in this study can facilitate the free availability of AI models to all participants for clinical validations.