Abstract
Our automated deep learning-based approach identifies
consolidation/collapse in lung ultrasound (LUS) images to aid in the
diagnosis of late stages of COVID-19-induced pneumonia, where
consolidation/collapse is
one of the possible associated pathologies. A common challenge in
training such models is that annotating each frame of an ultrasound
video requires high labelling effort. This effort in practice becomes
prohibitive for large ultrasound datasets. To understand the impact of
various degrees of labelling precision, we compare labelling strategies
to train fully supervised models (frame-based method, higher labelling
effort) and inaccurately supervised models (video-based methods, lower
labelling effort), both of which yield binary predictions for LUS videos
on a frame-by-frame level. We also introduce a novel sampled
quaternary method, which randomly samples only 10% of the frames of each
LUS video for annotation and subsequently assigns (ordinal) categorical
labels to all frames in the video based on the fraction of positively
annotated samples. This method outperformed the inaccurately supervised
video-based method of our previous work on pleural effusions. More
surprisingly, despite being a form of inaccurate learning, it also
outperformed the supervised frame-based approach on metrics suited to
the class imbalance of our dataset, such as the area under the
precision-recall curve (PR-AUC) and the F1 score.
This may be due to the combination of a significantly smaller dataset
size compared to our previous work and the higher complexity of
consolidation/collapse compared to pleural effusion, two factors which
contribute to label noise and overfitting; specifically, we argue that
our video-based method is more robust with respect to label noise and
mitigates overfitting in a manner similar to label smoothing. Using
clinical expert feedback, we developed separate criteria to exclude
data from the training and test sets, respectively, for our ten-fold
cross-validation, which yielded a PR-AUC of 73% and an
accuracy of 89%. While the efficacy of our classifier using the sampled
quaternary method must be verified on a larger consolidation/collapse
dataset, our proposed classifier is, given the complexity of the
pathology, clinically comparable with trained experts and improves over
the video-based method of our previous work on pleural effusions.
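
The sampled quaternary labelling step described above can be illustrated
with a minimal sketch. It is not the implementation used in this work:
the function name, the bin edges that map the positive fraction to four
ordinal classes, and the use of a complete annotation vector to simulate
the expert's answers on the sampled frames are all assumptions made for
illustration only.

```python
import numpy as np


def sampled_quaternary_label(frame_annotations, sample_fraction=0.10,
                             bin_edges=(1 / 3, 2 / 3), seed=0):
    """Assign one ordinal class (0..3) to every frame of a video.

    frame_annotations: binary per-frame labels (1 = pathology visible).
        In practice only the sampled frames would be annotated; the full
        vector is used here purely to simulate the annotator.
    bin_edges: HYPOTHETICAL thresholds on the positive fraction; the
        abstract does not state how the four classes are delimited.
    """
    frame_annotations = np.asarray(frame_annotations, dtype=bool)
    n_frames = frame_annotations.size
    rng = np.random.default_rng(seed)

    # Draw roughly 10% of the frames without replacement (at least one).
    n_samples = max(1, int(round(sample_fraction * n_frames)))
    sampled_idx = rng.choice(n_frames, size=n_samples, replace=False)

    # Fraction of sampled frames that were annotated as positive.
    positive_fraction = float(frame_annotations[sampled_idx].mean())

    # Class 0: no positive samples. Classes 1..3: increasing fraction.
    if positive_fraction == 0.0:
        video_class = 0
    else:
        video_class = 1 + int(np.searchsorted(bin_edges, positive_fraction))

    # Every frame in the video inherits the same ordinal label.
    frame_labels = np.full(n_frames, video_class, dtype=int)
    return frame_labels, positive_fraction
```

Under these assumptions, a 100-frame video requires only 10 annotated
frames, and all 100 frames receive the same class in {0, 1, 2, 3}, which
is what makes the scheme a form of inaccurate supervision.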