Improving Face Alignment Accuracy on Clinical Populations and its effect
on the Video-based Detection of Neurological Diseases
Abstract
Background: Automatic facial landmark localization is an essential
component in many computer vision applications, including video-based
detection of neurological diseases. Machine learning models for facial
landmark localization are typically trained on faces of healthy
individuals, and we found that these models perform worse when
applied to faces of people with neurological diseases. Fine-tuning
pre-trained models with representative images improves performance on
clinical populations significantly. However, questions related to the
characteristics of the database used to fine-tune the model and the
clinical impact of the improved model remain. Methods: We employed the
Toronto NeuroFace dataset, which consists of videos of Healthy
Controls (HC), individuals Post-Stroke, and individuals with Amyotrophic
Lateral Sclerosis performing speech and non-speech tasks with thousands
of manually annotated frames, to fine-tune a well-known deep
learning-based facial landmark localization model. The pre-trained and
fine-tuned models were used to extract landmark-based facial features
from videos, and the facial features were used to discriminate clinical
groups from HC. Results: Fine-tuning a facial landmark localization
model with a diverse database that includes HC and individuals with
neurological disorders resulted in significantly improved performance
for all groups. Our results also showed that fine-tuning the model with
representative data greatly improved the ability of the subsequent
classifier to distinguish clinical groups from HC in videos. Conclusions:
Using a diverse database for model fine-tuning might result in better
model performance for HC and clinical groups. We demonstrated that
fine-tuning a model for landmark localization with representative data
results in improved detection of neurological diseases.