KD_EEG_ChannelReduction_Kumaravel.pdf (1.11 MB)

Knowledge Distillation-based Channel Reduction for Wearable EEG Applications

posted on 2023-04-25, 13:19, authored by Velu Prabhakar Kumaravel, Una Pale, Tomas Teijeiro, Elisabetta Farella, David Atienza Alonso

Wearable EEG applications demand an optimal trade-off between performance and system power consumption. However, high-performing models usually require many features for training and inference, leading to a high computational and memory budget. In this paper, we present a novel knowledge distillation methodology to reduce the number of EEG channels (and therefore, the associated features) without compromising performance. We aim to distill information from a model trained using all channels (teacher) to a model using a reduced set of channels (student). To this end, we first pre-train a state-of-the-art model on features extracted from all channels. Then, we train a naive model on features extracted from a few task-specific channels using the soft labels predicted by the teacher model. As a result, the student model with a reduced set of features learns to mimic the teacher via soft labels. We evaluate this methodology on two publicly available datasets: CHB-MIT for epileptic seizure detection and the BCI competition IV-2a dataset for motor-imagery classification. Results show that the proposed channel reduction methodology improves the precision of the seizure detection task by about 8% and the motor-imagery classification accuracy by about 3.6%. Given these consistent results, we conclude that the proposed framework facilitates future lightweight wearable EEG systems without any degradation in performance.
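The core mechanism described above — training the student on the teacher's temperature-softened soft labels alongside the ground-truth labels — can be sketched as a standard distillation loss. This is a minimal NumPy illustration of the general technique, not the authors' implementation; the temperature `T` and weighting `alpha` are assumed hyperparameters, and the actual paper operates on EEG-derived features with specific models.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T yields softer distributions,
    # exposing the teacher's "dark knowledge" about non-target classes.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      T=4.0, alpha=0.5):
    """Weighted sum of soft (teacher-mimicking) and hard (ground-truth) losses.

    alpha balances matching the teacher's soft labels against fitting the
    true labels; the soft term is scaled by T*T so its gradient magnitude
    stays comparable across temperatures (a common convention).
    T and alpha here are illustrative assumptions, not values from the paper.
    """
    p_teacher = softmax(teacher_logits, T)            # soft labels
    log_p_student_T = np.log(softmax(student_logits, T))
    soft_loss = -(p_teacher * log_p_student_T).sum(axis=-1).mean() * T * T

    log_p_student = np.log(softmax(student_logits))   # T=1 for hard term
    idx = np.arange(len(hard_labels))
    hard_loss = -log_p_student[idx, hard_labels].mean()

    return alpha * soft_loss + (1 - alpha) * hard_loss
```

In the paper's setting, `teacher_logits` would come from the model trained on all-channel features and `student_logits` from the model restricted to the few task-specific channels; setting `alpha=0` recovers plain supervised training of the student.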


Submitting Author's Institution

Fondazione Bruno Kessler

Submitting Author's Country

  • Italy