Automatic wheeze segmentation using harmonic-percussive source
separation and empirical mode decomposition
Abstract
Wheezes are adventitious respiratory sounds commonly present in patients
with respiratory conditions. The presence of wheezes and their time
location are relevant for clinical reasons, such as understanding the
degree of bronchial obstruction. Conventional auscultation is usually
employed to analyze wheezes, but remote monitoring has become a pressing
need in recent years. Automatic respiratory sound analysis is
required to reliably perform remote auscultation. In this work, we
propose a method for wheeze segmentation. Our method starts by
decomposing a given audio excerpt into intrinsic mode functions using
empirical mode decomposition. Then, we apply harmonic-percussive source
separation to the resulting audio tracks and get harmonic-enhanced
spectrograms, which are processed to obtain harmonic masks.
Subsequently, a series of empirically derived rules are applied to find
wheeze candidates. Finally, the candidates stemming from the different
audio tracks are merged and median filtered. In the evaluation stage, we
compare our method to three baselines on the ICBHI 2017 Respiratory
Sound Database, a challenging dataset containing various noise sources
and background sounds. Using the full dataset, our method outperforms
the baselines, achieving an F1 score of 41.9%. Our method also
outperforms the baselines across several results stratified by five
variables: recording equipment, age, sex, body-mass index, and
diagnosis. We conclude that, contrary to what has been
reported in the literature, wheeze segmentation has not been solved for
real-life applications. Adaptation of existing systems to
demographic characteristics might be a promising step in the direction
of algorithm personalization, which would make automatic wheeze
segmentation methods clinically viable.
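
The pipeline summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the audio tracks from empirical mode decomposition are already available as NumPy arrays, it substitutes a generic median-filtering form of harmonic-percussive source separation, and the candidate rule, function names, and all parameter values (nperseg, kernel sizes, min_bins) are hypothetical placeholders chosen only for illustration.

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import medfilt, stft


def harmonic_mask(track, fs, nperseg=256, kernel=17):
    """Binary mask of time-frequency bins where harmonic energy dominates.

    Median-filtering HPSS: smoothing the magnitude spectrogram along time
    keeps horizontal (harmonic) ridges, smoothing along frequency keeps
    vertical (percussive) ones. All parameter values are assumptions.
    """
    _, _, Z = stft(track, fs=fs, nperseg=nperseg)
    S = np.abs(Z)                              # shape: (frequency, time)
    H = median_filter(S, size=(1, kernel))     # harmonic-enhanced
    P = median_filter(S, size=(kernel, 1))     # percussive-enhanced
    return H > P


def wheeze_candidates(mask, min_bins=1):
    """Hypothetical rule: flag frames with at least min_bins harmonic bins."""
    return mask.sum(axis=0) >= min_bins


def merge_and_smooth(candidates, kernel=5):
    """Merge per-track frame decisions (logical OR), then median filter."""
    merged = np.any(np.stack(candidates), axis=0)
    return medfilt(merged.astype(float), kernel_size=kernel) > 0.5
```

In this sketch, each track would be passed through harmonic_mask and wheeze_candidates, and the resulting per-track frame decisions would be combined with merge_and_smooth. The median filter removes isolated single-frame detections while keeping longer runs, which matches the role of the final merging and filtering step described in the abstract.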