ElectrodeNet – A Deep Learning Based Sound Coding Strategy for Cochlear Implants
Objective: This study proposes ElectrodeNet, a deep-learning-based sound coding strategy for cochlear implants (CIs), and compares its speech intelligibility with that of the advanced combination encoder (ACE) coding strategy. Methods: ElectrodeNet emulates the ACE strategy, replacing the conventional envelope detection with various forms of artificial neural networks. Deep neural network (DNN), convolutional neural network (CNN), and long short-term memory (LSTM) models were trained on fast-Fourier-transformed clean speech and the corresponding electrode stimulation patterns. Objective speech intelligibility of the ElectrodeNets was estimated across four factors: loss function, network architecture, language, and noise type. Subjective listening tests with vocoded Mandarin speech were conducted with 40 normal-hearing listeners. Results: The DNN-, CNN-, and LSTM-based ElectrodeNets exhibited strong correlations with the ACE strategy in short-time objective intelligibility (STOI) and normalized covariance metric (NCM) scores. In the objective evaluations, the mean squared error (MSE) between ACE and ElectrodeNet scores was below 0.01 under all experimental conditions, while the linear correlation coefficient (LCC) and Spearman’s rank correlation coefficient (SRCC) exceeded 0.97 and 0.96, respectively. In the listening tests, substantial positive relationships were likewise observed between ACE and both the DNN- and CNN-based ElectrodeNets, with MSEs below 0.02 and LCCs and SRCCs above 0.9. Significance: This study demonstrates the feasibility of using deep learning to encode sound into meaningful patterns for CI listening.
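The three agreement measures reported above (MSE, LCC, and SRCC) can be illustrated with a minimal sketch. The score lists below are made-up placeholders, not data from this study, and the function names are hypothetical. The sketch assumes that LCC denotes the Pearson correlation and that SRCC is computed as the Pearson correlation of average ranks, which is the standard definition and not necessarily the exact procedure used by the authors.

```python
from math import sqrt
from statistics import mean


def mse(a, b):
    """Mean squared error between two equally long score lists."""
    return mean((x - y) ** 2 for x, y in zip(a, b))


def pearson(a, b):
    """Linear (Pearson) correlation coefficient, used here as the LCC."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical intelligibility scores (e.g. STOI) for five test conditions.
ace_scores = [0.42, 0.55, 0.63, 0.71, 0.80]
net_scores = [0.40, 0.57, 0.61, 0.73, 0.79]

print("MSE :", mse(ace_scores, net_scores))
print("LCC :", pearson(ace_scores, net_scores))
print("SRCC:", spearman(ace_scores, net_scores))
```

A low MSE indicates that the two strategies yield nearly identical scores, whereas high LCC and SRCC values indicate that the scores rise and fall together across conditions, linearly and in rank order, respectively.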