Deep Learning as an Improved Method of Preprocessing Biomedical Raman Spectroscopy Data
Mohammadrahim Kazemzadeh, Colin Hisey, Miguel Martinez Calderon, Larry Chamley, Peter Xu, Neil Broderick
Colin Hisey
University of Auckland

Corresponding Author: [email protected]

Machine learning has significantly increased the value of spectroscopy-based characterization tools, particularly in biomedical applications, because it can detect latent patterns within complex spectral data. However, it often requires extensive data preprocessing, including baseline correction and denoising, which can introduce unintentional bias during classification. To address this, we present a deep learning-based signal preprocessing method capable of handling all the defects of raw Raman spectroscopy data without human intervention. To achieve this, a novel deep convolutional neural network (CNN) architecture was trained on randomly generated spectra with various defects. We demonstrate that the proposed network trains faster and performs complete spectral preprocessing in a single step with greater accuracy, speed, and defect tolerance than conventional methods. These improvements make it an ideal candidate for hyperspectral imaging applications in which tens of thousands of raw spectra may need to be processed rapidly. The superiority of this method is demonstrated on simulated Raman spectra, surface-enhanced Raman spectroscopy (SERS) imaging of chemical species, classification of low-resolution Raman spectra of human bladder cancer tissue, and classification of SERS spectra from human placental extracellular vesicles (EVs). These findings encourage the future use of deep learning as a rapid and unbiased method of preprocessing spectroscopy data, and the approach may be particularly useful in biomedical applications involving large data sets from highly heterogeneous samples with complex signal defects.
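
The idea of training on randomly generated spectra with defects can be illustrated with a minimal sketch. The abstract does not specify how the spectra were generated, so everything below is an assumption made for illustration: the hypothetical function name synthetic_raman_pair, the Lorentzian peak shape, the cubic polynomial baseline, the Gaussian noise model, and all parameter ranges. Each call returns a defect-laden spectrum together with its clean target, the kind of input/output pair a preprocessing network could be trained on.

```python
import numpy as np

def synthetic_raman_pair(n_points=1024, n_peaks=8, noise_std=0.02, rng=None):
    """Return (raw, clean): a spectrum with defects and its defect-free target.

    Illustrative only; peak shapes, defect types, and ranges are assumptions,
    not the generation procedure used by the authors.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.linspace(0.0, 1.0, n_points)  # normalized wavenumber axis

    # Clean signal: sum of Lorentzian peaks with random center, width, height.
    centers = rng.uniform(0.05, 0.95, n_peaks)
    widths = rng.uniform(0.003, 0.02, n_peaks)
    heights = rng.uniform(0.1, 1.0, n_peaks)
    clean = np.zeros(n_points)
    for c, w, h in zip(centers, widths, heights):
        clean += h * w**2 / ((x - c) ** 2 + w**2)

    # Defect 1 (assumed): smooth baseline from a random cubic polynomial.
    baseline = np.polyval(rng.uniform(-1.0, 1.0, 4), x)
    baseline -= baseline.min()  # shift so the baseline is non-negative

    # Defect 2 (assumed): additive Gaussian noise.
    noise = rng.normal(0.0, noise_std, n_points)

    raw = clean + baseline + noise
    return raw, clean

# One training pair; a fixed seed makes the example reproducible.
raw, clean = synthetic_raman_pair(rng=np.random.default_rng(0))
```

Because the pairs are generated on demand, a training loop could draw a fresh batch at every step instead of relying on a fixed labeled data set, which is one plausible reading of "randomly generated spectra" in the abstract.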