Fluorescence Lifetime Endomicroscopic Image-based ex-vivo Human Lung Cancer Differentiation Using Machine Learning


Introduction
Machine learning (ML) has achieved remarkable success in detecting cancer in different organs from images acquired at the macro and/or micro level by various medical instruments, e.g. computed tomography and microscopy [1]. In particular, fluorescence lifetime techniques have been successfully applied in various medical applications to diagnose disease and cancer [2][3][4]. However, only a few papers in the literature focus on integrating ML techniques with fluorescence lifetime images for cancer classification. Lin et al. [5] combined principal component analysis (PCA) with a support-vector classifier (SVC) to classify spectral data from autofluorescence spectroscopy and distinguish nasopharyngeal carcinomas from normal tissues, yielding over 94% diagnostic accuracy. Majumder et al. [6,7] proposed PCA with SVC and its variants for oral cancer detection with laser-induced fluorescence spectroscopy, which achieved scores of over 85% on independent testing samples. However, all those studies were based on fluorescence intensity images rather than lifetime images. Chen et al. [8] reported the discrimination of three different types of skin cancer using SVC and fluorescence lifetime imaging endomicroscopy (FLIM) data. Their approach, however, was based on handcrafted features derived from the lifetime reconstruction results. Interestingly, all these studies used SVC with handcrafted features from fluorescence data, suggesting the prevalence and robustness of SVC. So far, little effort has been made towards the automatic discrimination of human disease/cancer directly on FLIM images by machine learning.
In this study, we address two fundamental problems: whether fluorescence lifetime data can be used directly for lung cancer classification, and whether ML methods can feasibly be employed as automatic detection tools in this context. To this end, we apply four prevalent ML algorithms, namely K-nearest neighbour (KNN), SVC, neural network (NN), and random forest (RF), to over 20,000 fluorescence lifetime images collected by a custom fibre-based fluorescence lifetime imaging endomicroscope with various user-specified configurations. The models are trained and validated using 10-fold cross-validation on the datasets from nine patients and tested on the data from the remaining patient. This procedure is repeated three times to obtain three different sets of metrics. The final AUC score is derived by averaging the obtained metrics to quantify the performance of the models. Scikit-learn [15] is used for the ML-related processing in the present investigation.
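The evaluation protocol described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used in the study; the helper names (`hold_out_patient`, `k_fold_indices`), the toy patient labels, and the random seed are assumptions made for the example.

```python
import random

def hold_out_patient(patient_ids, test_patient):
    """Split sample indices into a development set and a held-out test set."""
    dev = [i for i, p in enumerate(patient_ids) if p != test_patient]
    test = [i for i, p in enumerate(patient_ids) if p == test_patient]
    return dev, test

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle positions 0..n_samples-1 and deal them into k nearly equal folds."""
    positions = list(range(n_samples))
    random.Random(seed).shuffle(positions)
    return [positions[i::k] for i in range(k)]

# Toy data (hypothetical): 10 patients with 5 frames each.
patient_ids = [p for p in range(10) for _ in range(5)]

# Nine patients form the development set; one patient is kept for testing.
dev, test = hold_out_patient(patient_ids, test_patient=9)

# Each fold holds positions into `dev`; one fold validates, the rest train.
folds = k_fold_indices(len(dev), k=10)
```

Holding out a whole patient, rather than random frames, keeps frames from the test patient out of training and validation.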
The rest of the report is arranged as follows. Section 2 reviews the ML techniques used in the evaluation. Section 3 introduces the methodology of data collection and image pre-processing. Results are presented in Section 4, followed by the conclusions in Section 5.

Machine Learning Techniques
A KNN classifier is a type of instance-based learning: it does not learn an internal relationship (model) among the given variables, but simply stores the instances of the data and predicts a query sample according to its nearest neighbours under a certain metric, e.g. Euclidean distance [10,11]. Due to its conceptual and computational simplicity, KNN requires the least time and resources for classification, at the cost of prediction accuracy.
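A minimal sketch of the nearest-neighbour rule with Euclidean distance is given below. The function name `knn_predict`, the value of `k`, and the two-dimensional toy points are assumptions chosen for illustration only.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Pair each stored instance's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy instances (hypothetical): two well-separated clusters.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["normal", "normal", "normal", "cancer", "cancer", "cancer"]

near_origin = knn_predict(train_x, train_y, (0.5, 0.5))
near_far_cluster = knn_predict(train_x, train_y, (5.5, 5.5))
```

No model is fitted here: the "training" step only stores the instances, which is the defining property of instance-based learning.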
As a non-linear discriminative classifier, the SVC was first introduced by Boser et al. [12] and has been widely applied to classifying multivariate data since then. The algorithm outputs an optimal hyperplane that maximises the margin between the data belonging to different classes and, therefore, separates the data with decision boundaries. A significant characteristic of SVC is that it supports different functions, called kernels, to solve problems of different complexity. For example, data that are not linearly separable can be classified by transforming them into a higher-dimensional space in which a linear boundary can be found.
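The idea of lifting data into a higher-dimensional space can be illustrated with a small example. The sketch below is not an SVC: the feature map `lift` and the hand-picked hyperplane in `linear_rule` are assumptions used only to show that one-dimensional data which no single threshold can separate become linearly separable after the mapping x -> (x, x^2).

```python
def lift(x):
    """Map a 1-D point into 2-D space (an explicit, hypothetical feature map)."""
    return (x, x * x)

def linear_rule(point, weights=(0.0, 1.0), bias=-2.5):
    """Classify with a linear boundary w . p + b > 0 in the lifted space."""
    score = weights[0] * point[0] + weights[1] * point[1] + bias
    return 1 if score > 0 else 0

# Toy data: class 0 sits between two groups of class 1, so no single
# threshold on x can separate the classes in the original 1-D space.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
labels = [1, 1, 0, 0, 0, 1, 1]

predictions = [linear_rule(lift(x)) for x in xs]
```

In an actual SVC the kernel function provides this mapping implicitly, and the hyperplane is found by margin maximisation rather than chosen by hand.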
An NN classifier is a computational model inspired by how biological systems process information. Typically, an NN classifier is a linear combination of several non-linear activation functions, which maps a number of given examples (input layer) to a certain number of outputs (output layer) associated with the number of classes to be separated [10,11]. The stacked activation functions between the input and output layers, along with the relevant parameters, are called hidden layers. Nowadays, NNs are widely applied in various areas. They have achieved huge success thanks to the fast development of computational power, the increase in the number of hidden layers, and the development of diverse architectures, which are referred to as deep neural networks [13]. In this study, a feed-forward neural network with two hidden layers is used.
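The forward pass of a feed-forward network with two hidden layers can be sketched as below. The layer widths, the ReLU and softmax activations, and the random untrained weights are all assumptions for illustration; the text does not specify them, and no training step is shown.

```python
import math
import random

def dense(inputs, weights, biases):
    """Affine layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    """Non-linear activation (assumed here) applied element-wise."""
    return [max(0.0, v) for v in values]

def softmax(values):
    """Turn output scores into class probabilities that sum to one."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one layer (untrained)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(x, layers):
    """Pass x through the hidden layers, then the output layer."""
    hidden = x
    for weights, biases in layers[:-1]:
        hidden = relu(dense(hidden, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(hidden, weights, biases))

rng = random.Random(0)
# 4 inputs -> hidden layer 1 (8 units) -> hidden layer 2 (8 units) -> 2 classes.
layers = [init_layer(4, 8, rng), init_layer(8, 8, rng), init_layer(8, 2, rng)]
probs = forward([0.1, 0.5, -0.2, 0.3], layers)
```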
As an ensemble learning approach, the RF classifier combines decision tree predictors that vote for the most popular class for a given example [14]. A major advantage is that, because many decision trees are built from random samples, the correlation between individual trees is minimised, making the overall prediction less biased.
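The two ingredients mentioned above, random sampling and voting, can be sketched as follows. This is only an outline of the ensemble mechanics under the assumption of bootstrap sampling (drawing with replacement); the helper names are hypothetical and the decision trees themselves are omitted.

```python
import random
from collections import Counter

def bootstrap_sample(n_samples, rng):
    """Draw n_samples indices with replacement; each tree would train on one such draw."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)
sample = bootstrap_sample(10, rng)

# Hypothetical outputs of five trees for one example.
prediction = majority_vote(["normal", "cancer", "normal", "normal", "cancer"])
```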

Image filtering
During the data collection, various measuring conditions were applied, including exposure time, optical wavelength, and lifetime extraction approaches, to obtain diverse results rich in spatial and spectral resolution. However, this may introduce some variance in lifetime estimation. For example, lifetime values estimated by rapid lifetime determination (RLD) and the centre-of-mass method (CMM) are significantly different. Therefore, the data chosen for further processing were restricted to exposure times of 6 and 20 ns, excitation bands of 490-570 and 594-764 nm, and RLD. In addition, some images have sizes other than 128x128. To avoid introducing artificial errors into the lifetime images during processing, only the lifetime images with 128x128 resolution were selected. After the selection, there were 10,155 and 11,363 frames of cancer and normal tissue respectively, and each frame contained one intensity image and one corresponding lifetime image.
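The selection rules above amount to a simple metadata filter, sketched below. The dictionary keys (`exposure`, `band_nm`, `method`, `shape`) and the toy frames are hypothetical; the text does not describe how acquisition metadata are actually stored.

```python
# Selection criteria as quoted in the text.
ALLOWED_EXPOSURES = {6, 20}
ALLOWED_BANDS_NM = {(490, 570), (594, 764)}
REQUIRED_METHOD = "RLD"
REQUIRED_SHAPE = (128, 128)

def keep_frame(meta):
    """Return True if a frame's metadata satisfies every selection criterion."""
    return (
        meta["exposure"] in ALLOWED_EXPOSURES
        and meta["band_nm"] in ALLOWED_BANDS_NM
        and meta["method"] == REQUIRED_METHOD
        and meta["shape"] == REQUIRED_SHAPE
    )

# Toy metadata records (hypothetical).
frames = [
    {"exposure": 6, "band_nm": (490, 570), "method": "RLD", "shape": (128, 128)},
    {"exposure": 20, "band_nm": (594, 764), "method": "CMM", "shape": (128, 128)},
    {"exposure": 6, "band_nm": (490, 570), "method": "RLD", "shape": (64, 64)},
]
selected = [frame for frame in frames if keep_frame(frame)]
```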

Image pre-processing
The overall pre-processing scheme comprises thresholding, normalisation, and refactoring. As mentioned previously, each frame contains an intensity image and a lifetime image. Let $I_i^a = \{p_{x,y} \mid x, y \in \{1, 2, \ldots, N\}\}$ denote the $i$th frame, where $a$ is either $it$ or $lf$, representing the intensity or lifetime image of the frame, $p_{x,y}$ is the pixel at location $(x, y)$, and $N$ is the size of the images, assuming that all images are square. Thresholding is first applied to each frame, yielding the denoised intensity and lifetime images. Afterwards, the thresholded images are normalised using the dark background and lightfield data collected during the experiments. The normalisation combines the dark background and lightfield images relating to the input images with GB, a 3x3 Gaussian smoothing filter [9], and yields the normalised intensity and lifetime images. The standard deviation σ of the Gaussian distribution is empirically set to two in this study. The OpenCV library [9] is used for the image pre-processing. Finally, the FLIM images are flattened row-wise into 1-dimensional vectors as the input to the ML methods.
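Two of the steps above, building the 3x3 Gaussian filter with σ = 2 and flattening an image row-wise, can be sketched in plain Python. The kernel form used here (a sampled two-dimensional Gaussian normalised to sum to one) is an assumption, since the exact definition is not reproduced in the text; the thresholding and normalisation formulas are not shown for the same reason.

```python
import math

def gaussian_kernel(size=3, sigma=2.0):
    """Sample exp(-(x^2 + y^2) / (2 sigma^2)) on a size x size grid and normalise."""
    half = size // 2
    raw = [
        [math.exp(-(x * x + y * y) / (2 * sigma * sigma))
         for x in range(-half, half + 1)]
        for y in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in raw)
    return [[value / total for value in row] for row in raw]

def flatten_row_wise(image):
    """Concatenate the rows of a 2-D image into a 1-D feature vector."""
    return [pixel for row in image for pixel in row]

kernel = gaussian_kernel(size=3, sigma=2.0)
vector = flatten_row_wise([[1, 2], [3, 4]])
```

For a 128x128 lifetime image, `flatten_row_wise` would return a vector of 16,384 values.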

Dimensionality reduction
The dimensions of the lifetime image are 128x128, and the pixel values are used for classification, but they are sparse and contain many zero values, including boundary and zero-lifetime pixels. Dimensionality reduction is required so that the most important features are retained; in this case, the most important features are the pixels that "contribute" the most.
The employed technique is PCA, which in principle projects the high-dimensional input data into a lower-dimensional space so that the variance of the data is best explained [10,11].
In this study, the percentage of variance explained by the PCA-selected components is set to 95%. After the reduction, the number of features decreases from the number of pixel values in the flattened lifetime image (16,384) to 2,100.
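The 95% criterion can be read as choosing the smallest number of leading components whose cumulative explained variance reaches the target. The sketch below illustrates that rule on toy per-component variances; the function name and the numbers are hypothetical. In scikit-learn, passing a fraction such as `n_components=0.95` to `PCA` applies the same kind of rule, although the exact call used in the study is not stated.

```python
def components_for_variance(variances, target=0.95):
    """Smallest number of leading components whose variance share reaches `target`."""
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return count
    return len(ordered)

# Toy per-component variances (hypothetical): shares of 60%, 30%, 7%, 2%, 1%.
toy_variances = [60, 30, 7, 2, 1]
n_components = components_for_variance(toy_variances, target=0.95)
```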

Training and validation of machine learning algorithms
There are, in total, 10,155 and 11,363 frames of cancer and normal tissue respectively, and each frame contains a fluorescence intensity image and the corresponding lifetime image. For the training, 10-fold cross-validation is applied to optimise the hyperparameters of the ML methods using the data collected from nine patients. After fine-tuning the hyperparameters with grid search, the values that achieved the best scores are listed in Table 1 and are used for the independent evaluation. Predictions are derived and aggregated to calculate the average accuracy and AUC.
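As an illustration of the reported metric, the AUC can be computed as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The sketch below uses toy labels and scores; the function name and the per-run values being averaged are hypothetical and are not results from the study.

```python
def auc_score(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy example: 1 marks the positive class.
toy_auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Averaging the AUC over repeated runs (placeholder values, not study results).
run_aucs = [0.5, 0.75, 1.0]
mean_auc = sum(run_aucs) / len(run_aucs)
```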

Results
Figure 2 illustrates the confusion matrices of the predictions on the testing dataset generated by the ML methods, along with the calculated precision and recall. Consistent with the AUC scores, KNN performs the worst in terms of precision and recall. SVC and NN have comparable precision and recall, since their numbers of incorrect predictions (FN and FP) are close to each other. As far as RF is concerned, it is good at predicting normal tissue, with only 414 false predictions, and thus achieves the highest precision among the four ML techniques. However, RF struggles to predict cancerous tissue.
The predictions on the testing sets produced by the ML methods are illustrated in Figure 3, which plots averaged lifetime against mean intensity. Figure 3 shows that the estimations generated by SVC and NN are very close for both cancer and normal tissues, whereas RF is more confident in estimating normal tissues than cancerous tissues, which is also reflected by the confusion matrix in Figure 2.
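For reference, precision and recall follow directly from the confusion-matrix counts: precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below uses made-up counts, not the values in Figure 2.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from true positives, false positives, false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Toy counts (hypothetical).
precision, recall = precision_recall(tp=90, fp=10, fn=30)
```

A classifier with few false positives but many false negatives shows high precision and low recall, as in the toy numbers above.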

Conclusion
In this study, we reported a feasibility investigation of applying ML techniques to fluorescence lifetime images to distinguish ex-vivo normal and cancerous human lung tissue. From the experimental results, we can conclude that the most complex ML algorithms can be used for FLIM-based lung cancer classification. In addition, pixel lifetimes in FLIM images, rather than professionally engineered features based on the images, can be used for the classification.
Despite the encouraging results, some aspects still need to be addressed in the future. The input to the ML methods was flattened from a 2-dimensional lifetime image to a 1-dimensional vector, and thus the correlations among adjacent pixels were lost. Accordingly, classification at the image level, e.g. using deep convolutional neural networks [13], could be a promising direction for further improving the prediction accuracy. In addition, enhancing the quality of the images with higher resolution and more spectral information would help make the predictions more robust.

Figure 1: ROC curves and AUC scores reached by the ML techniques on the testing datasets

Table 1: Hyperparameters of the ML methods and fine-tuned results