Extracting Time-Frequency Features of Images for Robust LSTM-based Classification of H&E Stained Tissue

The importance of automated classification of histopathological images has been increasingly recognized for effective processing of large volumes of data in the era of digital pathology for new discovery of disease mechanism. This paper presents a deep-learning approach that extracts time-frequency features of H&E stained tissue images for classification by long short-term memory networks. Using two large public databases of colorectal-cancer and heart-failure H&E stained tissue images, the proposed approach outperforms several state-of-the-art benchmark classification methods, including support vector machines and convolutional neural networks, in terms of several statistical measures.


Introduction
Understanding the molecular information and spatial distribution of cells in complex human tissues is critical for gaining insights into disease mechanisms (Riordan et al., 2015). Current widespread practice in the interpretation of images in digital pathology relies on manual analysis and the consensus of several experienced pathologists (Riordan et al., 2015; Aeffner et al., 2019). Such subjective, human-based interpretation of images with complex features can give rise to errors and makes the results difficult to reproduce. In particular, with the emerging concept of precision medicine, in which big data in digital pathology plays an essential role, precise automated image analysis and classification tools are needed.
It has been reported that image classification methods developed for applications in radiology are inadequate for the analysis of high-resolution whole-slide images, and there is therefore a strong demand for more effective tools for automated detection, segmentation, feature extraction, and tissue classification in digital pathology (Madabhushi & Lee, 2016).
Deep learning (DL) in artificial intelligence (AI) has recently become one of the most promising approaches to computerized image information processing in histopathology. Recent DL-based developments in automated classification of histopathological images include the use of two pretrained convolutional neural networks (CNNs), ResNet-50 and DenseNet-161, for image classification (Talo, 2019) on a public dataset of 24 whole-slide tissue images of different tissue textures, achieving an average classification accuracy of 91%; and another application of CNNs for large-scale tissue histopathology image classification (Xu et al., 2017). The latter work used a CNN model to extract features of the histopathological images obtained from brain-tumor and colon-cancer histology datasets, which were then used to train linear support vector machines (SVM) for the task of classification (CNN-SVM). The accuracy rates achieved by this CNN-SVM method were about 98% for the brain-tumor classification, and 90% and 80% for the 2-class and multi-class classifications of the colon-cancer data, respectively. A similar CNN-SVM-based method was developed with the use of CNNs for image feature extraction and SVM for classifying breast-cancer H&E stained images (Araujo et al., 2017), where 2-class and 4-class classifications were carried out. The 2 classes are carcinoma and non-carcinoma, and the 4 classes are normal tissue, benign lesion, in situ carcinoma, and invasive carcinoma. Accuracy rates obtained from this CNN-SVM model were 83% and 78% for the 2-class and 4-class classifications, respectively. Two other AI-based methods, whose datasets are also used in the present study for comparisons of classification results, were reported in (Kather et al., 2016a; Nirschl et al., 2018) and will be discussed in subsequent sections.
This study introduces the original concept of transforming spatial data (2D images) into time series, from which time-frequency features are extracted by means of the instantaneous-frequency and spectral-entropy methods and used as input features for classifying histopathological images with long short-term memory (LSTM) networks. Time-frequency analysis explores hidden features of a signal in both the time and frequency domains simultaneously. The main motivation for time-frequency analysis in signal processing is that the classical Fourier transform assumes signals to be infinite in time or periodic, an assumption that may not hold for many types of signals. Because the instantaneous frequency and spectral entropy can capture the nonstationary properties of signals in practice, the methods have been applied to many areas of research, including biomedical applications (Chang et al., 2014; Akbar et al., 2016; Yin et al., 2017). For time series or sequential data classification, recurrent neural networks, including LSTM models in DL (Hochreiter & Schmidhuber, 1997), have been reported to achieve promising results in many applications such as biomedical named entity recognition (Lyu et al., 2017), clinical care (Che et al., 2018), and flood forecasting (Le et al., 2019).
Although LSTM networks can directly handle sequential data, feeding raw sequential data into the network may degrade classification performance in several applications (Pham et al., 2019a), and the networks require substantial resources for training and can be difficult to train (Pascanu et al., 2013; Culurciello, 2018; Sherstinsky, 2020). The approach addressed in this study replaces the direct input of raw vectorized data of pathological images with time-frequency features as input to an LSTM network, namely TF-LSTM, which can significantly improve the network training and yield high classification accuracy.
The rest of this paper is organized as follows. Section 2 describes two public H&E datasets of colorectal cancer and heart failure tissues, time-frequency analysis of images, and LSTM architecture. Section 3 presents classification results obtained from the proposed TF-LSTM and comparisons of the results with those from benchmark methods. Section 4 gives insights into the analysis of the results, training processes of the TF-LSTM, and further comparisons with related studies. Finally, Section 5 provides concluding remarks on the findings.

Colorectal-cancer histology data
The classification of the colorectal-cancer histology data used in this study was originally carried out by Kather et al. (2016a), and the histological images of human colorectal cancer are publicly available at (Kather et al., 2016b). The acquisition of the data was performed as follows.
Ten anonymized H&E stained colorectal-cancer tissue slides were obtained from the pathology archive at the University Medical Center Mannheim (Heidelberg University, Mannheim, Germany). Tumors of both low grade and high grade were included in the dataset. The slides were first digitized, then contiguous tissue areas were manually annotated and tessellated to produce 625 non-overlapping tissue images of 150 × 150 pixels for each of 8 types of tissue, resulting in a set of 5000 images. The 8 tissue types for classification are: 1) tumor epithelium (tumor), 2) simple stroma (stroma), 3) complex stroma (complex), 4) immune cells (lymphoid), 5) debris, 6) normal mucosal glands (mucosa), 7) adipose tissue (adipose), and 8) background (no tissue or empty). Figure 1 shows an example of the H&E stained colorectal-cancer images of the eight tissue types included in the dataset.

Heart-failure histology data
The H&E stained heart-failure tissue image dataset, which includes left ventricular tissue from 209 patients collected at the University of Pennsylvania between 2008 and 2013, was originally investigated by Nirschl et al. (2018) and can be publicly downloaded at the following URL: https://idr.openmicroscopy.org/webclient. Two cohorts of patients were included in the dataset: 1) heart failure (N = 94) and 2) without heart failure (N = 115). The heart-failure tissue was collected from patients with clinically diagnosed ischemic cardiomyopathy (N = 51) or idiopathic dilated cardiomyopathy (N = 43). The non-failing patients were organ donors without a history of heart failure. All tissue types were sectioned, stained, and scanned during the data acquisition. The whole-slide image of each patient was downsampled to 5x magnification, from which eleven non-overlapping images considered as the regions of interest were extracted, and the tissue border was manually refined. The total numbers of images for the heart-failure and non-heart-failure cohorts are 1034 and 1265, respectively, resulting in a dataset of 2299 images of 250 × 250 pixels. Figure 2 shows 8 images of H&E-stained sections of heart-failure and non-heart-failure tissue types.

Figure 1: Images of eight H&E-stained tissue types in colorectal cancer, from left to right, in 1st row: adipose, complex, empty, and debris, and 2nd row: lymphoid, mucosa, stroma, and tumor.

Time-frequency analysis of images
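The vectorization of a 2D image or matrix is a linear transformation that converts the image into a column vector. Specifically, the vectorization of an M × N image I, denoted as vec(I), is the MN × 1 column vector obtained by stacking the columns of I on top of one another:

$$\mathrm{vec}(I) = [i_{1,1}, \ldots, i_{M,1}, i_{1,2}, \ldots, i_{M,2}, \ldots, i_{1,N}, \ldots, i_{M,N}]^{T} \tag{1}$$

where $i_{m,n}$ is the element of I at row m and column n. Thus, vec(I) results in a time series of image I and can be used for the extraction of time-frequency features, which are the instantaneous frequency and spectral entropy adopted in this study.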
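The vectorization step can be illustrated with a minimal sketch. It assumes a single-channel (grayscale) image held in a NumPy array; how the color channels of an H&E image are handled is not stated here, and the function name `vectorize_image` is hypothetical.

```python
import numpy as np

def vectorize_image(image: np.ndarray) -> np.ndarray:
    """Stack the columns of a 2D image into one 1D series (column-major order)."""
    if image.ndim != 2:
        raise ValueError("expected a 2D (single-channel) image")
    # order="F" walks down each column before moving to the next column,
    # which matches the usual column-stacking definition of vec(I).
    return image.flatten(order="F")

# Toy 2 x 3 "image" (assumed data, for illustration only).
image = np.array([[1, 2, 3],
                  [4, 5, 6]])
series = vectorize_image(image)
print(series)  # [1 4 2 5 3 6]
```

Under this convention, a 150 × 150 image yields a series of 22500 samples.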

Instantaneous frequency
The instantaneous frequency (IF) is an important feature of a non-stationary signal. The IF describes the spectral variations of the signal as a function of time (El-Jaroudi & Emresoy, 2003; Boashash, 1992a,b). From the viewpoint of the time-frequency distribution, the IF of a signal at time t is defined as the weighted average of the frequencies in the signal at time t, which is mathematically expressed as (Morelande et al., 2000)

$$\hat{f}(t) = \frac{\int_{0}^{\infty} f \, \hat{P}(t, f) \, df}{\int_{0}^{\infty} \hat{P}(t, f) \, df} \tag{2}$$

where $\hat{P}(t, f)$ is the time-frequency distribution estimate or the spectrogram power spectrum of the input signal x(t).

Figure 2: Images of H&E-stained tissue sections of heart-failure (1st row) and non-heart-failure (2nd row) conditions.
The estimation of the IF can be carried out using an iterative algorithm that uses the spectrogram to obtain an IF estimate using Equation (2), then applies the initial IF estimate to re-estimate the spectrogram (El-Jaroudi & Emresoy, 2003). This algorithm is described as follows.
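Iterative algorithm for estimating IF
1. Assume that the signal is of the form
$$x(t) = A(t) \, e^{j\phi(t)}$$
where A(t) and φ(t) are the time-varying amplitude and the phase of signal x(t).
2. Calculate the spectrogram $\hat{P}(t, f)$ of x(t).
3. Estimate the IF, $\hat{f}_i(t)$, as the first moment of the power spectrogram of the input signal at iteration i, and the phase:
$$\hat{f}_i(t) = \frac{\int_{0}^{\infty} f \, \hat{P}_i(t, f) \, df}{\int_{0}^{\infty} \hat{P}_i(t, f) \, df}, \qquad \hat{\phi}_i(t) = 2\pi \int_{0}^{t} \hat{f}_i(\tau) \, d\tau$$
4. Demodulate the signal along the estimated IF:
$$y_i(t) = x(t) \, e^{-j\hat{\phi}_i(t)}$$
5. Estimate a new matched spectrogram $\hat{P}_{i+1}(t, f)$ from the demodulated signal.
6. Set $\hat{P}_i(t, f) = \hat{P}_{i+1}(t, f)$, return to step 3, and stop if the IF estimate converges or the difference between two consecutive iterations is within a stopping threshold.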
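The first-moment estimate at the core of this algorithm (the spectrogram and IF steps) can be sketched in a single pass. This is a simplified illustration, not the iterative procedure itself: the demodulation and matched-spectrogram refinement are omitted, the Hann window, window length, and hop size are assumed values, and the helper names are hypothetical.

```python
import numpy as np

def power_spectrogram(x, fs, win_len=128, hop=64):
    """Short-time power spectrum P[t, f] using a Hann window (NumPy only)."""
    window = np.hanning(win_len)
    starts = np.arange(0, len(x) - win_len + 1, hop)
    frames = np.stack([x[s:s + win_len] * window for s in starts])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # shape (n_frames, n_freqs)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)        # covers 0 .. fs/2
    times = (starts + win_len / 2) / fs                 # frame centers in seconds
    return times, freqs, power

def instantaneous_frequency(power, freqs, eps=1e-12):
    """First moment of the power spectrum over frequency, one value per frame."""
    return (power * freqs).sum(axis=1) / (power.sum(axis=1) + eps)

# Synthetic test signal (assumed): a 40 Hz tone sampled at fs = 300 Hz.
fs = 300.0
t = np.arange(0, 4.0, 1.0 / fs)
x = np.sin(2 * np.pi * 40.0 * t)

times, freqs, power = power_spectrogram(x, fs)
inst_freq = instantaneous_frequency(power, freqs)  # stays close to 40 Hz in every frame
```

For a vectorized image, `x` would be the series vec(I), and `inst_freq` would form one feature sequence for the network.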

Spectral entropy
Based on the concept of the Shannon entropy, the spectral entropy of a signal is a measure of the randomness of its spectral power distribution at a given time (Pan et al., 2009). The greater the spectral entropy, the more uniform the distribution of the power spectral density. Given a time-frequency distribution estimate (spectrogram) $\hat{P}(t, f)$, the probability distribution at time t and frequency point m, p(t, m), can be computed as

$$p(t, m) = \frac{\hat{P}(t, m)}{\sum_{f} \hat{P}(t, f)}$$

The spectral entropy is defined as the Shannon entropy of the normalized power distribution of the signal in the frequency domain at time t, denoted as H(t), and mathematically expressed by

$$H(t) = -\sum_{m=1}^{N} p(t, m) \log_2 p(t, m)$$

where N is the total number of frequency points.
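As a minimal sketch of this definition, the function below computes the spectral entropy of each time frame of a power spectrogram stored as an array of shape (frames, frequency points). The function name and the two toy spectra are assumptions made for illustration.

```python
import numpy as np

def spectral_entropy(power):
    """Shannon entropy (in bits) of each frame's normalized power spectrum."""
    power = np.asarray(power, dtype=float)
    totals = power.sum(axis=1, keepdims=True)
    # p(t, m): power at each frequency point divided by the frame's total power.
    p = power / np.where(totals > 0, totals, 1.0)
    # Empty bins contribute nothing: use log2(1) = 0 wherever p is zero.
    safe_p = np.where(p > 0, p, 1.0)
    return (p * np.log2(1.0 / safe_p)).sum(axis=1)

flat = np.ones((1, 8))        # power spread evenly over 8 frequency points
peaked = np.zeros((1, 8))
peaked[0, 3] = 1.0            # all power concentrated in one frequency point

h_flat = spectral_entropy(flat)      # log2(8) = 3 bits, the maximum for 8 points
h_peaked = spectral_entropy(peaked)  # 0 bits, the minimum
```

The two toy cases bracket the range of H(t): a uniform spectrum gives the maximum value log2(N), and a single spectral line gives zero.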

Long short-term memory (LSTM)
The architecture of an LSTM block describes the flow of an input sequence u of length N through an LSTM layer. The learnable weights of an LSTM layer are the input weights, denoted as A, the recurrent weights, denoted as R, and the bias, denoted as b. The matrices A and R and the vector b are the concatenations of the input weights, recurrent weights, and bias of each component, respectively. The concatenations are expressed as

$$A = \begin{bmatrix} A_i \\ A_f \\ A_g \\ A_o \end{bmatrix}, \qquad R = \begin{bmatrix} R_i \\ R_f \\ R_g \\ R_o \end{bmatrix}, \qquad b = \begin{bmatrix} b_i \\ b_f \\ b_g \\ b_o \end{bmatrix}$$

where i, f, g, and o denote the input gate, forget gate, cell candidate, and output gate, respectively. The cell state at time step t is defined as

$$c_t = f_t \odot c_{t-1} + i_t \odot g_t$$

where $\odot$ is the Hadamard product.
The hidden state at time step t is given by

$$h_t = o_t \odot \sigma_c(c_t)$$

where $\sigma_c$ is the state activation function, which is usually the hyperbolic tangent function (tanh). At time step t, the input gate ($i_t$), forget gate ($f_t$), cell candidate ($g_t$), and output gate ($o_t$) are defined as

$$i_t = \sigma_g(A_i u_t + R_i h_{t-1} + b_i)$$
$$f_t = \sigma_g(A_f u_t + R_f h_{t-1} + b_f)$$
$$g_t = \sigma_c(A_g u_t + R_g h_{t-1} + b_g)$$
$$o_t = \sigma_g(A_o u_t + R_o h_{t-1} + b_o)$$

where $\sigma_g$ denotes the gate activation function, which is usually the sigmoid function. A bidirectional LSTM (bi-LSTM) (Schuster & Paliwal, 1997) is an extension of the traditional LSTM that can improve performance on sequence classification problems. Instead of training one LSTM on the input time series, a bi-LSTM architecture is trained in both time directions simultaneously with hidden forward and backward layers: the first operates on the input time series as it is, and the second on a reversed copy of the time series. This architecture learns bidirectional long-term dependencies between time steps of the time series and can therefore provide additional context to the network and result in fuller learning on the data.
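The gate, cell-state, and hidden-state updates, together with the bidirectional pass, can be sketched in NumPy as follows. This is an illustrative forward pass only, with random untrained weights; the two-dimensional input (one instantaneous-frequency value and one spectral-entropy value per time step), the hidden size, and all function names are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(u_t, h_prev, c_prev, A, R, b):
    """One LSTM time step; A, R, b stack the i, f, g, o components row-wise."""
    n = h_prev.shape[0]
    z = A @ u_t + R @ h_prev + b        # pre-activations, shape (4n,)
    i_t = sigmoid(z[0:n])               # input gate
    f_t = sigmoid(z[n:2 * n])           # forget gate
    g_t = np.tanh(z[2 * n:3 * n])       # cell candidate
    o_t = sigmoid(z[3 * n:4 * n])       # output gate
    c_t = f_t * c_prev + i_t * g_t      # element-wise (Hadamard) products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

def lstm_last_state(seq, A, R, b):
    """Run the sequence through the layer and return the final hidden state."""
    n = R.shape[1]
    h, c = np.zeros(n), np.zeros(n)
    for u_t in seq:
        h, c = lstm_step(u_t, h, c, A, R, b)
    return h

def bilstm_last_state(seq, params_fwd, params_bwd):
    """Concatenate final hidden states of a forward and a time-reversed pass."""
    h_fwd = lstm_last_state(seq, *params_fwd)
    h_bwd = lstm_last_state(seq[::-1], *params_bwd)
    return np.concatenate([h_fwd, h_bwd])

rng = np.random.default_rng(0)
n_in, n_hidden, n_steps = 2, 4, 10      # assumed sizes for the illustration

def init_params():
    return (0.1 * rng.normal(size=(4 * n_hidden, n_in)),
            0.1 * rng.normal(size=(4 * n_hidden, n_hidden)),
            np.zeros(4 * n_hidden))

seq = rng.normal(size=(n_steps, n_in))
features = bilstm_last_state(seq, init_params(), init_params())  # shape (8,)
```

In a classifier, a vector such as `features` would feed the fully connected and softmax layers; training the weights is outside the scope of this sketch.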

Results
The bi-LSTM network was used in this study. The network consisted of a bi-LSTM layer with an output size of 100, a fully connected layer of size 2 for the two classes or 8 for the eight classes, followed by a softmax layer and a classification layer. The training options of the bi-LSTM were set as follows: optimizer = 'Adam' (adaptive moment estimation), including an L2 regularization factor; maximum number of epochs = 80 for the two-class classification of the colorectal-cancer and heart-failure histopathological data and 180 for the eight-class classification of the colorectal-cancer histopathological data; mini-batch size = 150; initial learning rate = 0.01; and gradient threshold = 1. To compute the instantaneous frequency and spectral entropy, the sampling frequency was set to fs = 300 Hz and the frequency range to [0, fs/2].
Figure 3 shows, for the stroma and tumor tissue types shown in Figure 1 and the non-heart-failure and heart-failure tissue types shown in the leftmost images of Figure 2: the signals transformed from the H&E images; the instantaneous frequencies of these signals, estimated as the first moments of the power spectrograms and overlaid on the power spectrograms; and the spectral entropy of the signals.
Table 1 shows the average classification accuracy based on ten random runs of the 10-fold cross-validation. To compare with published benchmark results, two classification tasks were carried out: 1) two-class classification: stroma and tumor, and 2) eight-class classification: adipose, complex, debris, empty, lymphoid, mucosa, stroma, and tumor.
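The evaluation protocol of ten random runs of 10-fold cross-validation can be sketched as an index generator. This is a minimal illustration under assumptions: the splits are random but not stratified by class, the seed is arbitrary, and the sample count of 1250 corresponds to the 625 stroma and 625 tumor images of the two-class task.

```python
import numpy as np

def repeated_kfold_indices(n_samples, n_folds, n_runs, seed=0):
    """Yield (run, fold, train_idx, test_idx) for repeated random k-fold splits."""
    rng = np.random.default_rng(seed)
    for run in range(n_runs):
        order = rng.permutation(n_samples)       # fresh shuffle for every run
        folds = np.array_split(order, n_folds)   # near-equal, disjoint folds
        for k in range(n_folds):
            test_idx = folds[k]
            train_idx = np.concatenate(
                [folds[j] for j in range(n_folds) if j != k])
            yield run, k, train_idx, test_idx

# Two-class colorectal task: 625 stroma + 625 tumor images.
splits = list(repeated_kfold_indices(n_samples=1250, n_folds=10, n_runs=10))
print(len(splits))  # 100
```

A reported average would then be the mean of the per-fold test measures over all 100 train/test splits.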

Colorectal-cancer histology
For the 2-class classification, the results obtained from the time-frequency based LSTM (TF-LSTM) are 100% for average accuracy (ACC), sensitivity (SEN), and specificity (SPE). Here, sensitivity is defined as the percentage of tumor tissue images that are correctly identified as tumor tissue images, and specificity is the percentage of stroma tissue images that are correctly identified as stroma tissue images. The results (Table 1) obtained from the TF-LSTM are higher than those previously reported in two studies: support vector machines trained with lower-order and higher-order histogram features (SVM-histogram) (Kather et al., 2016a) and with features extracted by local binary patterns (SVM-LBP) (Kather et al., 2016a), and three pre-trained convolutional neural networks (AlexNet, VGG-16, and ResNet-50) (Pham et al., 2019b).

Figure 3: Signals of the stroma (a) and tumor (d) tissue images shown in Figure 1, and of the non-heart-failure (g) and heart-failure (j) tissue images shown in the first two left images of Figure 2; instantaneous frequency estimates of stroma (b), tumor (e), non-heart-failure (h), and heart-failure (k); and spectral entropy of stroma (c), tumor (f), non-heart-failure (i), and heart-failure (l).

Heart-failure histology
Table 2 shows the average statistical classification measures in terms of accuracy (ACC); sensitivity (SEN), which measures the proportion of heart-failure cases correctly identified as such; specificity (SPE), which measures the proportion of non-heart-failure cases correctly identified as such; and the area under the receiver-operating-characteristic (ROC) curve (AUC). The measures were obtained from the random forest (RF) (Nirschl et al., 2018), a convolutional neural network (CNN) (Nirschl et al., 2018) with an architecture modified from the work of Janowczyk and Madabhushi (2016), and the TF-LSTM developed in this study.
To compare the results obtained from the TF-LSTM with the benchmark results, the classification tasks were performed by carrying out 10 random runs of the 3-fold and 2-fold cross-validations of the dataset. For the 3-fold cross-validation, the classification results obtained from the TF-LSTM (ACC = 100%, SEN = 100%, SPE = 100%, and AUC = 1) are higher than those of the other two methods (RF and CNN). For the 2-fold cross-validation, the classification results obtained from the TF-LSTM (ACC = 99.52%, SEN = 99.24%, SPE = 100%, and AUC = 1) are also higher than those of RF and CNN.
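The accuracy, sensitivity, and specificity measures used in these comparisons can be computed from predicted and true labels as in the sketch below. The labels are made-up toy data (1 = positive class, such as tumor or heart failure; 0 = negative class) and do not correspond to any result reported here; the function name is hypothetical.

```python
import numpy as np

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity for a two-class problem."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))  # true positives
    tn = np.sum((y_true != positive) & (y_pred != positive))  # true negatives
    fp = np.sum((y_true != positive) & (y_pred == positive))  # false positives
    fn = np.sum((y_true == positive) & (y_pred != positive))  # false negatives
    return {
        "ACC": float((tp + tn) / (tp + tn + fp + fn)),
        "SEN": float(tp / (tp + fn)),   # positives correctly identified
        "SPE": float(tn / (tn + fp)),   # negatives correctly identified
    }

# Toy labels: one positive sample is missed, all negatives are correct.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)  # {'ACC': 0.875, 'SEN': 0.75, 'SPE': 1.0}
```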

Discussion
For the 8-class classification of the colorectal-cancer histology data, all images misclassified by the TF-LSTM were adipose-tissue images misclassified as empty-tissue images, and all images of the other tissue types were correctly identified by the TF-LSTM. This observation implies that the identification of the tumor tissue by the TF-LSTM is highly reliable in the multi-class analysis of colorectal-cancer histology.
As shown in Figure 4, the LSTM-network training of the time-frequency features of the images for the 2-class classification (Figure 4(a)) converged much more quickly than the training for the 8-class classification (Figure 4(b)). This observation also suggests the effectiveness of the LSTM training of the time-frequency features of the histology images and that a smaller number of epochs can be used to shorten the training.

Table 1: Average classification results based on 10-fold cross-validations using colorectal-cancer histology data, where n/m = "not mentioned" and n/a = "not applicable".

In the classification of heart-failure histology data reported in (Nirschl et al., 2018), a CNN was trained directly on the histology images resized to 64×64 patches, using 100 patches per region of interest for each patient, and the training data were augmented by rotating each image patch by 90 degrees. By using the transformed features and applying the LSTM, the TF-LSTM network not only achieved better average accuracy than the CNN for both 3-fold and 2-fold cross-validations, but its sensitivity rates (correct classification of heart-failure images; 100% and 99.24% for 3-fold and 2-fold cross-validations, respectively) were also higher than those obtained from the CNN (97.10% and 98.50% for 3-fold and 2-fold cross-validations, respectively). The average sensitivity results provided by the TF-LSTM are also higher than those obtained from the RF (88.10% and 90.90% for 3-fold and 2-fold cross-validations, respectively).

Conclusion
The extraction of sequential time-frequency features from histopathological images presented in this paper appears to be effective for the learning of LSTM networks, which are capable of processing information with long short-term dependencies. The time-frequency features of images inherently capture both spatial and temporal properties of an image to enhance the power of deep recurrent neural networks. The proposed approach was applied to the classification of colorectal-cancer and heart-failure H&E stained images using two corresponding large public datasets. Extension of the TF-LSTM for classifying other histopathological data is straightforward. Comparative statistical classification measures show the usefulness of the TF-LSTM, which can significantly contribute to automated and precise analysis of big data in digital pathology to allow timely clinical decision making, prediction and prognosis of patient outcome, and biomarker discovery (Madabhushi & Lee, 2016; Barisoni et al., 2020).