A Systematic Approach to Identify the Breast Cancer Grades in Histopathological Images Using Deep Neural Networks

The main intent of this research is to develop an automated application that can determine the Nottingham Histologic Score of a given input histopathological image, obtained from breast cancer or healthy tissue, using a DenseNet-based architecture. In this study, we obtained accuracy rates of more than 94% for each trained model, including the 2-predict, 3-predict, and 4-predict networks.


I. INTRODUCTION
Breast cancer is identified as the second most common cancer and holds the highest cancer mortality rate among women worldwide. As per the UIUC organization, 2.3 million new cases were reported in 2020 [34]. Once breast cancer cells form, malignant cells can spread to other parts of the body, making the disease life-threatening. Breast cancers are often found early, while they are small and before they have spread. Due to technical advancements in early detection and treatment procedures, the survival rate of breast cancer patients has improved significantly across the world in the last few years.
Breast cancer grade classification is a labor-intensive task that requires human experts and time to perform the diagnosis. The grade of breast cancer tissue is determined by the way the cancer cells look under a microscope. For laboratory grade analysis, tissue specimens taken from the affected areas are used, and histopathological images are extracted from them. Histopathological images can be extracted at different High-Power Field (HPF) values, but images at the 40X HPF value are well defined and illustrate the key features. The classified grade is then used to determine the most effective treatment.
The Densely Connected Convolutional Network (DenseNet) is one of the key advances in deep neural networks for visual object recognition. Deep Convolutional Neural Networks (DCNNs) with the DenseNet architecture are among the most dominant and powerful deep learning approaches for visual image analysis, providing excellent performance in medical imaging, including breast cancer detection, classification, and segmentation, with accuracy rates of more than 90% [32], [33].
This research aims to introduce a new computer-aided diagnosis approach that can support the existing laboratory grading procedure with a higher accuracy rate.

II. OBJECTIVE
The main intent of the research is to develop an application that can determine the grade of a given histopathological image of breast cancer or healthy tissue as 0 (benign), 1, 2, or 3 using a DenseNet-based DCNN.

III. RELATED WORKS
This section reviews the background and existing solutions in the domain of breast cancer grade classification, as well as past work on transfer learning with the DenseNet architecture in medical image analysis.

A. Breast Cancer Grade Classification
Cancers occur due to abnormal changes or mutations in the genes that are responsible for the usual activities of the human body. As a result, cancer cells or malignant cells are formed. These malignant cells can continue spreading without control, carrying the cancer to other parts of the body. Breast cancer is thus caused by the uncontrolled growth of malignant cells inside the human body. Grading is a measurement that defines the aggressiveness of breast cancer. Different "scoring systems" have been introduced to ascertain the grade of breast cancer, and the Nottingham Histologic Scoring system is considered one of the best among them [12]. Pathologists consider the following features when determining the breast cancer grade:
1. The amount of gland formation
2. The nuclear features
3. The mitotic count
Each feature is scored from 1-3, and these scores are added to obtain a total score ranging between 3-9. The final grade is determined as follows:
1. Grade 1 tumors hold a total score of 3-5
2. Grade 2 tumors hold a total score of 6-7
3. Grade 3 tumors hold a total score of 8-9
H. Peiris et al. clarified the impact of the Nottingham grade on breast cancer-specific survival and recurrence-free survival of operable breast cancer patients by evaluating the value of the Nottingham grade in the Sri Lankan setting [20]. They used Kaplan-Meier and Cox regression models for survival analysis. Out of a total of 742 patients (grade 1: 12%, grade 2: 45%, grade 3: 43%), breast cancer-specific survival was 94% for grade 1, 80% for grade 2, and 72% for grade 3 (p < 0.001), and recurrence-free survival was 86% for grade 1, 75% for grade 2, and 67% for grade 3 (p = 0.001). In a study of pathological prognostic factors in breast cancer, C. W. Elston et al. described histological grades evaluated from 1831 breast cancer patients in the United Kingdom since 1973 [21]. Patients with grade 1 tumors had significantly better survival than those with grade 2 and 3 tumors (p < 0.0001). S. Pal et al. analyzed the grade of breast carcinoma on cytology using Robinson's grading system and correlated it with Elston's modified Bloom-Richardson histological grading system [30]. They achieved a concordance rate between cytological and histological grades of around 78% and obtained a coefficient of correlation between the grades of around 0.804 (p < 0.001).
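The Nottingham scoring scheme described above can be sketched as a small function; the function name and input validation are illustrative, not taken from the paper.

```python
def nottingham_grade(gland_formation: int, nuclear_features: int, mitotic_count: int) -> int:
    """Map the three Nottingham component scores (each 1-3) to a final grade."""
    for score in (gland_formation, nuclear_features, mitotic_count):
        if score not in (1, 2, 3):
            raise ValueError("each component score must be 1, 2, or 3")
    total = gland_formation + nuclear_features + mitotic_count  # ranges 3-9
    if total <= 5:
        return 1   # grade 1: total score 3-5
    if total <= 7:
        return 2   # grade 2: total score 6-7
    return 3       # grade 3: total score 8-9
```

For example, a tumor scored 2 for gland formation, 2 for nuclear features, and 3 for mitotic count has a total of 7 and is therefore grade 2.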

B. Transfer Learning with DenseNet Architecture in Medical Image Analysis
This subsection covers some of the past research works involving transfer learning with the DenseNet architecture in medical image analysis.
M. Talo introduced two approaches with pre-trained ResNet-50 and DenseNet-161 architectures to classify color and gray-scale histopathological images [23], achieving more than 97% accuracy for both pre-trained models. X. Wu et al. introduced an approach to extract gait features from Gait Energy Images [24]. They extracted the gait features through DenseNet-based transfer learning and used a K-nearest neighbor (KNN) classifier to identify people, with an average recognition rate of around 98%. Q. Cai et al. proposed a deep neural network called SE-DenseNet by combining the DenseNet architecture with the Squeeze-and-Excitation block [25]. They used a twice fine-tuning method to classify breast masses, and the model achieved high performance on the BCDR dataset with an accuracy rate of around 98%. S. Minaee et al. introduced an approach to identify COVID-19 by applying transfer learning to the pre-trained ResNet18, ResNet50, SqueezeNet, and DenseNet-121 architectures [26]. They reserved 2000 chest X-ray images for training and 3000 for testing, and during evaluation obtained a sensitivity rate of 98% (± 3%) and a specificity rate of around 90% for most of the models. X. Xu et al. analyzed fundus images in two phases: first training a DenseNet model from scratch, and then applying transfer learning to obtain a fine-tuned network [27]. Y. Celik et al. presented a transfer learning approach for the detection of invasive ductal carcinoma by training both pre-trained ResNet-50 and DenseNet-161 architectures [28]. In the evaluation, they achieved an F-score of 92.38% and a balanced accuracy of 91.57% for the DenseNet-161 model, and an F-score of 94.11% and a balanced accuracy of 90.96% for the ResNet-50 architecture. F. Imrie et al. combined the pre-trained DenseNet architecture with a transfer learning approach to create a composition of protein family-specific models [29]. S. H. Wang and Y. D. Zhang introduced a new transfer learning approach by comparing and tuning the parameters of the DenseNet 121, 169, and 201 architectures for multiple sclerosis classification [31]. They observed that the DenseNet201-D architecture achieved the best performance, with 98.27 ± 0.58 sensitivity, 98.35 ± 0.69 specificity, and 98.31 ± 0.53 accuracy.

C. Breast Cancer Grade Classification in Medical Image Analysis
Some previous works related to these objectives are described here. Jian et al. applied several classification techniques to distinguish mitotic cells from healthy normal cells [7]. S. Rao introduced a method involving a region-based convolutional neural network to identify mitotic elements in histopathological images [8]. A. Paul et al. presented an approach to detect mitotic elements in histopathological images by applying cell segmentation and classifying the segmented cells as mitotic or amitotic using a random forest classifier [22].
After analyzing the previous works, the main contribution of the proposed solution is to fill the knowledge gap in the existing systems with a novel grade classification approach.

IV. METHODOLOGY
The following are the major tasks that had to be carried out during our research work.

• Data preprocessing
To complete the data preprocessing task, several mechanisms had to be applied and they are discussed under data preprocessing.

• Model building and evaluation
Classical and transfer-learning-based DCNN models were built and evaluated; these are discussed under model building and evaluation.

• Implementation of the inference tool
To visualize the results obtained as well as to make the models available for use, an inference tool was built.

A. Data Preprocessing
Initially, we received a graded (1, 2, and 3) dataset [11] with different HPF values (4X, 10X, 20X, 40X), resolutions, and image formats. For preprocessing, we used the images with the 40X HPF value and enlarged the received dataset by applying several data augmentation techniques, including image cropping and rotating. This produced an RGB image dataset of dimensions 700*128*128*3 (700 images of 128*128 pixels with 3 color channels) for each grade. We then used another dataset [14] to train the 2-predict model; after applying the preprocessing techniques to the received benign 40X HPF dataset, we created a dataset of the same 700*128*128*3 dimensions. For the 2-predict model, the malignant dataset was created by applying random sampling to the dataset of the 3-predict model. The dataset of the 4-predict model was created by combining the benign image dataset of the 2-predict model with the dataset of the 3-predict model. Table 1 defines the dimensions of the training and testing datasets used in each model.
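As a rough illustration of the cropping-and-rotating augmentation described above (the paper does not specify the exact pipeline), one plausible way to expand a source image into 128*128 patches is to take corner crops and rotate each by multiples of 90 degrees:

```python
import numpy as np

def augment(image: np.ndarray, crop_size: int = 128) -> list:
    """Hypothetical augmentation sketch: four corner crops, each rotated
    by 0/90/180/270 degrees, yielding 16 patches per source image."""
    h, w = image.shape[:2]
    crops = [
        image[:crop_size, :crop_size],          # top-left
        image[:crop_size, w - crop_size:],      # top-right
        image[h - crop_size:, :crop_size],      # bottom-left
        image[h - crop_size:, w - crop_size:],  # bottom-right
    ]
    patches = []
    for crop in crops:
        for k in range(4):                      # k quarter-turns
            patches.append(np.rot90(crop, k))
    return patches

patches = augment(np.zeros((256, 256, 3), dtype=np.uint8))
# 4 crops x 4 rotations = 16 patches, each of shape (128, 128, 3)
```

Applying such a scheme to a modest set of source slides is one way a per-grade dataset of 700 fixed-size RGB patches could be assembled.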

B. Model Building and Evaluation
To build each model, the following key aspects were considered. Grade classification was carried out under two scenarios. In the first scenario, a single deep neural network model was implemented to classify the histopathological images as benign (0), grade 1, grade 2, or grade 3. In the second scenario, two deep neural network models were trained to achieve the same objective: the initial model classifies the histopathological image as benign or malignant, and a second model then classifies the grade based on the prediction obtained in the first step. As in Fig. 1, the initial model classifies the input histopathological image as benign (0) or malignant (1); if it is malignant, the prediction is obtained from the second model as 1, 2, or 3, otherwise the grade remains zero. As per the research objectives mentioned in the previous sections, this research targets building a DCNN with transfer learning to predict the grades (0-benign, 1, 2, and 3) of histopathological images. The following subsections define the architectures of both the classical and transfer learning models and the evaluation process carried out to achieve this goal.
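The two-stage flow of the second scenario can be sketched as follows; the function and model names are illustrative placeholders for trained classifiers, not identifiers from the paper's code.

```python
import numpy as np

def predict_grade(image: np.ndarray, binary_model, grade_model) -> int:
    """Case 2 (two-stage) inference sketch.

    binary_model: assumed callable returning the malignancy probability.
    grade_model:  assumed callable returning probabilities for grades 1-3.
    """
    x = image[np.newaxis, ...]            # add a batch dimension
    if binary_model(x) < 0.5:             # stage 1: benign vs malignant
        return 0                          # benign -> grade remains 0
    grade_probs = grade_model(x)          # stage 2: grades 1, 2, 3
    return int(np.argmax(grade_probs)) + 1

# Example with stub models standing in for the trained networks
img = np.zeros((128, 128, 3))
grade = predict_grade(img, lambda x: 0.9, lambda x: np.array([0.1, 0.7, 0.2]))
```

With these stubs, stage 1 flags the image as malignant (0.9 >= 0.5), and stage 2 picks the highest-probability grade.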

Classical DCNN Model Implementation
This was applied to implement the 3-predict model for the 128*128 gray-scaled image dataset and achieved a comparatively low test accuracy rate. Due to the lack of a higher accuracy rate and the other corresponding results obtained, we decided to implement the model using the transfer learning technique instead. The classical DCNN model is a composition of the following:

Model Implementation with Transfer Learning
This was applied to implement the 2-predict, 3-predict, and 4-predict models for the 128*128 RGB image datasets and achieved better test accuracy rates than the classical model. This model is a composition of the following. Apart from that, a Flask Application Programming Interface was developed with the Anaconda framework to obtain the prediction in each case.
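A minimal sketch of the transfer learning setup, assuming a DenseNet-121 backbone with a small classification head (the paper does not state the exact DenseNet variant or head layers, so the sizes here are illustrative):

```python
import tensorflow as tf

def build_grade_model(num_classes: int, weights="imagenet") -> tf.keras.Model:
    """Illustrative transfer-learning model: frozen DenseNet-121 features
    plus a small trainable classification head."""
    base = tf.keras.applications.DenseNet121(
        include_top=False, weights=weights, input_shape=(128, 128, 3))
    base.trainable = False                      # freeze pretrained features
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dropout(0.3),           # dropout probability is a tuned parameter
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# e.g. the 4-predict model has classes 0 (benign), 1, 2, and 3;
# weights=None here only avoids downloading ImageNet weights in this sketch
model = build_grade_model(num_classes=4, weights=None)
```

The 2-predict and 3-predict models would be built the same way with `num_classes` set to 2 and 3 respectively.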

Evaluation
This was performed on the reserved test dataset of each model. For each model, around 14% of all histopathological images were allocated for testing purposes.
The evaluation was carried out by varying parameters including the batch size, dropout probability, and number of epochs. However, all four models responded significantly only to varying the number of epochs, not the batch size or dropout probability. The evaluation operations performed on the test datasets mainly included test accuracy rate calculation, confusion matrix illustration, and classification reports.
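These three evaluation outputs can be produced with scikit-learn; the labels below are illustrative stand-ins, not results from the paper's test split.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Illustrative labels only; the real evaluation uses the reserved ~14% test split.
y_true = [0, 1, 2, 3, 1, 2, 3, 0]   # ground-truth grades
y_pred = [0, 1, 2, 3, 1, 2, 1, 0]   # model predictions (one error)

print(f"test accuracy: {accuracy_score(y_true, y_pred):.4f}")
print(confusion_matrix(y_true, y_pred))          # rows: true grade, cols: predicted
print(classification_report(y_true, y_pred, digits=4))  # per-class precision/recall/F1
```

The classification report gives per-grade precision, recall, and F1-score, which is how the per-model reports discussed in the results section are read.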

C. Implementation of the Inference Tool
This was carried out to visualize the results obtained and to make the trained models available for use. The inference tool was implemented as a desktop graphical user interface application with basic functions such as uploading an image, viewing results, saving an image with its classification, searching a record by ID, deleting a record by ID, and converting a record into PDF format, with a suitable database connection.
The summarized architecture of the proposed solution is illustrated in Fig. 4. The tool was implemented with Apache NetBeans with Maven dependencies, and MongoDB was used as the database. Apart from that, an Application Programming Interface integration was carried out to obtain the predictions for both cases 1 and 2.

V. RESULTS AND DISCUSSION
To analyze the results generated by each model, the main method followed was test data-based results generation. The following includes the results obtained and a comparison between them. The classification report obtained by applying transfer learning to the testing dataset for the two-predict model with 20 epochs shows the highest possible precision, recall, and F1-score values; the accuracy received for the test dataset is 100%. Fig. 11 shows the classification report obtained by applying transfer learning to the testing dataset for the three-predict model with 30 epochs. Compared with the model previously trained for 20 epochs, the model trained with 30 epochs holds the maximum values for precision, recall, and F1-score. The test accuracy rates of the models trained with 20 and 30 epochs are 94.33% and 94.99% respectively. According to the results obtained, the following observations can be made.

C. Four-Predict Model
• Two-predict model: Considering Fig. 5, the graph converges as the number of epochs increases. Fig. 6 and 7 also evidently express the appropriateness of the model, providing a 100% accuracy rate for the test dataset.
• Three-predict model: Fig. 8 shows that the proposed classical DCNN model is not able to interpret the training dataset as expected. Considering Fig. 9, a good fit can be obtained by increasing the number of epochs up to 30. Fig. 10 and 11 provide further evidence, with a 94.99% accuracy rate for the test dataset at 30 epochs.
• Four-predict model: Considering Fig. 12, the fitness of the model can be increased by increasing the number of epochs up to 25. Fig. 13 and 14 also show a high test accuracy rate of around 95.249% at 25 epochs.
• Model type (case 1 and 2) comparison: According to Fig. 15, we can conclude that both approaches are good enough, and both are applicable for prediction through the desktop graphical user interface.
By referring to the above results, we can conclude that it is appropriate to train the two-predict model up to 20 epochs, the three-predict model up to 30 epochs, and the four-predict model up to 25 epochs. Furthermore, both model types (cases 1 and 2) are applicable for prediction through the desktop graphical user interface.

VI. CONCLUSION
In this research,
• Three DCNN-based models with transfer learning have been implemented to classify the grades of breast cancer.
• These DCNN models have been trained and tested with histopathological image datasets with the 40X HPF value.
• The 2-predict, 3-predict, and 4-predict models obtained test accuracy rates of 100%, 94.999%, and 95.2499% respectively.
• This research confirms the success of using transfer learning with the DenseNet architecture, providing test accuracy rates of more than 94% for all three trained models.
• Moreover, a desktop application was developed to serve the solution for inference.
Ultimately, accuracy improvements and the implementation of the developed system as a web-based application can be identified as further improvements to this research.