3D Quantum-inspired Self-supervised Tensor Network for Volumetric Segmentation of Brain MR Images

This paper introduces a novel shallow self-supervised tensor neural network for volumetric segmentation of brain MR images that requires no training or supervision. The proposed network is a 3D version of the Quantum-Inspired Self-Supervised Neural Network (QIS-Net) architecture and is referred to as the 3D Quantum-inspired Self-supervised Tensor Neural Network (3D-QNet). The underlying architecture of 3D-QNet is composed of a trinity of volumetric layers, viz. input, intermediate and output layers, inter-connected using a 26-connected third-order neighborhood-based topology for voxel-wise processing of 3D MR image data suitable for semantic segmentation. Each volumetric layer contains quantum neurons designated by qubits (quantum bits). The incorporation of tensor decomposition in quantum formalism leads to faster convergence of the network operations, precluding the inherently slow convergence faced by self-supervised networks. The segmented volumes are obtained once the network converges. The suggested 3D-QNet is tailored and tested extensively on the BRATS 2019 data set in the experiments carried out. 3D-QNet achieves promising dice similarity when compared with the intensively supervised convolutional network-based models 3D-UNet, VoxResNet, DRINet, and 3D-ESPNet, thus facilitating annotation-free semantic segmentation using a self-supervised shallow network.


I. INTRODUCTION
Automatic volumetric brain Magnetic Resonance (MR) image segmentation assisted by contextual information yields Volumes of Interest (VOIs), which are critical for glioma patients. Deeply supervised Convolutional Neural Networks (CNNs) have achieved respectable accuracy in 2D medical image segmentation [1]. However, in automatic 3D MR image data segmentation, deeply supervised 3D-CNNs face challenges tied to manual effort, viz. acquiring sufficient annotated 3D data for suitable training, the high heterogeneity and dimensionality of 3D MR images, complex anatomical environments, and the need to optimize the 3D neural networks [2], [3]. Hence, structural neuro-imaging research calls for self-supervised learning for accurate and fast segmentation of brain images across different modalities. Given 3D MR image data, the primary aim of our proposed 3D-QNet architecture is to perform volumetric segmentation for brain tumor identification without supervision or training. Our proposed 3D-QNet architecture is centered on self-supervised bi-directional counter-propagation of quantum states, obviating the time-intensive quantum backpropagation algorithm for faster convergence. The network hyper-parameters associated with gray-level thresholding are adaptive in nature, and voxel-wise context-sensitive information is exhibited in quantum formalism, as reported in this article. The current voxel-wise segmentation work makes the following contributions over 2D brain image segmentation [4]-[6]:
1) We propose a shallow voxel-wise quantum-inspired self-supervised neural network, referred to as 3D-QNet, which has significant relevance to volumetric MR image segmentation.
2) In this work, 26-connected quantum fuzzy context-sensitive voxel information is processed to integrate low-level and high-level local image features with wide intensity variations and the implicit shape of the VOIs, thereby enabling accurate volumetric segmentation of 3D MR images.
3) A novel generalized quantum-inspired self-supervised learning scheme is proposed using a tensor representation of the weight vector for high-dimensional data and employed in our suggested 3D-QNet for brain MR image segmentation.
4) The convergence analysis of the proposed 3D-QNet is also demonstrated, exhibiting super-linearity.
The primary aim of incorporating quantum computing in our proposed 3D network architecture is to exploit quantum correlation and to accelerate the convergence of the network operation, simultaneously improving the discrimination ability to yield fast and accurate segmentation.
The organization of the remaining sections of the manuscript is as follows: a comprehensive literature review of various deep learning-based volumetric segmentation approaches for medical images and their challenges is presented in Section II. Section III illustrates the fundamental concepts of quantum computing. The novel self-supervised 3D-QNet architecture with the quantum-inspired tensor network model for voxel-wise segmentation of three-dimensional MR images is introduced in Section IV. The experimental outcomes and discussions are provided in Section V. Section VI states the concluding remarks of the proposed work and sheds light on future directions of research.

II. RELATED WORKS
Recent years have witnessed a surge in the application of deep learning networks to brain lesion segmentation [7]-[10]. However, in contrast to automated volumetric segmentation of brain MR images, 2D convolutional neural network (CNN) architectures [7], [9], [10] process the MR images slice-wise and independently, which leads to non-optimal use of the 3D contextual feature information of volumetric MR image data. In turn, 3D CNN-based architectures extract rich spatial and contextual features and perform voxel-wise segmentation of volumetric brain MR images [11]-[13]. Kamnitsas et al. [11] suggested a dual-path 3D CNN incorporating local and larger contextual feature information to obviate computationally complex 3D MR image processing and to exhibit dense inference on medical image segmentation. A flexible network, the 3D-UNet architecture [13], achieved remarkable success on brain MR image semantic segmentation. Of late, to exploit the 3D contextual information, Brebisson et al. [14] employed 2D CNNs on three orthogonal 2D patches and combined them to form 3D patches, reducing the memory requirements. However, 3D CNN networks suffer from slow convergence owing to computationally exhaustive 3D convolution operations and extensive training procedures. Moreover, despite their popularity among medical and computer vision researchers, U-Net architectures [13] fall short in scalability and are unable to distinguish the distinctive features (shape, size, intensity, location, etc.) learned at the convolutional layers. They also suffer from the vanishing gradient problem when the number of feature layers is increased for better representation of the features. Various deeper network architectures obviating the vanishing gradient problem have been proposed concurrently for voxel-wise medical image segmentation, including VoxResNet [15], DRINet [16] and 3D-ESPNet [17].
However, these deeply supervised network architectures suffer from high computational complexity and slow convergence as the number of feature layers in the network increases. Currently, self-supervised, semi-supervised and weakly supervised networks have gained significant attention in the computer vision and medical research communities owing to the lack of annotated images for deep supervision [18], [19]. Nevertheless, these self-supervised networks [18], [19] for volumetric brain segmentation rely on pre-trained 3D CNN models, and hence are not fully self-supervised. This inspires us to develop 3D self-supervised neural network architectures for volumetric brain segmentation.
The main problem with classical self-supervised neural network models lies in the fact that they do not converge fast, and hence the segmented outcome is distorted due to slow convergence [20]-[22]. Numerous quantum neural networks have evolved over the last few decades, replicating classical neural networks while offering faster processing compared with their classical counterparts [23]-[28]. The quantum versions of classical self-supervised neural network architectures [29]-[32] offer a potential solution for faster and more efficient image segmentation and surpass their classical counterparts. Konar et al. recently developed quantum-inspired neural network models referred to as QIS-Net [4] and QIBDS-Net [5], suitable for brain MR image segmentation. These networks have been found to attain promising outcomes in complete brain tumor segmentation. An optimized version of the network (Opti-QIBDS Net) [6] has also been proposed for optimal segmentation of brain tumors; these works serve as the motivation behind the assimilation of quantum-inspired computing in the current 3D-QNet architecture.

A. Motivation
Despite the remarkable success achieved in volumetric brain MR image segmentation, 3D CNN architectures still face some inherent challenges owing to their deep and complex network architectures [11], [13]-[15]:
1) Owing to the complex anatomical properties of 3D MR brain images, the majority of 3D CNN architectures require a large number of parameters to capture 3D contextual representative feature information.
2) The high computational (GPU) and memory resources required for large-scale 3D CNNs pose a potential concern for widespread clinical applications.
3) Automated volumetric brain MR image segmentation using 3D CNN architectures often confronts overfitting, slow convergence and vanishing gradient problems. Moreover, tuning the hyper-parameters of the underlying 3D architecture is a demanding task.
4) In addition, the 3D annotated data available for training a 3D CNN are insufficient and very expensive to obtain, resulting in a lack of image-specific adaptability.

III. FUNDAMENTALS OF QUANTUM COMPUTING
Quantum computing builds on the principles of quantum mechanics: quantum algorithms rely on quantum bits (qubits) and the quantum operations defined on them [33].

A. Quantum Bits and Tensor Products
The basic element equivalent to the classical bit in quantum computing is known as the quantum bit, or qubit, and is represented using the Dirac notations |0⟩ and |1⟩. However, unlike a classical bit, a qubit is expressed as a linear combination of the basis states weighted by probability amplitudes, often known as a superposition, as follows [28].

|ψ⟩ = cos(α/2)|0⟩ + e^(iθ) sin(α/2)|1⟩

where 0 ≤ α ≤ π and 0 ≤ θ ≤ 2π. Hence, qubits reside in the Hilbert space parametrized by the continuous variables θ and α. In quantum formalism, the tensor products of the subspaces form the full Hilbert hyperspace H as

H = H_1 ⊗ H_2 ⊗ ... ⊗ H_n

A set of n basis states (designated as |φ_j⟩), each composed of 0s and 1s, can form a qubit system |ψ⟩ of size log n in the Hilbert space H as follows.

|ψ⟩ = Σ_{j=1}^{n} c_j |φ_j⟩, with Σ_{j=1}^{n} |c_j|² = 1
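As a small numerical illustration of the Bloch-sphere parametrization and of the tensor (Kronecker) product of qubit subspaces, the following sketch builds two qubits and their joint state; the helper name `qubit` is introduced here for illustration only.

```python
import numpy as np

def qubit(alpha, theta):
    """Qubit |psi> = cos(alpha/2)|0> + e^{i*theta} sin(alpha/2)|1>."""
    return np.array([np.cos(alpha / 2),
                     np.exp(1j * theta) * np.sin(alpha / 2)])

# Two single-qubit states with arbitrary Bloch angles
q1 = qubit(np.pi / 3, np.pi / 4)
q2 = qubit(np.pi / 2, 0.0)

# The joint state lives in the tensor-product Hilbert space H1 (x) H2
joint = np.kron(q1, q2)

print(np.isclose(np.linalg.norm(q1), 1.0))  # True: amplitudes are normalized
print(joint.shape)                          # (4,): two qubits span a 4-dim space
```

Note that the joint state remains normalized, since the Kronecker product of unit vectors is again a unit vector.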

B. Input Data Encoding and Tensor Decomposition
A tensor-product basis relies on the local input feature map {φ^{d_j}(α_j)} in the Hilbert space of functions over α_j ∈ [0, 1] as [27]

Φ^{d_1 d_2 ... d_N}(α) = φ^{d_1}(α_1) ⊗ φ^{d_2}(α_2) ⊗ ... ⊗ φ^{d_N}(α_N)

where d_j varies from 1 . . . N (N-dimensional input vector). A function f_l(α) can be realized using the tensor product of the input local feature map φ^{d_j}(α_j) and the network weight decomposition Ψ, as follows.

f_l(α) = Σ_{d_1,...,d_N} Ψ^l_{d_1 d_2 ... d_N} Φ^{d_1 d_2 ... d_N}(α)

Hence, the local feature map φ^{d_j}(α_j) forms a basis for a Hilbert space of functions defined over α ∈ [0, 1], and the tensor-product basis Φ^{d_1 d_2 ... d_N}(α) forms a Hilbert space of functions defined over α ∈ [0, 1]^N. With the dimension of the local feature vector restricted to 2, φ(α_j) is defined as

φ(α_j) = [cos(π α_j / 2), sin(π α_j / 2)]

In order to enhance and extract the contextual information from high-dimensional data, Tucker tensor decomposition is suitable for neural network layer decomposition [34].
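A minimal sketch of the cosine/sine local feature map commonly used in tensor-network learning, together with its tensor-product basis over several normalized inputs, is given below; the function names are illustrative, not taken from the paper.

```python
import numpy as np

def phi(a):
    """Local feature map for a normalized intensity a in [0, 1]."""
    return np.array([np.cos(np.pi * a / 2), np.sin(np.pi * a / 2)])

def product_feature(alphas):
    """Tensor-product feature Phi(alpha) = phi(a1) (x) ... (x) phi(aN)."""
    out = np.array([1.0])
    for a in alphas:
        out = np.kron(out, phi(a))
    return out

Phi = product_feature([0.2, 0.8, 0.5])
print(Phi.shape)  # (8,): 2^N components for N = 3 inputs
```

Since each φ(α_j) is a unit vector, the full product feature is norm-preserving, which is what lets it act as an orthonormal-style basis over [0, 1]^N.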

IV. 3D QUANTUM-INSPIRED SELF-SUPERVISED TENSOR NEURAL NETWORK (3D-QNET) ARCHITECTURE
In this article, a 3D version of QIS-Net [4], referred to as the 3D Quantum-inspired Self-supervised Tensor Neural Network (3D-QNet), with self-supervised tensor learning is proposed for automatic voxel-wise segmentation of MR images. The 3D-QNet comprises a trinity of volumetric layers of quantum neurons arranged as input, intermediate and output layers. A schematic outline of the proposed 3D-QNet architecture is shown in Figure 1. The input volume (M × N × P) is normalized and propagated from the 3D input layer to the successive 3D hidden and output layers of the 3D-QNet architecture for processing through 3 × 3 × 3 voxels. Each of the three volumetric layers of the 3D-QNet architecture is fully intra-linked with qubits using a 3D-matrix representation. Each 3D layer of the proposed architecture is intra-connected through quantum neurons with intra-connection strengths set to π/2 (quantum |1⟩ logic). The basic processing unit of each volumetric layer of the 3D-QNet architecture is a 26-connected neighborhood-based voxel-wise orientation of each candidate neuron, as illustrated in Figure 1.
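The 26-connected third-order neighborhood of a voxel is simply its 3 × 3 × 3 patch minus the center. The following sketch gathers it with zero padding at the volume border; it is an illustration, not the authors' implementation.

```python
import numpy as np

def neighborhood_26(volume, i, j, k):
    """Return the 26 neighbors of voxel (i, j, k) from a zero-padded volume."""
    padded = np.pad(volume, 1, mode="constant")
    patch = padded[i:i + 3, j:j + 3, k:k + 3]  # 3x3x3 block centered on (i,j,k)
    flat = patch.flatten()
    return np.delete(flat, 13)  # drop the center voxel (index 13 of 27)

vol = np.random.rand(5, 5, 5)
nbrs = neighborhood_26(vol, 2, 2, 2)
print(nbrs.size)  # 26
```

The zero padding mirrors the usual boundary handling so that edge and corner voxels also yield a fixed-size 26-element neighborhood vector.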
The relevant details of the principle of operation of the proposed 3D-QNet, using a self-supervised tensor learning model in quantum formalism, are provided in the following subsections.

A. Quantum-Inspired Self-supervised Tensor Network Model
In the suggested 3D-QNet architecture, the high-dimensional weight vector Ψ is represented as a tensor to optimize the network operations and to facilitate the extraction of significant semantic feature information in the quantum-inspired self-supervised model. The internal kernels associated with the network operate in parallel, thereby accelerating the convergence of the 3D-QNet. The input quantum neurons containing the pixel intensities are expressed as qubits, and the inter-connection weights are modified using quantum rotation gates. The classical intensity of any i-th normalized gray-scale image pixel of the MR volume (denoted as α_i ∈ [0, 1]) is transformed into a quantum state using a mapping function φ(α_i) as follows.

φ(α_i) = cos((π/2) α_i)|0⟩ + sin((π/2) α_i)|1⟩
The angle of rotation is measured using the relative difference of fuzzy intensity of the candidate pixel and the neighborhood pixels in quantum formalism. This relative measure helps to segment the foreground and background regions of an image.
Hence, ω_{i,j} is designated as the angle of rotation and is measured as the relative intensity difference between the candidate pixel (α_i) and one of its neighborhood pixels α_{i,j}. The strength of inter-connection between neuron j (a neighbor of the candidate neuron i) of a layer and the corresponding candidate neuron of the adjacent layer is mapped using ϕ. The classical inter-connection weight in [0, 1] is transformed into quantum formalism as

ϕ(ω_{i,j}) = cos((π/2) ω_{i,j})|0⟩ + sin((π/2) ω_{i,j})|1⟩

In the proposed tensor network model, each 3D-QNet layer is decomposed into a voxel (core tensor) using Tucker tensor decomposition [34] to reduce the input dimensions, with the inter-connection weights acting as factor matrices. Let us consider tensors V, Ψ ∈ R^{m×n×p}, where V is the voxel-wise input of the 3D MR images and Ψ is the corresponding inter-connection 3D weight matrix as evaluated in Eq. 9 (m, n, p denote the row, column and slice numbers, and V, Ψ are third-order tensors). According to the Tucker tensor decomposition [34],

X = V ×_1 Ψ_1 ×_2 Ψ_2 ×_3 Ψ_3

where X ∈ R^{m×n×p} is the tensor outcome, Ψ_n is the n-th factor matrix and ×_n is the mode-n product of a tensor with a matrix. Each layer of the proposed 3D-QNet architecture is thus transformed to lower-dimensional tensors. Such M × N × P tensors of voxels are formed for each layer in the underlying network architecture. Each volumetric layer of the 3D-QNet architecture forms M × N × P volumetric patches (voxels) of size 3 × 3 × 3 corresponding to the candidate pixels. Here, V comprises all 3D patches (voxels) v ∈ R^{m×n×p} of a network layer in the proposed 3D-QNet architecture. The spatial features in terms of the neighborhood pixels of every seed pixel at a network layer are extracted and propagated to the subsequent layer as input, guided by a Quantum-inspired voxel-wise multi-level Sigmoidal (Vox-QSig) activation function σ_{3D-QNet}, as follows.

y^l = σ_{3D-QNet}(v^{l−1} * ϕ^l(ω))
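The mode-n tensor-matrix product underlying the Tucker step can be sketched with plain NumPy as follows; the toy shapes and the helper `mode_n_product` are illustrative (a production implementation would typically use a library such as TensorLy).

```python
import numpy as np

def mode_n_product(T, M, n):
    """Mode-n product T x_n M: contract mode n of tensor T with matrix M."""
    Tn = np.moveaxis(T, n, 0)                    # bring mode n to the front
    out = np.tensordot(M, Tn, axes=([1], [0]))   # multiply along that mode
    return np.moveaxis(out, 0, n)                # restore the axis order

# Toy core tensor and factor matrices (random, for illustration only)
V = np.random.rand(4, 5, 6)
P1, P2, P3 = np.random.rand(2, 4), np.random.rand(3, 5), np.random.rand(2, 6)

# X = V x_1 P1 x_2 P2 x_3 P3
X = mode_n_product(mode_n_product(mode_n_product(V, P1, 0), P2, 1), P3, 2)
print(X.shape)  # (2, 3, 2): every mode compressed by its factor matrix
```

Choosing factor matrices with fewer rows than columns is what yields the dimensionality reduction exploited by the tensor decomposition of the network layers.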
where v^{l−1} ∈ R^{M×N×P}, ϕ^l(ω) ∈ R^{M×N×P×K} at the network layers l = 2, 3, y^l ∈ R^K, and * is the inner product operator. The fuzzy context-sensitive activation (designated as χ_i) for semantic segmentation is defined in quantum formalism in terms of an angle of rotation ϑ_i, which is evaluated as the summation of the intensities of the third-order 26-connected neighborhood pixels (denoted as α_{i,j}, j = 1, 2, . . . , K) of a candidate pixel (neuron) i.
Quantum fuzzy context-sensitive thresholding determines the bi-directional propagation of quantum information between the layers of the 3D-QNet architecture by means of self-organization of the inter-linked weight matrices. The reduction of feature dimensions using tensor decomposition, followed by voxel-wise information processing, in the proposed 3D-QNet architecture is inspired by the basic quantum neural network input-output model [6]. Here, |φ^l(α_i^d)⟩ denotes the intermediate output of the i-th seed quantum neuron at the 3D network layer in the l-th sample with depth (slice #) d = 1, 2, . . . , P, and σ_{3D-QNet} is the Quantum-inspired voxel-wise multi-level Sigmoidal activation function with activation |χ_i^{l,d}⟩, described in subsection IV-B. The designated rotation angles associated with the inter-connection weights between input neuron j and output neuron i are represented by ω_{j,i}^{l,d}, and δ_i^{l,d} is the phase transfer parameter. The true classical output state (|1⟩) of the i-th quantum neuron is obtained by considering the imaginary (sine) component of the complex-valued response, where γ denotes the imaginary unit. Assume that the inter-connection weights between the input and hidden layers of the 3D-QNet architecture are denoted by |Ψ_{k,j}^{l,d}⟩ and those between the hidden and output layers by |Ψ_{j,i}^{l,d}⟩ in the l-th sample set. The activations at the hidden and output layers are designated |χ_j^{l,d}⟩ and |χ_i^{l,d}⟩, respectively. Considering any quantum seed neuron k from the sample of input neurons at the input layer, with corresponding seed neuron j at the hidden layer and output seed neuron i, the response at the i-th neuron with depth d in the l-th sample set is obtained by cascading these transformations through the hidden and output layers.

B. Quantum-inspired Voxel-wise Multi-level Sigmoidal (Vox-QSig) Activation Function

In the 3D-QNet architecture, a modified version of the QMSig [4] activation function is suggested, referred to as the Quantum-inspired voxel-wise multi-level Sigmoidal (Vox-QSig) activation function, for voxel-wise processing of the 26-connected, spatially oriented neighborhood pixels. The Vox-QSig activation function σ_{3D-QNet} is parametrized by a slope λ, an activation υ, and the multi-level class responses β_τ exhibited by the 26-connected third-order neighborhood pixels, where β_τ is expressed in terms of ρ_τ and ρ_{τ−1}, the τ-th and (τ−1)-th class outcomes, respectively, and χ_N, the contribution of the 26-connected neighborhood gray-level pixels. The generalized form of Vox-QSig is obtained by leveraging the activation function hyper-parameters employed in Equation 19, where L corresponds to the number of class levels. The multi-class responses for various hyper-parameters employed in the Vox-QSig activation function are provided in Figure 2. Brain MR volumes exhibit heterogeneous responses over the local intensities in the 26-connected neighborhood regions, owing to the wide variations of gray levels. Inspired by the authors' previous works [4], [22], [35], [36], the proposed Vox-QSig activation function employs four different adaptive thresholding schemes suitable for efficient gray-scale segmentation in the 3D-QNet architecture.
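The class-discretizing behavior of a multi-level sigmoidal activation can be illustrated with a generic sum of shifted sigmoids; this is a sketch of the general MUSIG-style idea, not the exact Vox-QSig formula, and the uniform class boundaries and slope value are assumptions.

```python
import numpy as np

def multilevel_sigmoid(x, levels=4, lam=8.0):
    """Sum of shifted sigmoids: maps x in [0, 1] to ~levels graded responses."""
    # One sigmoid step at each internal class boundary tau/levels.
    steps = [1.0 / (1.0 + np.exp(-lam * (x - tau / levels)))
             for tau in range(1, levels)]
    return np.sum(steps, axis=0) / (levels - 1)  # normalize into [0, 1]

x = np.linspace(0.0, 1.0, 5)
y = multilevel_sigmoid(x, levels=4)
print(np.all(np.diff(y) > 0))  # True: the response is monotonically increasing
```

Larger slopes make each step sharper, so the response approaches a staircase with `levels` plateaus, which is the multi-class behavior the activation exploits for gray-scale segmentation.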
where C_l represents the number of defined classes in C = {C_1, C_2, . . . , C_{C_l}}, the i-th pixel is denoted as p_i, the probability of class C_i is represented as ω_i, its mean value is given by µ_i, and ω is the mean of class C.

C. Adjustment of Inter-connection Weights of 3D-QNet and Loss Function
Each inter-connection link of the 3D weight matrix of size 3 × 3 × 3 for each candidate pixel of the brain MR volume, along with its corresponding activation, is updated using quantum rotation gates, thereby enabling faster convergence of the proposed 3D-QNet architecture. The inter-connection weight ϕ^{l,d} and its activation χ^{l,d} are updated through Equations 26 and 27, which refer to updating the angles of rotation and activation, respectively. The error or loss function ζ(ω^{l,d}, ϑ^{l,d}) in the suggested 3D-QNet is evaluated in terms of the Root Mean Square Error (RMSE) of the 3D weight matrices at depth d (slice #d) in the l-th epoch and is defined on the phase angles ω^{l,d} and ϑ^{l,d}. The convergence analysis of the proposed 3D-QNet is illustrated in Appendix Section A.
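A rotation-gate update of a phase angle and an RMSE-style loss over 3D phase-angle tensors can be sketched as follows; the update rule shown is a generic assumption for illustration, not the paper's exact Equations 26 and 27.

```python
import numpy as np

def rotate(theta, delta):
    """Apply a 2x2 quantum rotation gate with angle delta to phase theta."""
    gate = np.array([[np.cos(delta), -np.sin(delta)],
                     [np.sin(delta),  np.cos(delta)]])
    state = np.array([np.cos(theta), np.sin(theta)])
    new = gate @ state
    return np.arctan2(new[1], new[0])  # recovered phase: theta + delta

def rmse_loss(omega, theta):
    """Root-mean-square error between two 3D phase-angle tensors."""
    return np.sqrt(np.mean((omega - theta) ** 2))

print(np.isclose(rotate(0.3, 0.2), 0.5))  # True: rotation adds the phase angles
```

Because the rotation gate is unitary, repeatedly updating a weight qubit in this way keeps it a valid quantum state while shifting only its phase.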

V. EXPERIMENTAL RESULTS AND DISCUSSION

A. Data Set
The proposed 3D-QNet is validated extensively using the BRATS 2019 data set [38], which is composed of 315 3D MRI volumes (239 HGG and 76 LGG). Each MRI volume comprises 155 slices of resolution 240 × 240 with ground-truth segmentation labels and includes four different modalities of 3D MR images, viz. T1, T1 with Contrast Enhancement (T1-CE), T2 and FLAIR. The segmented labels are annotated with three distinct tumor sub-regions, viz. tumor core (TC), enhancing tumor (TE), and the necrotic and non-enhancing core region. These annotations together form the complete tumor (WT). The BRATS 2019 data set is divided in an 8:2 ratio into training (252) and test (63) volumes due to GPU limitations.

B. Experimental Setup
Intensive experiments have been carried out using 3D-QNet on 3D brain MR volumes of size 240 × 240 collected from the BRATS 2019 data set, using an Nvidia RTX 2070 GPU with MATLAB 2020a and Python 3.6 (PyTorch). The proposed 3D-QNet is implemented on the multi-level gray-scale images using distinct multi-class levels L = 4, 6, and 8, characterized by the Vox-QSig activation function. The steepness λ is varied in the range 0.23 to 0.24 with step size 0.001. It has been observed that in the majority of cases, λ = 0.232 yields optimal performance. Moreover, Vox-QSig is guided by four distinct activation schemes (υ_β, υ_ξ, υ_ζ, υ_κ) [4], [22], [35]. The highly representative volumetric segmentation is followed by the k-means algorithm [39] for false-positive reduction in brain tumor detection and to refine the segmentation accuracy and dice score. The lesion or complete brain tumor detection mask is binarized using a threshold of 0.5. Experiments have also been performed using the 3D-UNet architecture [13], the Deep Voxel-wise Residual Network (VoxResNet) [15], the Dense-Res-Inception Net (DRINet) [16], and 3D-ESPNet [17] on the BRATS 2019 data set [38]. We have trained 3D-UNet [13] and VoxResNet [15] rigorously using the Stochastic Gradient Descent (SGD) algorithm on the Caffe library. 3D-ESPNet is implemented in PyTorch from the code available on GitHub, with 100 epochs using the Adam optimizer and a learning rate of 0.0001. DRINet is implemented using the Adam optimizer with a learning rate of 0.001 and a kernel size of 3 × 3. The segmented output images match the dimensions of the binary mask, and an output of 1 is considered tumor region and 0 background in detecting the complete tumor. The pixel-by-pixel comparison with the manually segmented regions of interest or lesion mask allows evaluating the dice similarity (DS) [9], which is considered a standard evaluation procedure in automatic medical image segmentation.
The evaluation process involves the manually segmented lesion mask as ground truth, and each 2D pixel is predicted as either True Positive (TRP), True Negative (TRN), False Positive (FLP) or False Negative (FLN). The empirical goodness measures [Positive Predictive Value (PV), Sensitivity (SS), Accuracy (AC) and Dice Similarity (DS) [9]] are assessed to evaluate the results.
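These goodness measures reduce to simple counts over the binary prediction and ground-truth masks; the sketch below uses the standard definitions, which are assumed to match the paper's usage.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Accuracy, Dice, positive predictive value, sensitivity from binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)    # true positives
    tn = np.sum(~pred & ~truth)  # true negatives
    fp = np.sum(pred & ~truth)   # false positives
    fn = np.sum(~pred & truth)   # false negatives
    return {
        "AC": (tp + tn) / pred.size,
        "DS": 2 * tp / (2 * tp + fp + fn),
        "PV": tp / (tp + fp),
        "SS": tp / (tp + fn),
    }

pred = np.array([[1, 1, 0, 0]])
truth = np.array([[1, 0, 1, 0]])
m = segmentation_metrics(pred, truth)
print(m["DS"])  # 0.5: one overlapping voxel between two masks of size 2
```

The same counting applies per slice or per volume; averaging the per-volume scores yields the aggregate values reported in the results tables.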

C. Experimental Results
Extensive experiments have been performed in the current setup, and the experimental outcomes are reported with numerical and statistical analyses using the proposed 3D-QNet, 3D-UNet [13], VoxResNet [15], DRINet [16], and 3D-ESPNet [17]. It is evident from the experimental data reported in Table I that the proposed 3D-QNet performs optimally for complete brain tumor segmentation of four different modalities of MR volumes (viz. T1, T1-CE, FLAIR, and T2) using the activation guided by 26-connected heterogeneous voxel intensities (υ_ξ) with L = 8, in comparison with the other thresholding schemes under the four evaluation parameters (AC, DS, PV, SS) [9]. The 3D-QNet segmented brain MR slices collected from two different volumes, BRATS19-CBICA-AAG and BRATS19-CBICA-AAB, using class level L = 8 with activation scheme υ_ξ, are shown in Figures 3 and 4, respectively. The human-expert-annotated ground-truth slices for all four modalities are illustrated in Figure 5. It has been observed from the segmented MR slices that our 3D-QNet is suitable for segmenting the correct position and size of the complete tumor when compared with the ground-truth segmentation. However, it is not efficient in mapping the sharp contour of the core and enhanced tumor sub-regions outlined in the annotated slices. Table II presents the quantitative results reported using the proposed 3D-QNet, 3D-UNet [13], VoxResNet [15], DRINet [16], and 3D-ESPNet [17], evaluating the average accuracy (AC), dice similarity score (DS), positive predictive value (PV), and sensitivity (SS) [9]. It has been observed from the 3D-QNet segmented brain MR slices and the results reported in Table II that optimal segmentation is achieved for FLAIR, with an average dice score (DS) of 0.821.
The proposed 3D-QNet marginally outperforms the convolution-based architectures (3D-UNet [13], VoxResNet [15], DRINet [16], and 3D-ESPNet [17]) in complete brain tumor detection. However, it may be noted that our 3D-QNet does not intend to predict the core, enhanced tumor and necrosis sub-regions, owing to the lack of optimization of the parameters in the suggested 3D-QNet. Box plots citing the outcomes reported in Table II are demonstrated in Figure 6. Moreover, to show the effectiveness of our proposed 3D-QNet over 3D-UNet [13], VoxResNet [15], DRINet [16], and 3D-ESPNet [17], we have conducted a one-sided two-sample Kolmogorov-Smirnov (KS) test [40] with significance level α = 0.05. It is interesting to note that despite being characterized by fully self-supervised quantum learning, the 3D-QNet has shown similar accuracy (AC) and dice similarity (DS) in comparison to 3D-UNet [13], VoxResNet [15], DRINet [16], and 3D-ESPNet [17]. Hence, the performance of the 3D-QNet model on the BRATS 2019 data set is statistically significant, and the model offers a promising self-supervised alternative to deep learning for 3D medical image segmentation.

VI. CONCLUSION
A 3D Quantum-inspired Self-supervised Tensor Neural Network (3D-QNet) architecture characterized by 26-connected voxel-wise processing for fully automated semantic segmentation of brain MR volumes is presented in this work. Intensive validation on the BRATS 2019 data set shows the efficacy of the proposed self-supervised 3D-QNet in promoting automatic semantic segmentation of brain MR volumes in real time with minimal human intervention, which is still considered an uphill task in the field of medical image segmentation. The incorporation of quantum-inspired computing and tensor-based learning in the suggested network model provides faster convergence of the 3D-QNet, thereby enabling accurate segmentation results. Despite being a 3D self-supervised network model, 3D-QNet achieved a dice similarity score on complete tumor detection similar to those of the deeply supervised 3D-UNet, VoxResNet, DRINet and 3D-ESPNet, thus promoting self-supervised network learning for volumetric segmentation of medical images. In principle, the proposed 3D-QNet is a general self-supervised network architecture and can be extended to many other 3D medical image segmentation avenues where segmented annotations are limited. However, the 3D-QNet fails to yield optimal outcomes for multi-level segmentation on the BRATS 2019 data set. The authors are currently engaged in extending the 3D-QNet by up-scaling the intermediate volumetric features in the network and optimizing its hyper-parameters to yield optimal segmentation outcomes.