PET Image Reconstruction With Kernel and Kernel Space Composite Regularizer

Led by the kernelized expectation maximization (KEM) method, kernelized maximum-likelihood (ML) expectation maximization (EM) methods have recently gained prominence in PET image reconstruction, outperforming many previous state-of-the-art methods. However, they are not immune to the problems of non-kernelized MLEM methods: potentially large reconstruction variance, high sensitivity to the iteration number, and difficulty in simultaneously preserving image details and suppressing image variance. To solve these problems, this paper derives, using the ideas of data manifold and graph regularization, a novel regularized KEM (RKEM) method with a kernel space composite regularizer for PET image reconstruction. The composite regularizer consists of a convex kernel space graph regularizer that smooths the kernel coefficients, a concave kernel space energy regularizer that enhances the coefficients' energy, and a composition constant that is analytically set to guarantee the convexity of the composite regularizer. The composite regularizer renders easy use of PET-only image priors, overcoming the difficulty KEM faces when the MR prior and the underlying PET images are mismatched. Using this kernel space composite regularizer and the technique of optimization transfer, a globally convergent iterative algorithm is derived for RKEM reconstruction. Tests and comparisons on simulated and in vivo data are presented to validate and evaluate the proposed algorithm, and demonstrate its better performance and advantages over KEM and other conventional methods.

reconstruction. It uses prior data, such as MR anatomical images, to construct the kernel matrix, and uses the maximum-likelihood (ML) expectation maximization (EM) algorithm in the kernel space to iteratively estimate the kernel coefficients of the image, which in turn are mapped by the kernel matrix to the reconstructed image.
KEM has proven to perform better than conventional non-kernelized MLEM methods in enhancing spatial and temporal resolution and quantification in static and dynamic PET imaging, with great potential in low-dose PET imaging. Following these promising results, a number of kernelized methods have recently been presented for various PET image reconstruction problems; see, e.g., [3], [4], [5], [6], [7], [8], and [9]. It is shown in [5] that constructing the kernel matrix from a combination of anatomical MR and pre-reconstructed PET prior images improves the performance of kernelized methods.
Despite the advantages of kernelized MLEM methods, they may suffer from the same problems as conventional MLEM methods. As discussed in [10], a well-known problem of all ML methods in PET is ill-conditioning, which makes the solutions sensitive to small changes in the data, so that the estimates have high variance. In practice, this problem appears as the 'checkerboard' effect of high spatial variance in ML images at high iteration numbers of the EM algorithms. Because of the smoothing effect of the kernel matrix, KEM's image variance can be much lower than that of MLEM. But KEM still suffers from increasingly high image variance at high iteration numbers, especially in low count imaging. This is observed in our experiments presented in Fig. 5 of Subsection III-E. A similar observation is also made in [11], with a suggested fix of early stopping of the iterations in the kernelized EM method, a pragmatic yet low-performance fix, as it is for the same problem in conventional MLEM methods [10], [15].
For conventional MLEM methods, an effective high-performance solution to the above problems is regularized MLEM estimation [10], [12], [13], [14], [15], [37]. It constrains the MLEM estimates with regularizers introduced as priors on the PET images. Properly chosen regularizers can significantly reduce the image variance at high iteration numbers while preserving image details [15], [16], [17], [18]. However, this effective solution has not yet been adopted in kernelized methods, probably for the following reasons.
First, the algorithms are simpler without a regularizer. Second, the kernels in kernelized methods can be tuned to significantly smooth out image variance or to better preserve image details. Therefore, there is a belief in the current literature that the kernel matrix itself can be used to regularize MLEM in kernelized reconstruction of PET images [2], [5], [11]. This may have steered research attention towards finding better kernels and better kernel parameters for better tradeoffs between suppressing image variance and preserving image details.
From kernel and regularization theory [24], an optimization regularizer is generally an explicit function of the optimizers (optimization variables) and varies with them, e.g., grows as the optimizers grow, so as to constrain the optimizers during the optimization process; a kernel may function as a regularizer if it has this property. This is the case in the conventional regularized MLEM methods, e.g., [10], [12], [13], [14], [15], and [37], where the regularizers are functions of the estimated variables and vary with them during iterations. But it is not the case in the current kernelized methods, where the only possible regularizer, the kernel matrix, is a constant matrix constructed from prior data: it is not a function of the estimated variables and does not vary with them during iterations. Therefore, these methods generally have limited regularization capability, even though their kernels are correlated with the estimated variables to some extent.
Apart from the potential sensitivity to iteration numbers, the lack of explicit regularization makes it complicated for the current kernelized methods to suppress image variance and preserve image details simultaneously. This is evidenced by the observations of [1] and our experiments presented in Section III that it is generally difficult to do both simultaneously by tuning the kernel parameters of KEM, especially in the presence of mismatch between MR prior and underlying PET images.
The above facts strongly suggest that explicit and effective regularization is needed in kernelized methods in order to reduce their reconstruction variance and sensitivity to iteration numbers while preserving image details. This has motivated us to introduce regularization into kernelized reconstruction methods. Since the kernelized methods use kernel coefficients as the optimizers to perform MLEM, by regularization theory [24], their regularizers should be explicit functions of the kernel coefficients, computed directly in kernel space during iterative optimization. We therefore develop in this paper a novel kernel space regularization method, called RKEM, for the KEM method to avoid its possibly large reconstruction variance and high sensitivity to iteration numbers, while preserving image details.
For conventional regularized MLEM methods, globally convergent iterative algorithms are preferred and have been developed using the optimization transfer technique [19], [20], [21], [22], [23]. To derive such an algorithm for RKEM, we borrow the ideas of the manifold assumption and graph regularization from manifold learning [25] and sparse coding [26], [27] to develop a simple yet effective kernel space composite regularizer.
The composite regularizer consists of a convex kernel space graph regularizer that smooths the kernel coefficients, a concave kernel space energy regularizer that enhances the coefficients' energy, and a composition constant that is analytically set to guarantee the convexity of the composite regularizer. To overcome KEM's difficulty caused by the mismatch of MR prior and underlying PET images [1], we use a pre-reconstructed PET image as the prior to construct the graph in the composite regularizer. Combining this composite regularizer with the optimization transfer technique [19], [22], [23], we present a globally convergent RKEM algorithm for PET image reconstruction. The novel ingredients of the kernel space composite regularizer set RKEM apart from the existing kernelized methods and empower it with better performance. We further use simulated and in vivo data to validate and evaluate the performance of the RKEM algorithm, and demonstrate its better performance and advantages over the existing methods through comprehensive comparisons.
The rest of the paper is organized as follows. Section II presents our proposed method in detail. Section III presents the simulation studies and comparisons of the proposed method with the existing methods. Section IV presents the in vivo data testing results of the proposed method and compares with those of the existing methods. Discussions and Conclusions are presented in Sections V and VI, respectively.

A. PET Image Reconstruction With Regularized ML
Let y = [y_1 ··· y_i ··· y_N]^T be the PET projection data vector, whose elements y_i have the Poisson log-likelihood function [15]

L(y|x) = Σ_{i=1}^{N} ( y_i log ȳ_i − ȳ_i ),   (1)

where N is the total number of lines of response and ȳ_i is the expectation of y_i. The expectation of y, ȳ = [ȳ_1 ··· ȳ_i ··· ȳ_N]^T, is related to the unknown emission image x = [x_1 ··· x_i ··· x_M]^T by the forward projection

ȳ = Hx + r,   (2)

where x_i is the ith voxel intensity of the image x, M is the total number of pixels, H ∈ R^{N×M} is the PET system matrix, and r = [r_1 ··· r_i ··· r_N]^T is the expectation of random and scattered events. The image x ∈ R^M here is the vectorization of a multi-dimensional PET image. An estimate of the unknown image x can be obtained by the regularized maximum likelihood (RML), or equivalently maximum a posteriori (MAP), estimate [10], [15], [17], [23]

x̂ = arg max_{x≥0} { L(y|x) − β V(x) },   (3)

where L(y|x) is as defined in (1), β ≥ 0 is a regularization parameter, and V(x) is a regularizer or prior function. When β = 0, (3) reduces to the ML estimation, which, as is well known, generally produces a noisy estimate with high variance and requires early termination when it is computed by EM iterations. If chosen properly, the regularizer V(x) can effectively reduce MLEM's estimation variance and sensitivity to the iteration number, and hence improve the image quality of MLEM reconstruction [10], [15]. In this paper, we consider V(x) as a composite of two regularizers, where γ ≥ 0 is a composition constant and V(x) as a whole is a convex function.
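To make (1)-(3) concrete, the following minimal sketch (hypothetical sizes, values, and prior function, not the paper's implementation) evaluates the Poisson log-likelihood and the RML objective for a toy system:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small problem: N lines of response, M voxels
N, M = 12, 6
H = rng.uniform(0.0, 1.0, size=(N, M))   # PET system matrix in (2)
r = np.full(N, 0.1)                       # expected randoms + scatters
x_true = rng.uniform(1.0, 4.0, size=M)    # emission image
y = rng.poisson(H @ x_true + r)           # noisy projection data

def log_likelihood(y, x, H, r):
    """Poisson log-likelihood (1), additive constants dropped."""
    ybar = H @ x + r                      # forward projection (2)
    return np.sum(y * np.log(ybar) - ybar)

def rml_objective(y, x, H, r, beta, V):
    """RML/MAP objective of (3): L(y|x) - beta * V(x)."""
    return log_likelihood(y, x, H, r) - beta * V(x)

# e.g. a simple quadratic smoothness penalty standing in for V(x)
V = lambda x: np.sum(np.diff(x) ** 2)
obj = rml_objective(y, x_true, H, r, beta=0.1, V=V)
```

With beta = 0 the objective reduces to the pure log-likelihood, mirroring the reduction of (3) to ML estimation.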

B. Kernelized PET Image Reconstruction With Regularized ML
Following [2], the unknown emission image x can be represented as x = Ka in the kernel space, where K ∈ R^{M×M} is a kernel matrix and a ∈ R^M is the coefficient vector of x under K.
The ijth element of the kernel matrix K is given by

k_ij = κ(x̃_i, x̃_j) = φ(x̃_i)^T φ(x̃_j),   (4)

where κ(·, ·) is a kernel (function), x̃_i and x̃_j are respectively the data vectors for the ith and jth elements of x in the prior dataset of x obtained from other sources, e.g., MR anatomical or pre-reconstructed PET images of the same (class of) subject/s, and φ(·) is a nonlinear function mapping x̃ to a higher dimensional feature space, which need not be known explicitly.
The commonly used kernel in PET image reconstruction is the radial Gaussian kernel [2], [11]

κ(x̃_i, x̃_j) = exp( −||x̃_i − x̃_j||² / (2σ²) ),   (5)

where σ is a kernel parameter and ||·|| is the 2-norm of a vector. For computational simplicity and performance improvement, the kernel matrix K in practice is often constructed using

k_ij = κ(x̃_i, x̃_j) if x̃_j ∈ J_i, and k_ij = 0 otherwise,   (6)

where J_i is the neighborhood of the ith element of the emission image x, with the neighborhood size J defined by the J nearest pixels of x̃_i. Other methods, such as x̃_j ∈ kNN (the k nearest neighbors) of x̃_i, measured by ||x̃_i − x̃_j||, can also be used [2]. Using the kernel space image model x = Ka, we can convert the forward projection (2) and the log-posterior probability maximization (3) to their respective kernelized equivalents

ȳ = HKa + r,   (7)
â = arg max_{a≥0} { L(y|Ka) − βV(Ka) },   (8)

and reconstruct the unknown image x by

x̂ = Kâ.   (9)

As seen from the above, the kernel matrix K serves as a basis of the underlying image x obtained from the high dimensional feature space of the nonlinearly transformed prior data φ(x̃_i), and a is the coefficient vector of the image x under K. The kernel matrix K limits the search for x to the linear combinations of its basis vectors. This may effectively improve image reconstruction if K is chosen properly [2].
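A minimal sketch of the kernel matrix construction (5)-(6) might look as follows; the dense pairwise distance computation and the row normalization step are illustrative simplifications (row normalization is a common practice in kernelized reconstruction, not a step stated here), and all parameters are hypothetical:

```python
import numpy as np

def gaussian_knn_kernel(prior, J=4, sigma=1.0):
    """Sparse kernel matrix via the radial Gaussian kernel (5),
    keeping only the J nearest neighbors per pixel as in (6).
    `prior` is an (M, d) array of prior feature vectors x~_i."""
    M = prior.shape[0]
    # pairwise squared distances ||x~_i - x~_j||^2
    d2 = np.sum((prior[:, None, :] - prior[None, :, :]) ** 2, axis=-1)
    K = np.zeros((M, M))
    for i in range(M):
        nn = np.argsort(d2[i])[:J]        # J nearest neighbors of x~_i
        K[i, nn] = np.exp(-d2[i, nn] / (2.0 * sigma ** 2))
    K /= K.sum(axis=1, keepdims=True)     # illustrative row normalization
    return K
```

Each row then holds at most J nonzero weights, reflecting the band structure of K noted later in the paper.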
Exploiting the linear relation x = Ka, KEM uses the standard EM algorithm to iteratively solve (8) with β = 0 for â. Hence, KEM shares the same convergence property as conventional MLEM and also requires early stopping of the EM iterations to avoid noisy reconstruction [11]. Similar to V(x) in the non-kernelized RML (3), the first aim of the kernelized regularizer V(Ka) in RKEM (8) is to reduce KEM's estimation variance and sensitivity to the iteration number and hence enhance its image quality.
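For illustration, the KEM iteration, that is, (8) with β = 0 solved by the standard multiplicative EM update applied to the coefficients a with system matrix G = HK, can be sketched as follows; the dense arrays and the small numerical guards are hypothetical simplifications:

```python
import numpy as np

def kem_reconstruct(y, H, K, r, n_iter=30):
    """Minimal KEM sketch: MLEM-style iterations on the kernel
    coefficients a with G = H @ K, then x = K @ a, cf. (8)-(9)
    with beta = 0. Not the paper's implementation."""
    G = H @ K
    sens = G.sum(axis=0)                    # sensitivity G^T 1
    a = np.ones(K.shape[1])                 # uniform initialization
    for _ in range(n_iter):
        ratio = y / np.maximum(G @ a + r, 1e-12)
        a *= (G.T @ ratio) / np.maximum(sens, 1e-12)
    return K @ a, a
```

The multiplicative form keeps a (and hence x = Ka) nonnegative throughout, matching the constraint a ≥ 0 in (8).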

C. Kernel Space Composite Regularizer
As shown in [1], KEM with its kernel matrix K constructed using a larger neighborhood size J and kernel parameter σ may perform poorly on a PET-only lesion that exists only in the PET image. Reducing J and σ may enhance such a lesion, but it may also deteriorate a PET+MR lesion that coexists in both the PET and the MR prior images. Tuning J and σ to simultaneously achieve the best preservation of these two types of lesions and the best image variance suppression is generally difficult for KEM. The kernel space composite regularizer proposed below for RKEM (8) overcomes this difficulty while achieving the first aim of the kernelized regularizer V(Ka).
For the non-kernelized RML (3), the regularizer V(x) must be convex in order for (3) to have a global maximum [10], [28], and iterative optimization is generally needed for its solution.
To ensure the convergence of the iterative optimization, the optimization transfer method [19] is often used to derive convergent iterative algorithms; see, e.g., [22] and [23]. Since the optimizations (3) and (8) are equivalent due to x = Ka, the RKEM (8), with x replaced by Ka, also needs a convex composite regularizer and a globally convergent iterative algorithm for its optimization. Construction of the regularizer is crucial to deriving such an algorithm.
A commonly used regularizer in the literature for the non-kernelized RML (3) is the quadratic image smoothing function [10], [15], [23] (corresponding to γ = 0 and V(x) = v(x))

v(x) = x^T L x = (1/2) Σ_{i=1}^{M} Σ_{j∈J_i} ψ_ij (x_i − x_j)²,   (10)

where ψ_ij = ψ_ji ≥ 0 are weights, L = D − Ψ is a Laplacian matrix with the degree matrix D = diag[(Σ_{j∈J} ψ_1j) ··· (Σ_{j∈J} ψ_Mj)] and the weight matrix Ψ consisting of ψ_ij for i ∈ {1, 2, ···, M}, j ∈ J_i, and 0 otherwise. It has been used in, e.g., [10], [23] to derive convergent iterative algorithms for the non-kernelized RML (3). A key technique in deriving these algorithms is the use of so-called surrogate functions. They substitute the likelihood and regularizer functions in (3) and make the resulting surrogate objective function separable in each x_i, i ∈ {1, 2, ···, M} [10], [19].
However, it is complex to apply this technique directly in the RKEM (8), where x is replaced by Ka and a is used as the optimizer. In this case,

x_i = K_i a = Σ_{l=1}^{M} k_il a_l,  x_j = K_j a = Σ_{l=1}^{M} k_jl a_l,   (11)

where K_i and K_j are the ith and jth rows of K. Since K is generally a band matrix with multiple nonzero elements in each row, there are multiple k_il a_l ≠ 0 in Σ_{l=1}^{M} k_il a_l and k_jl a_l ≠ 0 in Σ_{l=1}^{M} k_jl a_l for l ∈ {1, 2, ···, M}. It is therefore difficult to decouple each optimization variable a_i in the surrogate objective function of (8) by simply replacing x_i and x_j with K_i a and K_j a in the existing surrogate functions of v(x) (see, e.g., [10], [23]). To overcome this difficulty, and also to use prior information from different sources for improved image quality, we propose below a novel kernel space graph regularizer.
First, we construct another kernel matrix K_R ∈ R^{M×M} using the prior image data x̃^R_i and x̃^R_j and the formulas (5)-(6), with the neighborhood size J_R and the kernel parameter σ_R. To use prior information from different sources, the prior image data x̃^R_i and x̃^R_j here can be different from the prior image data x̃_i and x̃_j used to construct the kernel matrix K. For example, x̃^R_i and x̃^R_j can be from pre-reconstructed PET images, and x̃_i and x̃_j from MR anatomical images.
Next, we construct a doubly normalized Laplacian matrix

L = I − W,  W = C K_R C,   (12)

where I is the identity matrix and C is a diagonal scaling matrix obtained by applying the Sinkhorn matrix balancing procedure [29] to the kernel matrix K_R. This Laplacian matrix L defines a graph of the image x, with the adjacency matrix W. Different from the original kernel matrix K_R, the adjacency matrix W obtained from Sinkhorn matrix balancing of K_R is a symmetric non-negative doubly stochastic matrix, with each row and each column summing to 1, representing the transition probabilities of the pixels x_i along the graph [30]; the diagonals of W, the w_ii, represent the self-loop transition probabilities of the pixels x_i. This distinct property is crucial to the derivation of the regularizers presented below. With L obtained from (12), we define a Laplacian quadratic of the kernel coefficient a and use it as a regularizer for the RKEM (8),

v_1(a) = a^T L a = (1/2) Σ_{i=1}^{M} Σ_{j∈J_R} w_ij (a_i − a_j)²,   (13)

where w_ij is the ijth element of W given in (12). This regularizer is a smoothing function of the same form as (10), but it is in the kernel coefficients a_i, instead of the image pixels x_i = K_i a, and has the weights w_ij ≥ 0 specified by (12). As shown in Subsection II-D, replacing x_i with a_i in the surrogate functions previously used for the v(x) in (10) (see, e.g., [10], [23]), we can easily obtain a surrogate function for the regularizer v_1(a) and use it to separate each coefficient a_i, i ∈ {1, 2, ···, M}, in the surrogate objective function of the RKEM (8). The use of the regularizer v_1(a) = a^T L a stems from the fact that high dimensional real-world data, such as those of an image, generally lie on low-dimensional manifolds, and a manifold can be represented by a graph defined by the Laplacian matrix constructed from the data [25]. If two data points are close in the intrinsic data manifold, then their representations in any other basis are also close.
This is the so-called manifold assumption, which has been used in manifold learning and dimensionality reduction [25], clustering [31], [32], semisupervised learning [33], [34] and sparse coding [26], [27] to develop various effective methods. A similar idea has also been used in [35] for dynamic MR image reconstruction with smoothness regularization on manifolds.
It has been shown in [26] and [27] that a signal x and its coefficient (code) a under a basis (dictionary) share the same smoothness property in the manifold of x represented by a graph. Using the Laplacian matrix L of this graph, constructed from the data of x, the Laplacian quadratic a^T L a can be used as a regularizer to control the smoothness of x, instead of using x^T L x. We borrow this method here for the RKEM reconstruction of PET images. Because v_1(a) = a^T L a is derived from a graph and is in the kernel coefficients a, we call it the kernel space graph regularizer.
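A minimal numerical sketch of the Sinkhorn balancing behind (12) and of the graph regularizer v_1(a) follows; the alternating row/column scaling used here is one standard realization of the balancing procedure, not necessarily the exact variant of [29]:

```python
import numpy as np

def sinkhorn_doubly_stochastic(K_R, n_iter=500):
    """Sinkhorn balancing sketch: find diagonal scalings so that
    W = diag(r) @ K_R @ diag(c) has row and column sums equal to 1.
    For a symmetric strictly positive K_R, the limit is the symmetric
    doubly stochastic adjacency matrix W of (12)."""
    r = np.ones(K_R.shape[0])
    c = np.ones(K_R.shape[1])
    for _ in range(n_iter):
        r = 1.0 / (K_R @ c)
        c = 1.0 / (K_R.T @ r)
    return r[:, None] * K_R * c[None, :]

def graph_regularizer(a, W):
    """Kernel space graph regularizer v1(a) = a^T L a with L = I - W."""
    L = np.eye(W.shape[0]) - W
    return a @ L @ a
```

Since the rows of W sum to 1, constant coefficient vectors lie in the null space of L, so v_1 penalizes only non-smooth variation of a over the graph.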
Similar to its image space counterpart x^T L x, the regularizer a^T L a may over-smooth the kernel coefficient a, which in turn may over-smooth the edges and small objects in the reconstructed image. To mitigate this problem, we introduce another kernel space regularizer

v_2(a) = −a^T W_D a = −Σ_{i=1}^{M} w_ii a_i²,   (14)

where w_ii, i = 1, 2, ···, M, are the diagonal elements of the adjacency matrix W constructed in (12) and W_D is the diagonal matrix consisting of the w_ii.
Since v_2(a) is a negative valued concave function, it promotes the total energy of a when used as a regularizer in the RKEM (8). The promotion of each a_i, the ith element of a, is determined by its weight w_ii. By the construction of W, w_ii ≥ 0 is the self-loop transition probability of the pixel x_i on the graph. For the x_i at the non-smooth areas of edges and small objects in the image, the intensities are higher than those of other pixels, and hence the corresponding self-loop transition probabilities w_ii are also higher, so that these pixels tend to remain at their locations during the transitions. This is demonstrated in Fig. 1, where (a) shows a simulated PET image with M pixels and two lesions of different sizes and intensities, which is used to construct the adjacency matrix W of (12), and (b) shows the image consisting of the diagonal elements w_ii, i = 1, 2, ···, M, of W. As seen from these images, the locations and intensities of the larger (brighter) w_ii coincide with those of the lesions in the simulated PET image. Since the a_i and x_i share the same smoothness (or non-smoothness) property in the manifold of x, we use w_ii as the weight of a_i² in the regularizer v_2(a). The larger the weight w_ii, the more a_i² is enhanced and the more it contributes to the total energy of a. This in turn results in a brighter x_i with higher energy at the corresponding location of the reconstructed image. Hence, we call v_2(a) the kernel space energy regularizer in the sequel.
With the regularizers v_1(a) and v_2(a) introduced in (13) and (14), we define a kernel space composite regularizer

V(a) = v_1(a) + γ v_2(a),   (15)

and use it as V(Ka) in the RKEM objective function of (8), where γ > 0 is a composition constant. Writing V(a) as a sum of per-coefficient terms ρ(a_i) = (1/2) Σ_{j∈J} w_ij (a_i − a_j)² − γ w_ii a_i², V(a) is convex if ρ(a_i) is convex for all i, and the necessary and sufficient condition to achieve this is [36]

γ w_ii ≤ (1/2) Σ_{j∈J} w_ij, ∀i ∈ {1, 2, ···, M}.   (16)

As W is doubly stochastic with Σ_{j∈J} w_ij = 1 [30], (16) amounts to γ ≤ 1/(2w_ii), ∀i ∈ {1, 2, ···, M}. To satisfy this condition, it suffices to set the composition constant

γ = 1/(2 w_max),   (17)

where w_max = max{w_ii, i = 1, 2, ···, M}. Setting γ as above, the kernel space composite regularizer V(a) can be written as V(a) = a^T Q a with Q = I − W_γ ≥ 0 and W_γ = W + γ W_D. With its rows not summing to zero, Q is not a Laplacian matrix. Hence, V(a) is no longer a Laplacian quadratic smoothing regularizer of the form (10) or (13).
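The composition constant (17) and the per-coefficient condition (16) can be checked numerically as in the sketch below, assuming W is already the symmetric doubly stochastic adjacency matrix of (12); the function name is hypothetical:

```python
import numpy as np

def composite_regularizer(W):
    """Composite regularizer sketch: V(a) = a^T Q a with
    Q = I - W - gamma * diag(w_ii) and gamma = 1/(2*w_max) per (17).
    Assumes W is symmetric doubly stochastic as produced by (12)."""
    w_diag = np.diag(W)
    gamma = 1.0 / (2.0 * w_diag.max())
    Q = np.eye(W.shape[0]) - W - gamma * np.diag(w_diag)
    return Q, gamma
```

With row sums equal to 1, the choice gamma = 1/(2 w_max) makes 2*gamma*w_ii ≤ 1 hold for every i, which is exactly condition (16).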

D. Kernelized Surrogate Objective Function and Convergent Iterative Optimization
With V(a) defined above, we now derive a convergent iterative algorithm for the RKEM (8) using the optimization transfer method [10], [21]. To do this, we denote by Q(a; a^n) the kernelized surrogate objective function satisfying

Q(a; a^n) − Q(a^n; a^n) ≤ Φ(a) − Φ(a^n),   (18)
∇Q(a^n; a^n) = ∇Φ(a^n),   (19)

where ∇ is the gradient operator, Φ(a) = L(y|Ka) − βV(a) is the original objective function of (8) with the regularizer V(a) defined in (15), and a^n is the estimated kernel coefficient at the nth iteration. The purpose of Q(a; a^n) is to substitute Φ(a) and convert (8) to â = arg max_{a≥0} Q(a; a^n). It then follows from the conditions (18)-(19) that solving this equivalent problem iteratively for the globally optimal solution is equivalent to solving (8) for its globally optimal solution. In accordance with Φ(a), we define

Q(a; a^n) = Q_L(a; a^n) − β Q_V(a; a^n),   (20)

where Q_L(a; a^n) and Q_V(a; a^n) are respectively the surrogate functions for L(y|Ka) and V(a).
For the kernelized likelihood function L(y|Ka), we use the surrogate function Q_L(x; x^n) of [21], [23], with x substituted by Ka and K absorbed into Q_L(·), to obtain the surrogate function

Q_L(a; a^n) = Σ_{i=1}^{M} s_i ( â^n_{i,EM} log a_i − a_i ),   (21)

where

G = HK,  s_i = Σ_{k=1}^{N} g_ki,  â^n_{i,EM} = (a_i^n / s_i) Σ_{k=1}^{N} g_ki y_k / [Ga^n]_k,   (22)

and [Ga^n]_k is the kth element of Ga^n. It is easy to prove that Q_L(a; a^n) − Q_L(a^n; a^n) ≤ L(y|Ka) − L(y|Ka^n) and ∇Q_L(a^n; a^n) = ∇L(y|Ka^n). For the kernel space composite regularizer V(a) defined in (15), we only need to decouple v_1(a), as the a_i in v_2(a) are already decoupled. Using De Pierro's decoupling rule [19], [23]

(a_i − a_j)² ≤ 2( a_i − (a_i^n + a_j^n)/2 )² + 2( a_j − (a_i^n + a_j^n)/2 )²,

we define the surrogate function Q_V(a; a^n) for V(a)

Q_V(a; a^n) = Σ_{i=1}^{M} { Σ_{j∈J} w_ij [ ( a_i − (a_i^n + a_j^n)/2 )² + ( a_j − (a_i^n + a_j^n)/2 )² ] − γ w_ii a_i² },   (23)

which satisfies Q_V(a; a^n) − Q_V(a^n; a^n) ≥ V(a) − V(a^n) and ∇Q_V(a^n; a^n) = ∇V(a^n). With Q_L(a; a^n) and Q_V(a; a^n) introduced in (21)-(23), it is easy to show that the surrogate objective function Q(a; a^n) defined in (20) satisfies the conditions (18)-(19). Hence, we can perform the RKEM reconstruction (8)-(9) using

a^{n+1} = arg max_{a≥0} ( Q_L(a; a^n) − β Q_V(a; a^n) ),   (24)
x̂ = K a^{n+1}.   (25)
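De Pierro's decoupling rule used in (23) is easy to verify numerically; the sketch below checks that the separable quadratic bound majorizes (a_i − a_j)² and is tight at the current iterate (function name hypothetical):

```python
import numpy as np

def depierro_bound(ai, aj, ai_n, aj_n):
    """De Pierro's decoupling rule [19], [23]: a separable quadratic
    upper bound on (ai - aj)^2, anchored at the current iterate
    (ai_n, aj_n), where the bound holds with equality."""
    m = 0.5 * (ai_n + aj_n)  # midpoint of the current iterate pair
    return 2.0 * (ai - m) ** 2 + 2.0 * (aj - m) ** 2
```

The bound holds for any anchor because 2u² + 2v² − (u − v)² = (u + v)² ≥ 0; equality at the anchor is what makes the surrogate touch the original objective at a^n, as required by (18)-(19).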
As seen from (21)-(23), the optimization (24) is equivalent to a sequence of M one-dimensional optimizations, one for each coefficient a_i, which can be solved by calculating a_i^{n+1} for i = 1, 2, ···, M, sequentially.
The estimate of a_i at iteration n + 1 can be calculated in closed form by solving the one-dimensional quadratic optimality condition of (24) with coefficients A_i, B_i and C_i, where B_i = F_i − β Σ_{j∈J} w_ij (a_i^n + a_j^n), all the variables in A_i, B_i and C_i are as given in (21)-(23), and γ = 1/(2w_max) is as given in (17).
The a_i^{n+1} thus estimated give rise to the coefficient vector a^{n+1}, which is used in (25) to obtain the estimated image x̂.
Similar to the optimization transfer algorithm of [23] for the non-kernelized image space RML optimization, the iterative algorithm above guarantees monotonic convergence to the global solution of the RKEM (8). When β = 0, it reduces to the KEM algorithm [2] without regularization.

E. Proposed Method
Algorithm 1 summarizes the proposed RKEM method in pseudo-code. The prior image data x̃^R in the algorithm is a PET image pre-reconstructed by MLEM with Gaussian filtering, for improved performance on the PET-only lesion. It can be replaced by other priors, such as the MR image x̃, if needed for other purposes.

A. Overview
To investigate its behavior and performance, we have applied the proposed RKEM algorithm to simulated low count and high count sinogram data, and compared it with existing PET image reconstruction algorithms. Since the KEM algorithm [1], [2] has proven to outperform many state-of-the-art PET image reconstruction algorithms using anatomical priors, we have mainly compared RKEM with KEM and MLEM-BW (MLEM with the Bowsher regularizer derived from the MR image prior). The regularizer used in MLEM-BW was a quadratic penalty with regularization parameter λ over the subset J_i of the nearest neighbors most similar to x_i learned from the MR image [13], [37]. We have also compared RKEM with the commonly used MLEM plus Gaussian filtering (MLEM+F), to show some common problems of PET image reconstruction methods based on un-regularized MLEM.
As discussed earlier, tuning J and σ to simultaneously achieve the best preservation of different types of lesions and the best noise suppression is generally difficult for KEM, whereas the kernel space composite regularizer V(a) in the proposed RKEM provides an effective technical means to overcome this difficulty. To demonstrate these points, we have compared in our experiments four versions of KEM and three versions of RKEM, as described below. KEM1: with smaller J and σ, to show KEM's best performance in preserving the PET-only lesion. KEM2: with larger J and σ, to show KEM's best performance in noise suppression and preservation of the MR+PET lesion. RKEM1 and RKEM2: KEM1 and KEM2 regularized by the kernel space composite regularizer V(a), with J_R and σ_R equal to the J and σ of KEM1 in RKEM1 and to the J and σ of KEM2 in RKEM2, to show the effect of the full RKEM with V(a) on simultaneous noise suppression and preservation of the PET-only and MR+PET lesions, and to show the effect of J and σ on the performance of RKEM. RKEM-S: the same as RKEM1 but with only the kernel space graph regularizer v_1(a), to show the effect of v_1(a) on the noise suppression of KEM. We have not compared RKEM with only the kernel space energy regularizer v_2(a) defined in (14), because it is a nonconvex function and cannot be used alone for convergent iterative optimization.
For KEM1 and KEM2, the simulated MR image prior described in Subsection III-B was used to construct their kernel matrix K . The kernel matrix K of KEM1 was used in RKEM1 and RKEM-S and that of KEM2 was used in RKEM2 as their respective kernel matrix K . For RKEM-S, RKEM1 and RKEM2, the kernel matrix K R in their kernel space graph and energy regularizers v 1 (a) and v 2 (a) was constructed from the pre-reconstructed PET image, which was obtained from the noisy PET sinogram data described in Subsection III-B using 30 iterations of MLEM, followed by Gaussian filtering.
For a fair comparison of KEM and RKEM, and also to show their performance difference when both use the same MRI and PET priors, we have combined the same MR and PET image priors used in RKEM1 and RKEM2 to construct the kernel matrix K for another two versions of KEM: KEM-PET-MR1, with the same J and σ as KEM1, and KEM-PET-MR2, with the same J and σ as KEM2. The MR and PET image priors were combined following [5]: with the ith data vectors of the PET and MR images, respectively, and SSIM_i the structural similarity index of x̃_i, a combined distance d_ij substitutes ∥x̃_i − x̃_j∥² in the kernel (5). These formulas are Equations (22) and (24) of [5], with the factor 3 in (22) adjusted to 1 to combine a single MR image with a single PET image from static PET imaging.

B. PET and Anatomical MRI Data Simulation
We used a three-compartment model [38] to represent the lesion, gray matter and white matter regions, and followed [2] to generate the regional time activity curve for each region, with the uptake highest in the lesion regions, lower in the gray matter regions, and lowest in the white matter regions. The 2D anatomical model from the BrainWeb database, consisting of different tissue classes, was used to simulate the corresponding PET images [39], [40]. The regional time activities at 50 minutes were assigned to the different brain regions to obtain a PET activity image of size 181 × 217, and two simulated lesions were added to this image. The resulting image was smoothed by a Gaussian filter with standard deviation 1 mm to simulate the point spread function of a typical PET scanner. The smoothed image, shown in Fig. 2(a), was used as the reference PET image, with the two simulated lesions shown in the Lesion 1 and Lesion 2 areas, respectively. A matching T1-weighted 2D MR image of size 181 × 217 and pixel resolution 1 × 1 mm² was generated from the BrainWeb database, with a single simulated lesion at the same location as Lesion 1 in the reference PET image. To simulate the real scan process and test the robustness of the different methods, noise was added to this MR image by BrainWeb. The MR image thus generated was used as the MR prior image data x̃, shown in Fig. 2(b). In the sequel, we call Lesion 1 the PET+MR lesion and Lesion 2 the PET-only lesion.
The reference PET image was forward projected to generate noiseless sinograms using the PET system matrix generated by the Fessler toolbox (https://web.eecs.umich.edu/∼fessler/code/). The noiseless sinograms were generated using a transaxial field of view of 70 cm, 249 radial bins and 210 angular projections; Poisson noise was then added to generate noisy sinograms. In addition, we included 20% randoms and scatters. The expected total numbers of events were set to 340k and 1133k to simulate the low count and high count cases, respectively. Attenuation maps, mean scatters and random events were included in all reconstruction methods to obtain quantitative images. Ten noisy realizations were simulated and each was reconstructed independently for comparison.

C. Algorithms Setup and Performance Metrics
The parameters for the compared reconstruction algorithms were as follows. MLEM+F: the window size of the Gaussian filter = 5 × 5, σ = 1. MLEM-BW: the nearest neighbor parameters according to [13]. The above parameters and the total iteration numbers were obtained by tuning the algorithms separately according to the following rules. MLEM+F: for its best whole image MSE (mean square error) and reconstructions of the PET+MR and PET-only lesions. All the kernelized algorithms: as described in Subsection III-A. MLEM-BW: according to [13], for its best whole image MSE and PET+MR lesion reconstruction.
Quantitative metrics C1-C4 were used for the performance evaluation and comparison of the algorithms. In these metrics, x̂_j and f_j are respectively the jth elements of the reconstructed and true PET images; x̄_lesion, x̄_BG, f̄_lesion and f̄_BG are respectively the mean pixel values of the reconstructed lesion, the reconstructed background (white matter) area, the true lesion and the true background area; and S is the total number of noisy realizations. C4 := lesion shape distortion compared with the lesion shape in the true image. The lower the C1, C2 and C4, and the closer the C3 to that of the true image, the better the image quality.
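For illustration, the recoverable parts of these metrics (the whole image MSE and a contrast-type ratio of mean lesion to mean background values) can be sketched as below; the exact C2 and C3 formulas are not reproduced here, and the function names are hypothetical:

```python
import numpy as np

def whole_image_mse(x_hat, f):
    """C1-style whole image MSE between reconstruction x_hat and truth f."""
    return np.mean((x_hat - f) ** 2)

def lesion_contrast(x_hat, lesion_mask, bg_mask):
    """Contrast-type ratio of the mean lesion value to the mean
    background (white matter) value, built from the same quantities
    (mean lesion and background pixel values) used in the paper's
    metrics; not the exact C3 formula."""
    return x_hat[lesion_mask].mean() / x_hat[bg_mask].mean()
```

In practice such metrics would be averaged over the S noisy realizations before comparison.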

D. Image Reconstruction Performance
As seen from Fig. 3, RKEM2 outperforms all other algorithms in all four criteria, simultaneously achieving the lowest C1, the lowest C2 in both lesions, the C3 in both lesions closest to those of the true image, and the lowest C4 in both lesions, whereas RKEM1 falls far behind RKEM2 in all the criteria. Since RKEM1 uses the smaller J and σ of KEM1, while RKEM2 uses the larger J and σ of KEM2 with J_R = J and σ_R = σ, it is apparent that the kernel parameters J and σ greatly affect the performance of RKEM. This will be further discussed in Subsection III-F.
Comparing RKEM1 with KEM1 and KEM-PET-MR1, and RKEM2 with KEM2 and KEM-PET-MR2, it can be seen that RKEM1 and RKEM2, with the kernel space composite regularizer V(a), outperform their respective counterparts, achieving simultaneously better C1-C4. The better C3 in both lesion areas is due to the kernel space energy regularizer v_2(a), which promotes the total energy of the kernel coefficients a. The simultaneously better C1, C2 and C4 are due to the kernel space graph regularizer v_1(a), which smooths the kernel coefficients a and makes them conform to the PET image prior embedded in v_1(a).
It can also be seen from Fig. 3 that, with the combination of PET and MR image priors in their kernel matrices, KEM-PET-MR1 and KEM-PET-MR2 outperform KEM1 and KEM2, respectively, in C2 and C3, especially on the PET-only lesion, which is in line with the result of [5]. However, they both fall behind their respective counterparts in C1 and C4, with higher whole image MSE and noticeable shape distortion in the PET-only lesion, and both fall well behind RKEM2 in all four criteria C1-C4. A probable reason is that the combined PET and MR prior is neither PET nor MR, which amounts to perturbing the PET-only and MR-only priors with noise, resulting in poorer C1 and C4. In contrast, the MR prior in RKEM2 is introduced through KEM's kernel matrix K and the PET prior through the kernel matrix K_R of the kernel space composite regularizer V(a), so the original priors are unperturbed. Therefore, RKEM2 achieves better performance in all four criteria with the aid of composite regularization and proper selection of kernel parameters.
Compared with KEM1, which was tuned for KEM's best preservation of the PET-only lesion, RKEM-S with only the kernel space graph regularizer v_1(a) effectively suppresses the background noise of the KEM1 image, but the PET prior in v_1(a) also reduces the uptake, and hence increases the MSE, in the PET+MR lesion, while reducing the MSE in the PET-only lesion. Compared with RKEM-S, RKEM1 with the additional kernel space energy regularizer v_2(a) increases the uptakes and reduces the MSE in both lesion areas, while still reducing the whole image MSE of the KEM1 image; its whole image MSE is slightly higher than that of RKEM-S because the effect of v_1(a) is weakened by v_2(a). These results demonstrate the individual effects of v_1(a) and v_2(a), and the advantages of the composite regularizer V(a) using the PET image prior.
As seen from the MLEM-BW image, with the regularizer derived from the MR image prior, MLEM-BW significantly reduces the whole image MSE and makes the image texture sharper and closer to the MR prior image; it also significantly boosts the uptake of the PET+MR lesion as compared to MLEM+F. However, it performs poorly on the PET-only lesion due to the mismatch of the MR prior and the underlying PET images. As shown in [1], this is an inherent problem of MLEM-BW, which can be mitigated to some extent by tuning the MLEM-BW parameters at the sacrifice of image variance and the PET+MR lesion. Combining KEM with the PET image-based kernel space composite regularizer, RKEM2 overcomes this problem and achieves much better C1-C4. Note that RKEM2's blurry image texture is closer to that of the true image, due to the PET-only image prior in its composite regularizer, resulting in lower whole image MSE. As shown later in Fig. 10, RKEM2's texture can be made similar to that of the MLEM-BW image if the PET image prior in its composite regularizer is replaced by the MR image prior.
Fig. 4 shows the images reconstructed by the different algorithms from the simulated high count (1133k) sinogram data. Due to the high signal to noise ratio of the high count data, all algorithms perform significantly better. Nevertheless, RKEM2 still performs the best in terms of the criteria C1-C4.
Fig. 5 plots the whole image AMSE of the compared algorithms over iterations. As seen from the plots, on the low count data, the regularized algorithms (RKEM2, RKEM1, RKEM-S, MLEM-BW) tend to converge monotonically along the iterations, with RKEM2 converging to the lowest AMSE, whereas the AMSE's of the un-regularized algorithms tend to diverge. The regularized algorithms behave the same on the high count data, while the other algorithms become less sensitive to the iteration numbers, with reduced AMSE and 'checkerboard' effect in this case.
Fig. 6 plots the whole image mean squared bias versus the mean variance of the different algorithms over iterations 1 to 100.
As seen from the figure, RKEM2 always attains much lower bias and variance simultaneously than the other algorithms for both the low count and high count images. Fig. 7 plots the CRC versus the standard deviation of the background noise of the different methods for Lesion 1 and Lesion 2, over iterations 1 to 100. As seen from the figure, RKEM2 attains higher CRC with lower background noise than the other algorithms.
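The bias-variance evaluation behind Figs. 6 and 9 can be sketched as follows. The definitions below follow the usual ensemble convention over the S noisy realizations; the paper's exact formulas are not reproduced here, so this is an illustrative assumption.

```python
import numpy as np

def bias_variance(recons, f):
    """Whole-image mean squared bias and mean variance.

    recons: (S, n_pix) stack of reconstructions of the same object from
            S independent noisy realizations; f: true image.
    """
    x_bar = recons.mean(axis=0)                 # ensemble mean per pixel
    sq_bias = float(np.mean((x_bar - f) ** 2))  # squared bias, averaged over pixels
    var = float(recons.var(axis=0).mean())      # variance, averaged over pixels
    return sq_bias, var
```

Plotting sq_bias against var over iterations (or over β values, as in Fig. 9) traces the bias-variance tradeoff curve of each algorithm.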

F. Effects of Parameters and Prior Data Sources
The effect of the kernel parameters of RKEM has been shown in Subsection III-D by RKEM1, with the smaller J = J_R = 9 and σ = σ_R = 0.22, and RKEM2, with the larger J = J_R = 13 and σ = σ_R = 0.25. We found in our experiments that the J_R and σ_R of K_R could be aligned with the J and σ of the kernel matrix K in KEM. The results of RKEM2, and also our trials on various kernel parameter selections, suggest that a simple and good choice for RKEM's kernel parameters is to set J_R = J and σ_R = σ and make J and σ larger (as in KEM2) to achieve KEM's best noise suppression and PET+MR lesion reconstruction.
Fig. 8 shows the effect of the regularization parameter β on the low count image quality of RKEM2, with the RKEM2 image from Fig. 3 at the center. As seen from the figure, with the fine-tuned optimal β = 0.04, RKEM2 attains the best tradeoff between the uptakes in the two lesion areas and the clarity of the image texture. When β is below the optimal value β = 0.04, the kernel space graph regularizer v_1(a) and the kernel space energy regularizer v_2(a) are both weighted down. As a result, the image texture, as shown in Fig. 8 (left), becomes slightly sharper due to the decreased smoothing effect of v_1(a), and the uptakes in the two lesion areas become lower due to the decreased pixel energy enhancement of v_2(a). When β is above the optimal value β = 0.04, v_1(a) and v_2(a) are both weighted up. Hence, the image texture, as shown in Fig. 8 (right), becomes blurred due to the stronger smoothing effect of v_1(a), while the uptakes in both lesions are over-boosted by the stronger energy enhancement of v_2(a), resulting in higher whole image MSE and lesion MSE's. These results show that the regularization parameter β needs to be tuned properly: if it is too high or too low, the image quality of RKEM may degrade.
To quantitatively assess the effect of the regularization parameter β, Fig. 9 plots the whole image mean squared bias versus the mean variance, and the CRC's of the Lesion 1 and Lesion 2 regions versus the standard deviation of the background noise, calculated from the low count image reconstructed by RKEM2 at its convergence at 100 iterations, with the regularization parameter β varying from 0.01 to 0.09 in 0.01 increments. As seen from the plots, the optimal β = 0.04 used for RKEM2 in Fig. 3, and also in Fig. 8 (center), gives a balanced tradeoff between the whole image mean squared bias and the mean variance, and between the CRC of the lesion regions and the background noise.
As detailed in Subsections II-C and II-E, the kernel space composite regularizer V(a) is derived from the kernel matrix K_R, which can be constructed from prior image data of different sources. To show the impact of different sources, Fig. 10 compares the images reconstructed by RKEM2 with V(a) derived from MR prior image data, named RKEM-MR, and by the original RKEM2 with V(a) derived from pre-reconstructed PET prior image data. As seen from the figure, the RKEM-MR image's uptake in the PET-only lesion is lower due to the lack of PET-only lesion information in V(a), and its texture resembles that of the MLEM-BW image given in Fig. 3 due to the use of the MR-only image prior in V(a).

IV. TESTS ON IN-VIVO DATA
To validate the results of the simulation studies and further compare the performance of MLEM+F, MLEM-BW, KEM1, KEM2, KEM-PET-MR1, KEM-PET-MR2, RKEM1 and RKEM2, we have applied them to 2D in-vivo human data. Their settings and parameters were the same as those in the simulation studies, except that the regularization parameter of RKEM1 and RKEM2 was tuned to β = 4 × 10⁻⁵.
The PET in-vivo human data and the corresponding MR image were acquired on a Siemens (Erlangen) Biograph 3 Tesla molecular MR (mMR) scanner. The human PET data were acquired with 233 MBq of [18F]FDG infused over the course of 95 minutes at the rate of 36 mL/hr and an average concentration of about 2.45 MBq/min (corrected to the acquisition start time). The 95 minutes of list-mode data were binned into 95 3D sinogram frames of 1 minute each. A pseudo-CT attenuation map was generated from the UTE map and used to correct attenuation for the acquired data [41]. T1-weighted MR image data were acquired at the scan time of 30 minutes with a 16-channel RF head coil, with scan parameters TA = 7.01 min, TR = 1640 ms, TE = 2.34 ms, flip angle = 8 degrees, FOV = 256 × 256 mm², voxel size = 1 × 1 × 1 mm³, 176 slices. For details of the data acquisition and preprocessing, see [42].
The raw data of a PET image slice at 50 minutes of the scan were rebinned into sinograms with 157 angular projections and 219 radial bins, which were used for testing PET image reconstruction. MLEM with 20 iterations and post Gaussian filtering was applied to the sinograms to obtain the PET prior image for constructing the kernel matrix K_R. The T1-weighted MR image was used as the prior image data to construct the kernel matrix K. Fig. 11 shows the MR prior image and the images reconstructed by the eight compared algorithms. The following can be observed from a careful comparison of these images: i) The MLEM+F image appears noisy with pronounced granularity, especially in the areas with high uptakes. ii) With the BW regularizer derived from the MR image prior, MLEM-BW smooths out the image and significantly reduces granularity, but it also reduces the uptakes in some areas rather significantly and makes some areas similar to those of the MR prior image. iii) With the kernel matrix constructed from the MR prior image and without a regularizer, KEM1 and KEM2 smooth out the image globally without much loss of uptakes. KEM1 yields sharper texture with more details because its J and σ were smaller and tuned to achieve the best preservation of small objects (the PET-only lesion in the simulation studies). The KEM2 image appears smoother than that of KEM1, with fewer texture details, because its J and σ were larger and tuned to achieve the best noise suppression. iv) With the kernel matrix constructed from the combination of the MR and PET prior images, KEM-PET-MR1 and KEM-PET-MR2 both have noticeably higher uptakes and sharper texture as compared with KEM1 and KEM2, respectively. v) Equipped with the kernel matrix constructed from the MR prior image and the composite kernel space regularizer V(a) derived from the PET prior image, RKEM1 and RKEM2 significantly smooth out the image and enhance the uptakes everywhere.
The RKEM1 image appears sharper, with more texture details, than the RKEM2 image due to its smaller J, σ, J_R and σ_R. Among the images of all the algorithms, the RKEM1 and RKEM2 images appear to have smoother texture, higher uptakes and contrast, and more pronounced anatomical structures, though it is impossible to judge which image is better since no ground truth is available.
For quantitative comparison, we selected four grey matter areas with higher uptake as ROI regions and four white matter areas as background regions in each reconstructed image of Fig. 11, at the same locations, indicated respectively by the red and black arrows in Fig. 11, and with the same area size of 10 × 10 pixels. Fig. 12 plots the mean values of the selected ROI areas versus the standard deviations of the background noise in the images of the compared algorithms. The plots are over iterations 20 to 100 in increments of 10. The plots show that at the same background noise levels, RKEM1 and RKEM2 always have higher ROI mean values than the other methods, with RKEM2 having the highest. The background noise levels of KEM1 and KEM2 are similar to those of RKEM1 and RKEM2, respectively, while the background noise levels of KEM-PET-MR1 and KEM-PET-MR2 are between those of KEM1 and KEM2. These results appear to be consistent with those of the simulation studies.

V. DISCUSSIONS
As seen from Fig. 5, KEM may behave similarly to MLEM, with its AMSE increasing beyond the lowest value along the iterations, especially in low count reconstruction, but its AMSE and the rate of increase may be much lower than those of MLEM due to the smoothing effect of the kernel matrix K. These observations are consistent with the result of [11] and demonstrate the need for the regularized KEM, RKEM. The behavior of KEM-PET-MR1 and KEM-PET-MR2 shown in Fig. 5 suggests that building the kernel matrix K with the combination of PET and MR image priors may not change KEM's behavior in AMSE evolution, though the AMSE level may vary.
The simulation and in vivo data tests have shown that with proper parameter selection, RKEM outperforms MLEM+F, MLEM-BW and KEM, producing visually better PET images with clearer texture and higher uptakes and contrast, lower MSE's in both lesion areas, lower whole image AMSE, monotonic convergence, insensitivity to iteration number, simultaneously lower bias and variance, higher CRC with lower background noise, and higher ROI mean with lower background noise. These results clearly justify the use of RKEM to enhance KEM.
The better performance of RKEM stems from the four ingredients of the kernel space composite regularizer V(a). 1) The kernel space graph regularizer v_1(a) smooths the kernel coefficients a, which in turn smooths the reconstructed image x̂ to reduce image noise. 2) The kernel space energy regularizer v_2(a) promotes the total energy of the kernel coefficients a, which in turn promotes the uptakes and small objects, such as lesions, in the image. 3) The composition constant γ set by (17) guarantees V(a)'s convexity, which in turn leads to RKEM's monotonic convergence and insensitivity to the iteration number. 4) The PET prior image data used to derive V(a) contain PET-only information, which allows RKEM to better preserve PET image features such as PET-only lesions.
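The convexity argument in ingredient 3) can be sketched numerically. The forms below are illustrative assumptions, not the paper's exact definitions or its Eq. (17): we take v_1(a) = ½‖(I − W)a‖² as a convex graph regularizer (W a weight matrix derived from K_R) and v_2(a) = −½‖a‖² as the concave energy regularizer; then V(a) = v_1(a) + γ v_2(a) has Hessian (I − W)ᵀ(I − W) − γI, so V is convex whenever γ ≤ λ_min((I − W)ᵀ(I − W)), which plays the role of the analytically set composition constant.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
W = rng.random((n, n)) / n                  # weak graph weights (row sums well below 1)
M = np.eye(n) - W
H1 = M.T @ M                                # Hessian of the assumed v_1(a)
gamma = float(np.linalg.eigvalsh(H1)[0])    # largest gamma that keeps V convex

def V(a):
    """Assumed composite regularizer: convex graph term minus gamma * energy term."""
    return 0.5 * np.linalg.norm(M @ a) ** 2 - 0.5 * gamma * np.linalg.norm(a) ** 2

# Hessian of V is H1 - gamma*I, positive semidefinite by the choice of gamma.
assert np.linalg.eigvalsh(H1 - gamma * np.eye(n))[0] >= -1e-10
```

Setting γ at this bound keeps V convex while pushing the coefficient energy up as hard as convexity allows, which is the tradeoff the composition constant encodes.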
Although V(a) consists of two parts, v_1(a) and γv_2(a), its strength is controlled by a single regularization parameter β, the same as in conventional RML (MAP) optimization (3), with no additional regularization hyper-parameter.
KEM's performance depends on its neighbourhood size J and kernel parameter σ. Since RKEM is a regularized version of KEM, its performance also depends on the J and σ of the underlying KEM and on the additional J_R and σ_R of the K_R used in the kernel space composite regularizer V(a). To simplify the algorithm setup and parameter tuning, we have used J_R = J and σ_R = σ and tuned J and σ, as described in Subsection III-F, to achieve better performance than KEM.
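The role of J and σ can be illustrated with a sketch of a kNN Gaussian kernel matrix of the kind KEM-type methods build from a prior image. The patch-feature extraction and the row normalization below are common choices assumed for illustration; the paper's exact construction may differ.

```python
import numpy as np

def kernel_matrix(prior, J=9, sigma=0.22):
    """Row-normalized kNN Gaussian kernel matrix.

    prior: (n_pix, d) feature vectors extracted from the prior image
    (e.g. small patches); each pixel keeps its J nearest neighbors.
    """
    n = prior.shape[0]
    K = np.zeros((n, n))
    # pairwise squared distances in the prior feature space
    d2 = ((prior[:, None, :] - prior[None, :, :]) ** 2).sum(-1)
    for i in range(n):
        nbrs = np.argsort(d2[i])[:J]                 # J nearest neighbors (incl. self)
        K[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    return K / K.sum(axis=1, keepdims=True)          # row-normalize so K @ 1 = 1
```

Larger J and σ connect each pixel to more, and more distant, neighbors with flatter weights, which is why they strengthen smoothing (as in KEM2) at the cost of fine detail.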
RKEM2's better performance over KEM-PET-MR1 and KEM-PET-MR2 suggests that introducing the MR image prior and the PET image prior separately, through the kernel matrix K of KEM and the kernel matrix K_R of the composite regularizer V(a) respectively, may be a better and more flexible way to use priors from different sources.
For a simpler and clearer demonstration of RKEM's essence, we have focused on 2D PET image reconstruction in our simulation studies and in vivo data tests. For 3D application of RKEM, we need to overcome the computational difficulty caused by the 3D projection equation, which has the same form as (2) but a huge dimension [43]. A simple solution is to convert the 3D projection equation into a set of slice-based 2D projection equations, and then apply RKEM to each 2D equation to reconstruct the 3D image slice by slice. This simple method has been used in non-kernelized 3D reconstruction [44] at the sacrifice of some reconstruction quality. A better solution is to incorporate RKEM into the fully 3D reconstruction algorithm [44] used in non-kernelized reconstruction. These are currently being investigated.
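The slice-by-slice strategy above can be sketched in a few lines. The names `rkem_2d` and `sinos_2d` are hypothetical stand-ins: any 2D reconstruction routine and any iterable of per-slice sinograms would do.

```python
import numpy as np

def reconstruct_3d(sinos_2d, rkem_2d):
    """Slice-by-slice 3D reconstruction.

    sinos_2d: iterable of per-slice 2D sinograms (one per transaxial slice);
    rkem_2d:  callable that reconstructs one 2D slice from its sinogram.
    Returns the reconstructed slices stacked into a 3D volume.
    """
    return np.stack([rkem_2d(s) for s in sinos_2d], axis=0)
```

Because the slices are independent, the loop parallelizes trivially, which is the main appeal of this approach over fully 3D reconstruction despite some loss of axial consistency.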
RKEM can also be used for dynamic PET image reconstruction. Similarly to [2], [5], and [11], we may exploit the temporal correlation of the dynamic PET raw data to pre-reconstruct composite frames and use them to construct the kernel matrix K of RKEM. As the uptakes in the frames vary over time in dynamic imaging, we may use the pre-reconstructed frame at a time instant t and its neighboring frames to construct a K_R(t), and then use K and K_R(t) to reconstruct the frame at time t with enhanced image quality.

VI. CONCLUSION
A novel kernel space regularization method, RKEM, has been presented for the recently emerged KEM method, to avoid its potentially large reconstruction variance and high sensitivity to the iteration number, and to enhance its image quality.
To overcome the difficulty in developing a globally convergent optimization algorithm for RKEM, we have used the concepts and techniques of data manifolds and graph regularization to develop the kernel space composite regularizer, consisting of the kernel space graph regularizer and the kernel space energy regularizer. Applying the optimization transfer technique to the kernel space composite regularizer, we have presented a globally convergent RKEM algorithm for PET image reconstruction.
The four ingredients of the kernel space composite regularizer, as discussed in Section V, are the source of the proposed RKEM's power, setting it apart from the existing kernelized methods and endowing it with better performance.
The results of simulation studies and in-vivo data tests have shown the better performance of the proposed RKEM over KEM and other conventional iterative methods in PET image reconstruction.