The SNR of Positron Emission Data With Gaussian and Non-Gaussian Time-of-Flight Kernels, With Application to Prompt Photon Coincidence

It is well known that measurement of the time-of-flight (TOF) increases the information provided by coincident events in positron emission tomography (PET). This information increase propagates through the reconstruction and improves the signal-to-noise ratio in the reconstructed images. Takehiro Tomitani has analytically computed the gain in variance in the reconstructed image, provided by a particular TOF resolution, for the center of a uniform disk and for a Gaussian TOF kernel. In this paper we extend this result, by computing the signal-to-noise ratio (SNR) contributed by individual coincidence events for two different tasks. One task is the detection of a hot spot in the center of a uniform cylinder. The second one is the same as that considered by Tomitani, i.e. the reconstruction of the central voxel in the image of a uniform cylinder. In addition, we extend the computation to non-Gaussian TOF kernels. It is found that a modification of the TOF-kernel changes the SNR for both tasks in almost exactly the same way. The proposed method can be used to compare TOF-systems with different and possibly event-dependent TOF-kernels, as encountered when prompt photons, such as Cherenkov photons are present, or when the detector is composed of different scintillators. The method is validated with simple 2D simulations and illustrated by applying it to PET detectors producing optical photons with event-dependent timing characteristics.


I. INTRODUCTION
T HE effective sensitivity of positron emission tomography (PET) can be increased with two complementary methods: (1) the number of acquired coincident pairs of photons can be increased by increasing the solid angle covered by the detectors, and (2) the information provided by each photon pair can be increased by improving the accuracy of the measured difference in their time of flight. To achieve the latter, detector experts are investigating many different detector designs. In some of those, the interaction of the 511 keV photon can produce a burst of optical photons with timing characteristics that vary from event to event. This is the case in detectors that make use of Cherenkov photons as well as scintillation photons [1], [2], [3], [4], and in so-called metascintillators, which are detectors created by combining scintillators with different characteristics [5], [6]. PET systems with such detectors do not have a single TOF-kernel but a distribution of TOF-kernels. If the front-end electronics cannot identify the correct TOF-kernel that corresponds to each event, then the TOF-resolution must be modeled by a probability weighted average of all possible TOF-kernels. In general, this average would not have a Gaussian shape. On the other hand, if the correct TOF-kernel can be assigned to each particular event, then the question arises as to how the effective sensitivity of such a PET system should be computed.
Consequently, it would be useful to have a metric that quantifies the effective sensitivity gain provided by particular distributions of TOF-kernels of known but arbitrary shape. In 1981, Tomitani addressed this question for a Gaussian TOF kernel [7]. He considered the 2D PET acquisition of a uniform cylinder, and computed the variance in the center of the reconstructed image for a particular spatial resolution. In 1975, Tanaka and Iinuma had done the same for non-TOF PET [8], enabling Tomitani to also compute the gain in variance, when comparing a PET system with a particular Gaussian TOF-kernel to an otherwise identical non-TOF system.
In this paper, we propose a method that extends Tomitani's results to non-Gaussian TOF-kernels. With this approach, one can analyze a PET system with a distribution of possibly non-Gaussian TOF-kernels, and compute the TOF-resolution of a corresponding conventional PET system (with a single Gaussian TOF-kernel) that would provide the same effective sensitivity.
We also show that the SNR-gains obtained for a simple signal-known-exactly background-known-exactly (SKE/BKE) detection task are almost identical to the SNR-gains of the reconstructed pixel values. This is useful, because computing the SNR-gain is much easier for the detection task. Interestingly, Tomitani's approach involves an approximation to keep the mathematics tractable, while no such approximation is needed for the detection task. Consequently, the small difference between SNR gains for these two tasks is equal to the effect of that approximation. Preliminary versions of this work have been published in [9], [10].

II. METHODS
This section starts with a short review of the SKE/BKE detection task. Section II-B shows that for this task, the SNR is the same for hot spot detection in the data as for hot spot detection in the image. The mathematical analysis is easier for detection in the data, but because of this equivalence, the results are relevant for detecting hot spots in the image, as is done clinically (albeit for more complicated detection tasks). Section II-C presents the computation of the SNR for hot spot detection, provided by a particular type of TOF measurement along a single LOR. Section II-D presents the computation of the variance in the center of a uniform cylinder as proposed by Tomitani. The analysis shows that the variances as computed by the Tomitani method are inversely proportional to the squared SNRs as computed by the hot spot detection approach. Therefore, both methods predict the same performance changes as a result of changes to the TOF-characteristics of the PET system. Finally, section II-E shows how the SNRs of different types of TOF events can be combined to compute the overall SNR of the TOF-PET system.

A. SNR of a Simple SKE/BKE Detection Task
In a SKE/BKE detection task, the observer knows the expectations of the signals and the backgroundb, and must decide based on a noisy data vector y if the signal was present or not. For that purpose, the ideal observer computes the likelihoods for y to be a noise realization ofb and ofb +s, and selects the most likely of the two. 1 If the noise on y is multivariate Gaussian with covariance matrix V independent of s, then the logarithm of the ratio of these two likelihoods equals (up to an irrelevant constant) where superscript T denotes the transpose [11]. The same result is obtained if the data y are corrupted by Poisson noise and the signal s is very small compared to the background b.
In this situation the log-likelihood ratio equals where the logarithm and the division are computed element by element. The last line holds becauses b and because for Poisson noise the data covariance V = diag(b). This result is equivalent to (1) becauses is independent of the data. The expected change of t, t, caused by presence versus absence of the signal equals where E (x) is the expectation of x. The variance on t equals Thus, the squared SNR for the observer decision equals

B. Equivalence of Detection in Image or Sinogram Domain
It is known that data processing does not change the ideal observer's performance when the operation is invertible (see [12], p. 855). Image reconstruction is not in general invertible, but as shown in appendix V-A, the same result holds for the minimum norm weighted least-squares reconstruction in a discrete linear problem. In other words, for the SKE/BKE detection task, one can either do the detection using the sinogram data, or using the (minimum norm WLS) reconstructed image; the same SNR is obtained in both cases. Consequently, in the following, the SKE/BKE detection task is analyzed in the data domain, where the noise is uncorrelated and the expressions (1) and (5) reduce to a weighted summation over all data bins.

C. The Detection SNR of an Event in PET
The aim of this paper is to compute the SNR for different types of PET measurements. The SNR is a feature of the probability distribution which models the measurement uncertainty. However, for convenience, we use the expression "SNR of an event" as a shortcut for the SNR of the probability distribution associated with this particular event type.
Below, the value of a detected coincident photon pair is quantified by computing its SNR 2 for detecting presence or absence of a small excess activity in a SKE/BKE detection task. This is done by applying (a continuous version of) equation (5) to non-TOF and TOF events with different TOFkernels. Using SNR 2 is more convenient than using SNR, because, as shown below, the SNR associated with multiple events is obtained by summing the squares of their SNRs.
In this analysis, only SNR 2 ratios and variances ratios are considered. To focus on the impact of different TOFkernels, we assume that factors such as detector sensitivity (normalization) and attenuation are constant and therefore ignore these factors, because they do not affect the SNR ratios.
1) The SNR 2 of a Non-TOF Event: Consider a line-ofresponse (LOR) intersecting an object with an activity B(x) per unit length along the LOR. 2 In the center (x = 0), a hot spot with activity S may be present or not; when present, this hot spot adds an activity Sδ(x) per unit length, where δ(x) is Dirac's delta distribution (see figure 1). The TOF profile as measured along a central LOR, which is a blurred version of the true profile (copied from [9],CCA 4 license).
Ignoring scatter and randoms, the expected count when the hot spot is absent equals ρ −ρ B(x)dx, where ρ = wc/2 is the acceptance radius defined by the coincidence window w of the PET system (and the speed of light c). Since it is assumed that ρ is large enough to cover the object, ρ can be set to infinity when randoms are ignored. When the hot spot is present, the expected count increases with S. Assuming, as in section II-A, that S ∞ −∞ B(x)dx and that the data are corrupted by Poisson noise, applying (5) produces When the activity along the LOR is uniform, as is the case for Tomitani's uniform cylinder, the equation reduces to where B is the activity per unit length and D is the diameter of the cylinder.
2) The SNR 2 of an Event With TOF Kernel k(x): The same setup as above is considered, but now the system also measures the TOF difference of the two photons. The uncertainty of that measurement is given by the TOF-kernel k(x). The expectation of the TOF-measurement along the LOR is a one dimensional profile, which in real PET systems is sampled with finite bins. For convenience, we represent it as a continuous function here, equivalent to taking the limit of the TOF-binsize to zero. The TOF-profile y(x) has the expectations Consequently, the expectation of the signal equalss(x) = Sk(x). Applying a continuous version of (5) produces for the detection SNR For a uniform cylinder with activity B per unit length, this becomes where it was assumed that the TOF kernel is much smaller than the diameter of the cylinder, such that the denominator in (9) is constant where k(x) differs from zero. Equations (9) and (10) are the main equations in this paper. They provide the SNR contributed along a single LOR for a particular TOF-kernel.
To compute the system SNR, the SNRs of all LORs and all TOF-kernels have to be combined, by summing the squared SNRs, as explained in section II-E. Dividing (10) by (7), one obtains for the SNR 2 gain achieved by TOF Below we consider only the uniform background, we use equation (10) to compute the detection SNR 2 for different TOF-kernels, and we show that (11) generalizes Tomitani's result for the variance ratios in the reconstructed image. For sufficiently poor TOF-resolution, the TOF-kernel k(x) becomes much wider than the diameter of the cylinder. In that situation, equation (9) simplifies to which is equal to the result for the non-TOF SNR 2 of (7).
3) The SNR 2 of an Event With Gaussian TOF Kernel: For a Gaussian TOF kernel with standard deviation σ D, the SNR 2 of (10) equals Combining (13) and (7) to compute the variance gain of TOF-PET versus non-TOF PET (for the center of a uniform cylinder), one obtains where FWHM is the full width at half maximum of the TOF-kernel. This is identical to Tomitani's result for the variance gain of TOF-PET versus non-TOF PET for the reconstructed pixel value [7]. 3 As shown in [9] and [10], equation (13) is easily extended to include randoms or some background non-uniformity. The randoms contribution is independent of the TOF index. Denoting with R the randoms contribution per unit length, the expected random count for the non-TOF acquisition equals R 2ρ, where ρ is the acceptance radius, and the SNR gain becomes with β = D/(2ρ). Since β < 1, (15) predicts that compared to non-TOF-PET, TOF reduces the variance more when the randoms fraction is higher, as has been observed e.g. in [13].
4) The SNR 2 for a Sum of Gaussians TOF Kernel: Although a common assumption, TOF kernels do not have to be Gaussian, and they are typically not Gaussian in detectors in which optical photons with different timing characteristics are produced [14]. Efthimiou et al. model the timing accuracy of detecting scintillation and Cherenkov photons in BGO as a Gaussian mixture [15]. Since TOF-kernels are non-negative and expected to be smooth, it seems likely that the TOF kernels of current and future detectors can be well modeled as a sum of possibly shifted Gaussians: where σ i is the standard deviation, x i is the offset and A i σ i / j A j σ j is the weight of Gaussian i , i.e. the probability that the event belongs to i . Inserting this in (10) results in 5) The SNR 2 for a Cauchy TOF-Kernel: Some authors argue that TOF-kernels typically have wide tails and are better modeled with a Cauchy distribution (Lorentzian) than with a Gaussian [2], [16]. Similar to the Gaussian, the Cauchy distribution has a predefined shape; its two parameters, x 0 and γ , determine its position along the x-axis and its width. It is defined as This distribution does not have a well-defined mean, but the Cauchy principal value of the mean equals 0. Its higher moments are also not well defined, its variance is infinite. With help from [17], one can verify that Consequently, the SNR 2 of the Cauchy TOF-kernel equals For comparison, the corresponding computation for the Gaussian kernel is For the same FWHM, the SNR 2 of the Gaussian kernel is about twice better than that of the Cauchy kernel. This makes sense, because the Cauchy distribution has much wider tails.

D. Reconstruction Variance in the Center of a Uniform Cylinder
In [7], Tomitani considered the variance in the center of the image of a uniform cylinder, reconstructed with 2D filtered backprojection (FBP). He showed that for TOF-PET, this variance is minimized when the backprojection in FBP uses "confidence weighting", which means that the backprojection is weighted by the TOF-kernel, and therefore equals the adjoint of the forward projection.
He then computed the noise propagation through this optimal TOF-FBP algorithm, to obtain the variance of the reconstructed pixel value in the center of the image (of the uniform cylinder). Previously, Tanaka and Iinuma had done a similar study for non-TOF PET [8]. Combining both results, he obtained the variance reduction achieved by TOF; his result is identical to equation (14) above.
In appendix V-B, Tomitani's approach is extended to a TOF-kernel of arbitrary shape. The resulting expression for the variance ratio is This result is identical to expression (11) for the SNR 2 ratio as computed with the SKE/BKE detection task. This means that a TOF-kernel optimized for hot spot detection would also be optimal for noise suppression in the reconstructed image. However, in these variance calculations, an approximation is introduced to obtain an analytic solution (see equations (39) and (40) in the appendix). No such approximation was used in the derivation of the squared SNR in (17). Consequently, the SNR gain achieved by TOF for the detection experiment is very similar to, but slightly different from the variance gain obtained for the reconstructed pixel value. The difference between the two is equal to the effect of the approximation.

E. The Combination of Events With Different SNR 2
The equations above show how the SNR 2 of a single TOF event can be computed. PET systems based on metascintillators will detect events of different types. If the system would be able to assign the correct TOF-kernel to each event, then that TOF-kernel can be used during image reconstruction. In such a setting, events would contribute a different amount of information to the final reconstruction, depending on their TOF-kernel. To predict the performance of such a PET-system, we verify here how different events contribute to the overall SNR.
Returning to the SKE/BKE detection task of section II-A, suppose that the data vector consists of two sub-vectors y 1 and y 2 , with covariance matrices V 1 and V 2 , so that y = [y 1 , y 2 ] T . Let s 1 and s 2 denote the corresponding components of the signal to be detected, i.e. s = [s 1 , s 2 ] T . If y 1 and y 2 are independent, the covariance matrix of Then the SNR 2 in equation (5) becomes Extension to a sum of more than two measurements is straightforward. Thus, assuming that the reconstruction algorithm makes optimal use of the data, the SNR 2 of a set of independent events is the sum of the SNR 2 of each event. Accordingly, the mean SNR 2 over a distribution of events equals the sum of the SNR 2 of each type of event, weighted by its relative abundance.
The above can be applied to assign an equivalent conventional TOF resolution to PET-systems with non-Gaussian TOF-kernel(s). If a PET-system has events with different TOF-kernels, then the average value of the PET-events depends on whether the system can label each event with its own TOF-kernel or not. If not, then during image reconstruction, the average TOF-kernel must be used for all events. This TOF-kernel is a weighted average over all TOF-kernels, each weighted by the probability of its events. The average SNR 2 of the events is then computed with (10). On the other hand, if the system can assign the appropriate TOF-kernel to each event, then that TOF-kernel can be taken into account when the event is used during image reconstruction. In that case, the SNR 2 of each event type can be computed with (10), and these SNR 2 can be combined by computing their weighted average, where the weights equal the probability of each event type. Since the quadratic function is convex, labeling the events with their TOF-kernel increases the expectation of the SNR 2 computed over all events: where k j (x) denotes a TOF-kernel and p j the probability of the corresponding event type, with j p j = 1. Once the average SNR 2 over all possible events of a particular PET system is known, the equivalent conventional TOF-resolution can be computed from (13) as In this calculation, the constant S 2 /B will cancel out, since this constant is present in all SNR 2 calculations (see (7), (10), (13) and (17)).

III. EXPERIMENTS
The analysis by Tomitani and our extension of his approach are based on 2D filtered backprojection (FBP). It can be shown that FBP is a good approximation of unweighted least squares reconstruction, see e.g. [18]. For the center of a uniform cylinder, all (central) projections are identical, implying that in a weighted least squares reconstruction, all weights would have to be identical. Therefore, in this case, FBP is also a good approximation of a weighted least squares reconstruction. Since maximum likelihood expectation maximization (MLEM) reconstruction closely resembles a weighted least squares algorithm, our results obtained for the center of a uniform cylinder should hold with good approximation for MLEM too. Because MLEM is the algorithm of choice in clinical applications, it was used for all reconstructions in our simulation experiments.

A. The Accuracy of Tomitani's Approximation
As mentioned above, Tomitani intended to compute the variance gain on the reconstructed pixel values obtained by TOF, in the center of a uniform cylinder, and for a particular spatial resolution. To keep the mathematics tractable, he introduced an approximation, see equation (39). 4 The variance gain of TOF versus non-TOF computed with this approximation is identical to the SNR 2 gain obtained for the detection task. It follows that the SNR 2 gain for the two tasks is similar but different, and the difference equals the effect of the approximation. This effect is evaluated numerically by comparing the analytic expression (42) to a calculation using numerical integration of (37) avoiding approximation (39).
B. 2D Simulations 1) Experiment 1: To verify the results obtained above, five PET systems with the same geometry but different event types were simulated. The systems had either 1) non-TOF events 2) Gaussian TOF kernel events of 70 ps FWHM 3) Gaussian TOF kernel events of 400 ps FWHM 4) an equal probability for the two TOF events above, and for each event, the associated TOF kernel was known 5) the same two TOF events as above, but individual events could not be assigned to a particular TOF kernel. These parameters were not meant to be realistic, the only aim of these simulations was to verify the equations. The simulations were run twice. In the first simulation, the 70 ps and 400 ps TOF kernels had a zero time offset. In the second simulation, the 70 ps TOF kernel had an offset of 50 ps and the 400 ps TOF-kernel had an offset of -200 ps. These offsets are expected to change the value of the SNR 2 only for the last case, where two TOF events occur while the system cannot determine the type of individual events.
Simple 2D simulations were performed for these systems. An image of 100 × 100 pixels, with pixel size of 2.5 mm × 2.5 mm, containing a uniform disk with diameter of 200 mm was generated. The field of view of the simulated PET system was circular with a diameter 450 mm. The sinogram had 100 equidistant LORs per angle, a pixel size of 2.5 mm and 100 angles uniformly spread over 180 degrees.
Five noise-free measurements were created. Three were produced by forward projecting this image with a non-TOF projector, a projector with 70 ps and one with 400 ps TOF-resolution. Two additional measurements were produced, one by combining equal fractions of unlabeled events with 70 and 400 ps TOF resolution, and a second one where each event was labeled with its TOF uncertainty. All sinograms had 5 × 10 6 events. For each sinogram, 100 (Poisson) noise realizations were produced. The sinograms were reconstructed with MLEM in combination with a non-TOF projector, a Gaussian TOF projector, non-Gaussian TOF projector (eq. (16)) or a combination of two Gaussian TOF projectors, as required for optimal reconstruction of the respective sinograms.
From these simulations, SNR 2 gains were computed with (13) for the Gaussian kernels and with (17) for the unlabeled mixed events. For the labeled events, the two SNR 2 computed with (13) were averaged. SNR 2 gains will translate into variance gains, if the reconstructed images have identical spatial resolution (σ p in (38)). To achieve that with good approximation, a high number of iterations was applied, and the resulting images were post-smoothed to suppress the effect of residual resolution differences caused by differences in convergence. For the non-TOF reconstruction, 400 iterations were applied. Considering that the TOF-reconstructions converge much faster, they were done with 200 iterations. All reconstructions were post-smoothed with a Gaussian kernel of 3 pixels (7.5 mm) FWHM. An image of pixel variances was computed for each case from the 100 noise realizations. The mean variance value in a central region of interest with diameter of 60 mm was computed for each case (this region contains 462 pixels). The results of the 100 noise realizations were then used to estimate the mean variance value and the error on that estimate.
2) Experiment 2: A similar simulation was done to evaluate the predicted TOF-gain in the presence of randoms (eq. 15). Non-TOF and 200 ps FWHM TOF events were simulated. The radius of the coincidence window was ρ = 16.15 cm, and therefore β = 0.619 in (15). The simulation parameters were as above, but the number of non-TOF iterations was increased to 800 because the presence of the randoms slows down convergence, in particular for non-TOF. In addition, in contrast to the first experiment, here a finite system resolution was modeled with an image-based, shift invariant Gaussian point spread function of 3.75 mm FWHM. Finally, the noise amplitude was decreased by increasing the total counts to 2.5× 10 8 , so as to limit the impact of the non-negativity constraint enforced by MLEM.
3) Experiment 3: Finally, a simulation experiment was performed to verify the predicted TOF-gains for SKE/BKE detection. All simulation and reconstruction parameters were the same as in experiment 1. Noise-free simulations with and without a single pixel hot spot (10% activity increase) in the center of the image were created, together with 10000 noisy sinograms without the hot spot. The detection SNR 2 was computed from the sinograms. This was done by estimating the mean and variance of the matched filter output, when applied to the noisy sinograms. The matched filter iss T V −1 , see equation (1). The sinograms was computed as the difference between the two noise-free sinograms, with and without hot spot. Because the sinogram values are subject to Poisson noise, their variance equals their expectation: diag(V ) =ȳ, whereȳ is the noise-free sinogram without the hot spot and V is the (diagonal) covariance matrix. Because the hot spot is very small and only 10% more active than the background, we ignore its effect on V . Pixel by pixel division of sinograms by the variance produces the matched filters T V −1 =s/ȳ. This filter was applied to each noisy sinogram y by computings T V −1 y = is i y i /ȳ i . This produces 10000 noisy matched filter outputs, from which the variance is computed. The expected signal of the matched filter was computed ass T V −1s = is 2 i /ȳ i . Consequently, the SNR 2 for sinogram detection was computed as where superscript (k) denotes the Poisson noise realization. Then, images were reconstructed from 800 noisy sinograms (without hot spot) and the analysis was repeated for detection in the image. The matched filters T x F was applied to the noisy reconstructions, where we assumed that the Fisher information matrix F is an acceptable approximation of the inverse of the image covariance (see appendix V-A). This assumption is not correct, because the image has been postsmoothed with a Gaussian of 3 pixels FWHM. Accounting for that is difficult, we simply ignore it, hoping that the resulting imperfect prewhitening does not affect the (relative) detection performance too much. To further simplify the computations, we assume that the Fisher information matrix is locally shift invariant [19], [20]. This implies that multiplication with the Fisher information matrix can be well approximated as a convolution with the central row of the Fisher information matrix F (represented as a 2D image). This local Fisher information matrix was computed as where e c is an image set to zero everywhere except for the central pixel, which was set to unity, A is the system matrix (see appendix V-A) and, as before, the diagonal of covariance matrix V is the noise-free sinogram without hot spot. Images x was the difference between the noise-free reconstructions with and without hot spot, and the matched filters T x F was obtained by convolving the image Fe c withs x . Thus, the SNR 2 for detection in the image was computed as where ⊗ represents 2D convolution and x (k) is the k th noisy reconstruction.

C. Application to Metamaterial Detector Events
The proposed method was applied to assess the value of PET systems based on metascintillators. The metascintillators are constructed by interleaving thin plates of a dense scintillator with thin plates of a fast scintillator. The dense scintillator provides high stopping power, the fast scintillator provides a short scintillation time but its stopping power is lower. The energy of the incoming gamma ray is dissipated in the detector through photon-electron interaction (photoelectric or Compton) producing one or a few recoil electrons. In a typical interaction, the recoil electron will deposit part of its energy in the dense scintillator and part in the fast one.
Several designs have been studied with Monte Carlo simulations, in which LYSO or BGO were combined with EJ232 or BaF 2 , in layers with thicknesses ranging from 50 to 300 μm [21]. Here two designs are considered: a combination of BGO and EJ232 and a combination of LYSO and EJ232. 5  The simulation was done for 3 × 3 × 15 mm 3 metapixels and computed the fraction of the energy deposited by an incoming 511 keV photon in each of the materials. For each deposited fraction, the detection time resolution (DTR) was estimated, assuming a Gaussian uncertainty. The simulator examined, in steps of 1 keV, all possible energies deposited in the dense scintillator; the rest of the energy was deposited in the plastic scintillator, because only events without escaping scattered photon were considered. The simulator computed the probability of such event and its DTR for each of these 512 cases.
The resulting DTR-values ranged from 40 ps FWHM, when all the energy was deposited in EJ232, to 283 ps for an almost pure BGO scintillation and 151 ps for an almost pure LYSO scintillation. These DTR distributions are shown in figures 2 and 3. As indicated in the figures, three modes are expected: slow events with all energy dissipated in BGO, fast events with all energy deposited in the plastic, and mixed events with energy deposited in both scintillators. However, the fast mode is invisible in the figures, because its probability is negligibly small. It becomes much more visible if the plastic is replaced by a denser fast scintillator, such as BaF2 for instance.
From the DTR distribution, the corresponding coincidence time resolution (CTR) distribution is obtained by considering all possible DTR combinations, multiplying their probabilities and computing CTR i j = DTR 2 i + DTR 2 j . Having N = 512 different DTRs, a set of (N 2 + N)/2 different CTRs is obtained. Because the DTRs are assumed to be Gaussian, so are these TOF-kernels. If we assume that the events can be labeled with their individual resolution, each event provides an SNR 2 given by (13), and the weighted average over all events produces the average SNR 2 for the entire TOF-PET system. If the events could not be labeled, the system SNR 2 was computed with (17).

D. Application to a Mix of Scintillation and Cherenkov Photons
Similar to previous work [4], coincidence detection times were studied by the same group with Monte Carlo simulations (confirmed by measurements), for 2×2×3 mm 3 BGO-crystals. By using fast silicon photomultipliers (SiPM) and fast readout electronics, it is possible to trigger the detector on the prompt Cherenkov signal produced in BGO as a result of a 511 keV interaction. However, the number of Cherenkov photons is very small, and there is a strong variation in the number of detected Cherenkov photons. Because the timing accuracy is better when more Cherenkov photons are detected, there is also a strong variation in the timing accuracy of the detected events. In [4], the detector events were arbitrarily divided in 5 groups with equal probability, by using the signal rise time of the SiPM. Since a shorter rise time indicates a larger number of Cherenkov photons, and therefore an event with better timing resolution, grouping the events based on their rise time creates groups with different timing resolutions. Accordingly, the coincidence events were divided in 5 × 5 categories, i.e. all possible pairs of the 5 detector event groups. For each category, a sum of two Gaussians was fitted to the TOF-kernel. The narrow Gaussian had probability P 1 and FWHM F 1 , the wider Gaussian had probability P 2 = 1 − P 1 and FWHM F 2 . In addition, there was an offset between the two fitted Gaussians. The results are given by the following matrices, where the offsets and FWHMs are given in ps: The P 1 , F 1 and F 2 matrices have to be symmetrical, the offset matrix has to be antisymmetrical. This symmetry was imposed The ratio of the variance computed analytically (detection task) and the variance computed using numerical integration of (37) (reconstruction task), as a function of the ratio σ/σ p , for three different TOF-kernels. The TOF-kernel width is defined by σ, the resolution of the reconstructed image by σ p . Since this plot shows the ratio of the variance computed without and with Tomitani's approximation, it quantifies the accuracy of that approximation (equations (39) and (40)).
to reduce the noise. Consequently, 15 different categories of coincidence events are obtained. From this information, the SNR 2 of the average labeled and unlabeled event was computed. For the unlabeled events, a weighted sum of all 15 different TOF kernels (30 Gaussian) must be computed. To do so, the possible offsets between the 15 event types must be known. Because these offsets were not available, we have assumed that for each event type, half of the offset should be assigned to the narrow Gaussian and the other half to the wider Gaussian.

IV. RESULTS
A. The Accuracy of Tomitani's Approximation Figure 4 shows the ratio of the variance calculated analytically using Tomitani's approximation and the variance computed with numerical integration (avoiding the approximation), as a function of the ratio between the TOF-resolution σ (expressed in mm) and the resolution of the reconstructed image σ p . As mentioned above, the analytic variance leads to the same TOF/non-TOF gain as the hot spot detection task, while the numerical computation produces the image reconstruction variance (for image resolution σ p ). The calculations were done for three different TOF-kernels: a Gaussian with TOF resolution σ , a sum of two concentric Gaussians with resolutions σ and 3 σ , and a sum of two Gaussians with resolutions σ and 1.5 σ and an offset between the Gaussians of 1.5 σ . The two variances are virtually identical when σ > 10σ p . When σ is similar or slightly larger than σ p , the analytic variance (hot spot detection) is up to about 8% higher than the numerical value without Tomitani's approximation (reconstruction variance). When the TOF resolution is (much) smaller than the reconstruction resolution, the detection variance becomes (much) smaller than the reconstruction variance.

B. 2D Simulations
The results of the simulation experiment with different TOF kernels are shown in table I. Table II lists the results from the randoms simulation experiment (experiment 2). Table III   TABLE I  RESULTS OF THE TOF KERNEL SIMULATION EXPERIMENT   TABLE II  RESULTS OF THE RANDOMS SIMULATION EXPERIMENT   TABLE III  SNR 2 GAINS IN THE DETECTION SIMULATION EXPERIMENTS compares the analytically predicted SNR 2 gains to those obtained in the two detection experiments. With N = 10000 noise realizations the relative error (standard deviation divided by the mean) on the variances should be √ 2/(N − 1) 1.5%, and 2% on variance (or SNR 2 ) ratios. For N = 800, these relative errors are 5% and 7% respectively. The tables show a good agreement between the observed and predicted gains.

C. Application to Metamaterial Detector Events
The coincidence time resolution (CTR) distributions derived from the detector time resolution (DTR) distributions are shown in figures 5 and 6. The two distinct modes of the DTR distribution (slow and mixed) give rise to three modes in the CTR (slow-slow, slow-mixed and mixed-mixed), which can indeed be identified in the plots. For the BGO/EJ232 detector, the modes are well separated, while for LYSO/EJ232 they are overlapping because of the shorter scintillation time of LYSO. Table IV shows for each detector the best and worst time accuracy. The best timing resolution is achieved for the (very rare) cases where in both detectors, all the energy was deposited in the fast scintillator. The worst timing resolution is obtained when almost all of the energy was deposited in the slow scintillator (BGO or LYSO). As mentioned above, it was assumed that the uncertainty on the timing for a particular detector event can be well modeled as a Gaussian. Consequently, the corresponding CTR is also Gaussian. If the coincidence events cannot be labeled with their individual timing resolution, then all events have to be considered as drawn from a single probability distribution, the average TOF-kernel (section II-E), which will be non-Gaussian. Table IV gives its equivalent Gaussian TOF resolution, computed with (17) and (25).
On the other hand, if for each coincidence event the individual timing resolution is known, then the reconstruction can make use of it. Table IV gives the equivalent Gaussian TOF resolution. The result confirms that labeling the events with their individual timing resolution improves the performance of the PET system (i.e. it decreases the equivalent TOF FWHM). For BGO, labeling the events improves the SNR 2 by 13%. For LYSO, the improvement is only 3.5%.

D. Application to Mix of Scintillation and Cherenkov Photons
For each of the 15 event types, the SNR 2 and equivalent conventional TOF resolution (FWHM in ps) were computed:  of the individual SNR 2 values. That results in an SNR 2 of 4.86, which corresponds to a standard Gaussian TOF event of 182.2 ps. If the events cannot be labeled, all 15 TOF kernels (30 Gaussians) must be combined in a single TOFkernel (equation (16)). Equation (17) then produces an SNR 2 of 4.08 and an equivalent standard TOF kernel width of 217 ps. Consequently, labeling the events improves the SNR 2 by 19%.

V. DISCUSSION
Although equation (9) gives the SNR 2 for detecting the hot spot in any background along a single LOR, we have only considered the special case of a uniform background, which simplifies the equation to (10). The reason is that for this simple background, it is possible to compute also the SNR 2 of the reconstructed pixel values using Tomitani's approach. As a result, we currently do not know if the excellent agreement between the detection SNR 2 and the SNR 2 of the reconstructed pixel values also extends to non-uniform backgrounds. Interestingly, for comparing two TOF-PET systems, the simplified equation (10) does not require the entire object to be uniform. It only requires that one considers a region of uniform activity, where the size of the region is large compared to the TOF resolution. Since the TOF-resolution continues to improve, the clinical relevance of the proposed approach, based on equation (10), should improve accordingly.
In experiments 1 and 3, no detector PSF was simulated, whereas in experiment 2 a Gaussian PSF of 3.75 mm FWHM was applied during simulation and reconstruction. Also in several simulation experiments not shown here, we found that modeling the PSF had no noticeable effect on the observed SNR 2 -gains. No significant effect was expected: because the PSF model is in the direction perpendicular to the LOR whereas the TOF-profile is along the LOR, the two are literally orthogonal.
When the timing resolution associated with each event is known, the events provide more information. In the simulation experiment, where a mixture of events with timing resolutions of 70 ps and 400 ps was considered, labeling the events improved the SNR 2 by 41% when the two Gaussians were concentric, and by 74% when there was a large offset between them (table I). More modest improvements were observed for the more realistic cases. For the simulated mixture of Cherenkov and scintillation photons in BGO, the improvement was 19%; for the metamaterial simulations the SNR 2 improvements were 13% for BGO/EJ232 and 3.5% for LYSO/EJ232.
In our analysis, we only considered how the value of a single event depends on the TOF-kernel. In real life, different detectors will not only have different TOF kernels, but usually also different stopping power, transparency, light yield, energy resolution etc. When TOF-systems are compared, all these effects have to be taken into account to quantify their effective sensitivity.
Tables I -III reveal good agreement between the SNR 2 gains predicted by the theory and those obtained from the simulation experiments. The good agreement between the detection SNR and the SNR of the reconstructed pixel values is fortunate. First, it simplifies the analysis, because analyzing the detection SNR is easier than propagating the data noise through the reconstruction equations. In addition, this result predicts that optimizing detector characteristics for hot spot detection will also make them (almost) optimal for image reconstruction. Figure 4 analyzes the agreement between the detection SNR and reconstruction SNR in more detail. The reconstructed SNR is computed for an image reconstructed with spatial resolution σ p . This finite resolution is needed because otherwise the image variance becomes infinite. In contrast, for the detection task, the observer uses a matched filter, which produces a finite variance without any resolution limitation. The plot of figure 4 shows that when the TOF-kernel width σ is much larger than the reconstructed image PSF σ p , both SNRs are virtually identical. When the TOF-kernel width σ becomes much smaller than σ p , the reconstruction task benefits less than the detection task from a further TOF improvement. In this situation, the numerator P 2 (R) in (36) starts acting as a low-pass filter, suppressing high frequencies R, which makes the Tomitani-approximation (see appendix V-B) less accurate. A more intuitive explanation is that the finite reconstruction resolution smooths away the gain obtained by further TOF improvements. This is not the case for the detection task, where the matched filter is always adjusted to the TOF resolution, optimizing the detection of the infinitesimal hot spot. Remarkably, for intermediate TOF-resolutions that are similar to or slightly larger than the reconstruction resolution, the reconstruction task benefits more than the detection task from improvements of the TOF resolution. We currently do not have an explanation for this observation; it may be caused by the effect of TOF on the noise covariances.
The equivalence between the SKE/BKE SNR 2 gain and the reconstructed image variance gain only holds for a complete PET system, where "complete" means that the system provides sufficient (or even redundant) sampling. We implicitly assumed that the only problem of the system is the noise, implying that the system produces "perfect" images if a sufficient number of events can be acquired. Consequently, the analysis only studied how the TOF-kernel affects the effective sensitivity of the PET system. TOF can be used for other tasks than variance reduction. It can be used to suppress limited angle artifacts [22], and to estimate the attenuation from the emission data, eliminating the need for measuring the attenuation with additional hardware [23]. It is conceivable that TOF kernels that are optimal for variance reduction may not be optimal for these other applications, more research will be needed to clarify this.
The predicted SNR can only be achieved if during reconstruction, the TOF-kernel is modeled exactly. It is known that using an unmatched TOF-kernel during reconstruction produces artifacts, but fortunately, experience shows that small modeling errors only produce small artifacts [24]. However, this modeling error may also adversely affect the noise propagation. It could be useful to extend the methods discussed in this paper to predict the SNR also for situations where the true and model TOF-kernel are not exactly matched, as is always the case in reality.

A. Equivalence of Detection in Data and Image Domain
When using the data for detection, the task is to decide between E (y) = Ab + As (signal present) and E (y) = Ab (signal absent), from a noisy measurement y with covariance V , where A is the projection matrix and the covariance V is assumed independent of the presence of the signal. The prewhitening matched filtering test statistic equals t = (As) T V −1 y, wheres is the signal in the image and the superscript T denotes the transpose. For the signal and variance of the test statistic one finds: E ( t) = E (t|s present) − E (t|s absent) (29) =s T A T V −1 As =s T Fs (30) var(t) = (As) T V −1 V V −1 As =s T Fs, (31) where F = A T V −1 A is the Fisher information matrix, assuming Gaussian noise. Therefore, the SNR 2 for this task equals SNR 2 data = (E ( t)) 2 var(t) =s T Fs.
The minimum norm weighted least-squares reconstruction equals x † = F † A T V −1 y, where F † is the pseudoinverse of F. When detecting on the image, the task is to decide between x =b +s (signal present) and x =b (signal absent) from a noisy image x with covariance (33) Denote with η the filter used to compute the test statistic, i.e. t = η T x † . The squared SNR equals Maximizing the expression in the RHS yields the prewhitening matched filter η = Fs. Consequently, SNR 2 image =s T Fs where we used F F † F = F. This shows that the SNR for the SKE/BKE detection task is the same when the detection is done using the data or using the minimum norm weighted least squares reconstructed image.

B. Reconstruction Variance in the Center of a Uniform Cylinder for an Arbitrary TOF-Kernel
Tomitani considered the variance in the center of the image of a uniform cylinder, reconstructed with 2D TOF filtered backprojection (FBP) using confidence weighting. His method is based on equation (11) of [7]: where V is the variance in the center of the image, a is the emission count in cm −2 , P is the Fourier transform of the image PSF p (imposed by post-smoothing with a Gaussian), K is the Fourier transform of the TOF kernel k and R is the rotational mean operator as defined in (1) of [7]: Assuming that the FBP image is postsmoothed with a Gaussian with standard deviation σ p , P(R) equals P(R) = e −2π 2 σ 2 p R 2 .
(38) Since K has the shape of a low pass filter, the same is true for K * K . Its contributions to integral (37) will therefore be highest when cos θ is close to zero, in particular when R is large. Because the integral in (36) is to infinity, and the integrand is weighted by R, assuming that R is large seems sensible. The approximation will be better when the resolution of the reconstructed image is higher, because then P(R) will extend to larger R. Consequently, we can replace cos θ by its linear approximation near π/2: cos θ π/2 − θ . Thus, we obtain: (41) Extending the integration interval to infinity is justified by considering that the contributions become negligible for large θ . The last equality follows from Parseval's identity. Inserting this approximation in (36) and using (38) one obtains where we used ∞ 0 e −u √ u du = √ π/2. Tomitani combined this with the expression for the non-TOF variance obtained in [8], given by where D is the diameter of the uniform cylinder. Consequently, one obtains for the variance ratio