Kullback’s inequality and Cramér Rao bounds for point process models

A lower bound on the Kullback-Leibler divergence, known as Kullback’s inequality, can be determined with the Legendre transform of the cumulant generating function. The Cramér Rao bound can be derived from Kullback’s inequality as the inverse of the second-order term in a Taylor expansion. Analogous forms of Kullback’s inequality and the Cramér Rao bound for point processes were recently derived using functional methods from quantum field theory. This article develops Kullback’s inequality and the Cramér Rao bound for point process parametrisations as performance bounds for models used in multi-object filtering.


A. Cramér Rao bounds for multi-target tracking applications
The Cramér Rao bound, defined as the inverse of the Fisher information, provides a lower bound on the variance of an unbiased estimator. The need for Cramér Rao type bounds for performance analysis of multi-target tracking systems has been recognised for over 30 years [1]-[3]. The posterior Cramér Rao lower bound applies this concept recursively to a filtering model to determine the best achievable variance for an unbiased estimator, i.e. the theoretical optimum. It is now well established in the domain of target tracking for analysing the performance of different tracking algorithms, and has been extended to multi-target tracking, e.g. [4]-[7]. The Fisher information was recently studied in the context of multi-target tracking parameters by Houssineau et al. [8]. The current article develops a general form of the Cramér Rao bound for multi-target performance analysis.
Prior developments of the Cramér Rao bound for multi-target tracking have focused on determining a covariance matrix for the joint distribution describing a set of known targets. While this is potentially useful, it lacks flexibility in several respects. Firstly, it does not provide any inherent localisation with respect to the targets under surveillance. Secondly, it does not take into account the uncertainty in the number of targets in the surveillance region. Thirdly, the trace of the matrix is used in practice, which provides only a single number for the whole multi-target scenario.
The concept of covariance for point processes is a significant departure from the usual covariance matrix of a vector.
The covariance for a point process describes the covariance between the numbers of objects in two operator-specified regions. This is a statistic that can be extracted from the multi-target posterior distribution describing the population of targets. A posterior Cramér Rao bound for a general multi-target system would then describe the minimum achievable such covariance for an online surveillance application. It would also enable direct comparison of different multi-target trackers.
The bound on the minimum achievable variance for the estimator of a parameter, equal to the inverse of the Fisher information [9] and derived independently by Rao [10] and Cramér [11], is fundamental for statistical analysis. The Cramér Rao bound provides a minimum achievable variance or covariance for a univariate or vector-valued parameter. However, multi-target systems often have parameters that are described by functions, and the variance and covariance for point processes are themselves functions with spatial variates. Consequently, the usual formulation of the Cramér Rao bound is not applicable for providing a bound on the covariance of the process, and a more general approach is required.
The mean and covariance have been used to describe the posterior distribution of targets since the development of the Kalman filter in the 1960s. An analogous concept for populations of targets came much later [12], which describes the variance in the number of targets localised to a user-specified subset of the surveillance region. This has been used to compare different filters, and has been proposed for analysing the performance of filters [13].
The covariance of a point process is defined as the covariance between the number of objects in two subsets of the state space. Hence, the usual vector-valued formulation is not directly applicable to point processes. Indeed, the covariance in the number of objects between different regions, as proposed for multi-target tracking analysis [12], is more applicable to large-scale tracking problems, e.g. [14], and to sensor management, since it provides an aggregate statistic for populations of objects. The Cramér Rao bound for point processes, or multi-target processes, can then be viewed as a minimum achievable covariance for populations of objects in operator-defined surveillance regions.

B. Kullback's inequality and the Cramér Rao bound
The Kullback-Leibler divergence [15], or relative entropy, plays a fundamental role in information statistics for determining the difference between two probability distributions. Entropy and relative entropy have been applied to point processes [16], [17] since the 1960s [18], though calculating these quantities in practice can be challenging, and a number of different approaches have been proposed for their determination, e.g. [19], [20]-[22]. The results in this paper make use of a result from large deviations theory [23]-[25]: a lower bound for the relative entropy can be found by taking the Legendre-Fenchel transform of the cumulant generating function, known as Cramér's rate function [23], which yields Kullback's inequality [26]-[29]. A generalisation of this result was derived by Donsker and Varadhan [26].
The Cramér Rao bound can be derived [29] by taking the second-order derivative of Kullback's inequality [27], which relates the Kullback-Leibler divergence [15] to the moment generating function of a random variable. Following this derivation, a form of the Cramér Rao bound for point processes and random measures was recently proposed [30]. The approach makes use of a result for the effective action in quantum field theory [31, p289], drawing an analogy between the propagator and inverse propagator and the Fisher information [9] and Cramér Rao bound respectively. The main contribution of the paper is the generalisation of the concept of the Cramér Rao bound [10], [11] to point processes and random measures. The proof follows that of Fuchs and Letta [29], which starts with Kullback's inequality [27] between the Kullback-Leibler divergence and Cramér's rate function [32], and finds the Fisher information [9] as a limit that retains Kullback's inequality. The approach exploits the connection between Cramér's rate functional [30] and the effective action, or generating functional of proper vertices [33, p164]. In particular, it is shown that the second-order term of the generating functional expansion provides a lower bound for a quantity similar to the Fisher information. This is inverted, using the inverse relation of Jona-Lasinio [34] between the second-order functional derivatives of the effective action and the cumulant generating functional, to give the Cramér Rao bound, which provides a bound for the covariance of the process.

C. Paper outline
The paper is structured as follows. Appendix A draws on key results in convex analysis for calculating convex conjugates of functions with the Legendre-Fenchel transform [35]-[37]. These are used to present a new result for the Legendre transform of a mixture function. Appendix B reviews Cramér's rate function for random variables and how to determine it for different parametrised random variables. The next section characterises point processes and random measures with the Laplace functional and cumulant generating functional. The recently derived Cramér rate functional, Kullback's inequality and the Cramér Rao bound for point processes [30] are summarised in Section III. These results are then applied to determine Cramér's rate functional for superpositions and mixtures of point processes in Section IV. Section V determines Cramér's rate functional for parametrised point process and random measure models, using functional derivatives to determine the Legendre-Fenchel transform. Section VI presents a detailed examination of Kullback's inequality and the Cramér Rao bound for a particular parametrisation used in multi-target tracking [38]. Appendix C presents tables of Cramér's rate function and functional for different parametrisations.

II. POINT PROCESSES
In this section we describe the fundamental descriptors for point processes and the notion of covariance. Following Daley and Vere-Jones [39, p7], we define point processes as follows.
Definition II.1 (Point process). A point process ξ_λ is a measurable mapping ξ_λ : Ω → E_X from a probability space into (E_X, B(E_X)), where E_X is the space of all boundedly finite integer-valued measures on the state space X, and B(E_X) is the related Borel σ-algebra.
We shall use the Laplace functional to characterise point processes [39, p57] defined as follows.
Definition II.2 (Laplace functional). The Laplace functional of a random measure ξ_λ with probability measure λ is defined with
L_λ(f) = E_λ[ exp( −∫ f(x) ξ(dx) ) ],   (1)
where f : X → R_+ is a non-negative bounded measurable function of bounded support.
The probability generating functional G_λ is related to the Laplace functional with L_λ(f) = G_λ(e^{−f}).
The intensity measure [40, p133] is the first-order moment measure of the process, µ_λ(A) = E[ξ_λ(A)]. A generating functional for the cumulants is defined through the relation W_λ(f) = log L_λ(f). Cumulant densities w^(n)_λ are defined through the series expansion of this generating functional. The second cumulant density of the point process determines the covariance of the point process ξ on A × B, which is given by [39, p69]
cov(ξ(A), ξ(B)) = ∫_A ∫_B w^(2)(x, y) dy dx + ∫_{A∩B} w^(1)(x) dx.
The variance is found by setting A = B in the covariance, i.e. var(ξ(A)) = cov(ξ(A), ξ(A)).
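These definitions can be illustrated numerically. The following sketch (illustrative, with a hypothetical rate and regions) estimates the covariance of the counts in two overlapping regions of a homogeneous Poisson process on [0, 1] by Monte Carlo; for a Poisson process the second cumulant density vanishes, so the covariance reduces to the intensity of the overlap A ∩ B.

```python
import random

random.seed(0)

RATE = 100.0      # intensity of a homogeneous Poisson process on [0, 1]
A = (0.0, 0.5)    # first counting region
B = (0.3, 0.8)    # second counting region; A and B overlap on [0.3, 0.5]

def sample_process():
    """Sample the points of a rate-RATE Poisson process on [0, 1]
    as the arrival times of an exponential renewal sequence."""
    pts, t = [], random.expovariate(RATE)
    while t < 1.0:
        pts.append(t)
        t += random.expovariate(RATE)
    return pts

def count(pts, region):
    lo, hi = region
    return sum(lo <= x < hi for x in pts)

def mc_cov(n_trials=10000):
    """Monte Carlo estimate of cov(xi(A), xi(B))."""
    na, nb = [], []
    for _ in range(n_trials):
        pts = sample_process()
        na.append(count(pts, A))
        nb.append(count(pts, B))
    ma, mb = sum(na) / n_trials, sum(nb) / n_trials
    return sum((a - ma) * (b - mb) for a, b in zip(na, nb)) / n_trials

overlap = max(0.0, min(A[1], B[1]) - max(A[0], B[0]))
print(mc_cov(), RATE * overlap)   # estimate vs the intensity of A ∩ B
```

The estimate fluctuates around RATE times the length of the overlap, here 20.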
In the following section, the Cramér Rao bound for point processes is described in relation to the recently derived Kullback's inequality for point processes [30].

III. KULLBACK'S INEQUALITY AND THE CRAMÉR RAO BOUND FOR POINT PROCESSES
This section summarises recently presented results by the author in [30] that are required to derive Kullback's inequality and the Cramér Rao bound for different point process models.

A. Kullback's inequality for point processes
We now define the rate functional for a point process, recently introduced in [30], as the following form of Legendre transform.
Definition III.1 (Cramér's rate functional for a point process [30]). Define Cramér's rate functional Γ_λ(φ) to be the Legendre transform of the cumulant generating functional W_λ(f) = log L_λ(f) of a point process, i.e.
Γ_λ(φ) = sup_f { −∫ f(x) φ(x) dx − W_λ(f) },
where f is a non-negative bounded measurable function of bounded support.
Definition III.2 (Kullback-Leibler divergence for point processes). Suppose that λ and µ are probability distributions of point processes. Then the Kullback-Leibler divergence [15] is defined as
D(µ ∥ λ) = ∫ log( dµ/dλ ) dµ
when µ is absolutely continuous with respect to λ, and is infinite otherwise. Note that 0 ≤ D(µ ∥ λ) ≤ +∞, and D(µ ∥ λ) = 0 if and only if µ = λ.
Theorem III.1 (Kullback's inequality for point processes [30]). Let λ and µ_φ be probability distributions of point processes, where Γ_λ is the rate functional for the point process with distribution λ, and µ_φ is absolutely continuous with respect to λ. Let φ be the intensity function of the point process with probability distribution µ_φ. Then the following inequality holds, cf. [41, p38], [29]:
D(µ_φ ∥ λ) ≥ Γ_λ(φ).

Example III.1 (Poisson point process). The Laplace functional of a Poisson point process with intensity function µ(x) is given by [17]
L(f) = exp( −∫ (1 − e^{−f(x)}) µ(x) dx ).
Cramér's rate functional [30] is given by
Γ(φ) = ∫ ( φ(x) log(φ(x)/µ(x)) − φ(x) + µ(x) ) dx.
We note that this is the Kullback-Leibler divergence between two Poisson processes with intensity functions φ(x) and µ(x), e.g. [19].
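A scalar analogue of this example can be checked numerically. The sketch below (illustrative, not from the paper) evaluates the Legendre transform sup_f { −fφ − W(f) } of the scalar Poisson cumulant generating function W(f) = µ(e^{−f} − 1) on a grid, and compares it with the closed form φ log(φ/µ) − φ + µ.

```python
import math

def W_poisson(f, mu):
    """Scalar analogue of the Poisson cumulant generating functional
    under the Laplace-functional sign convention:
    W(f) = log E[exp(-f*N)] = mu*(exp(-f) - 1) for N ~ Poisson(mu)."""
    return mu * (math.exp(-f) - 1.0)

def rate_numeric(phi, mu):
    """Legendre transform sup_f { -f*phi - W(f) } over a grid of f values."""
    fs = [i / 1000.0 for i in range(-10000, 10001)]   # f in [-10, 10]
    return max(-f * phi - W_poisson(f, mu) for f in fs)

def rate_closed(phi, mu):
    """Closed form phi*log(phi/mu) - phi + mu, the Poisson KL rate."""
    return phi * math.log(phi / mu) - phi + mu

mu = 2.0
for phi in (0.5, 1.0, 2.0, 4.0):
    print(phi, rate_numeric(phi, mu), rate_closed(phi, mu))
```

The numeric and closed-form values agree to grid accuracy, and the rate vanishes at φ = µ, as expected of a divergence.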

B. The Cramér Rao bound for point processes
Fuchs and Letta [29] derived the Cramér Rao bound as a second-order relation from Kullback's inequality, and this proof was recently developed for point processes [30]. Expanding both sides of Kullback's inequality to second order about the reference intensity, it was shown that the second-order functional derivative of the rate functional provides a lower bound for a quantity analogous to the Fisher information. The inverse of this relation gives a Cramér Rao bound [30]; however, the determination of the inverse is more challenging in a functional context. Jona-Lasinio [34] derived a relation between the second-order functional derivatives of the cumulant generating functional and the effective action, namely that they are inverse kernels of one another, which is used to determine the inverse. The covariance of the point process ξ_{µ_ψ} parameterised with intensity function ψ is then bounded with the inverse of the second-order derivative of the rate functional [30]. This result shall be used in Section VI to analyse a particular point process model used in multi-target tracking.
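In the scalar case, this second-order argument reduces to differentiating Cramér's rate function twice. A minimal sketch (illustrative, not the functional derivation of [30]): for a Poisson variable with mean µ, the rate function is φ log(φ/µ) − φ + µ, its second derivative at φ = µ plays the role of the Fisher information, and its inverse recovers the Poisson variance µ.

```python
import math

def rate(phi, mu):
    """Cramér rate function of a Poisson variable with mean mu."""
    return phi * math.log(phi / mu) - phi + mu

def second_derivative(phi, mu, h=1e-3):
    """Central second difference of the rate function at phi."""
    return (rate(phi + h, mu) - 2.0 * rate(phi, mu) + rate(phi - h, mu)) / (h * h)

mu = 3.0
fisher_like = second_derivative(mu, mu)   # analytically 1/mu
crb = 1.0 / fisher_like                   # Cramér Rao type bound
print(crb)                                # close to mu = 3, the Poisson variance
```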

IV. CRAMÉR'S RATE FUNCTIONAL FOR SUPERPOSITIONS AND MIXTURES

A. Superpositions
The following Lemma shows how to determine Cramér's rate functional for superpositions of point processes. A similar result was recently presented by Zajkowski [42] in the context of sums of random variables.
Lemma IV.1 (Legendre-Fenchel transform of a superposition of point processes). Suppose that the distribution λ is a superposition of independent distributions described with characteristic functionals L_{λ_1}(f), ..., L_{λ_n}(f), i.e.
L_λ(f) = L_{λ_1}(f) × ⋯ × L_{λ_n}(f).
Then the Legendre-Fenchel transform of the cumulant generating functional W_λ(f) = log L_λ(f) is given by the infimal convolution
W*_λ(φ) = min_{φ_1 + ⋯ + φ_n = φ} ( W*_{λ_1}(φ_1) + ⋯ + W*_{λ_n}(φ_n) ).
Proof. Note that the cumulant generating functional is the sum of the component cumulant generating functionals, W_λ(f) = W_{λ_1}(f) + ⋯ + W_{λ_n}(f). The proof follows from Theorem A.1.
Corollary IV.1 (Minimum Kullback-Leibler divergence for superposition processes). The minimum Kullback-Leibler divergence between the superposition λ of n processes, λ_1, ..., λ_n, and a process with mean φ is given by
min_{φ_1 + ⋯ + φ_n = φ} ( Γ_{λ_1}(φ_1) + ⋯ + Γ_{λ_n}(φ_n) ),
where the process µ is of the form of a superposition of n processes, i.e.
L_µ(f) = L_{µ_1}(f) × ⋯ × L_{µ_n}(f),
and the mean is of the form φ = φ_1 + ⋯ + φ_n. This corollary gives an expression for the minimum Kullback-Leibler divergence for superpositions of n point processes or random measures. In the context of point processes, the rate functional takes the first-order moment measure, or intensity measure, as its argument to return a lower bound on the relative entropy.
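In the scalar Poisson case, the infimal convolution in Lemma IV.1 can be evaluated numerically. Since the superposition of independent Poisson processes with intensities µ1 and µ2 is Poisson with intensity µ1 + µ2, the minimisation over splits φ1 + φ2 = φ should recover the rate function of the summed intensity. A sketch (illustrative, with hypothetical intensities):

```python
import math

def rate(phi, mu):
    """Poisson rate function, extended by continuity at phi = 0."""
    if phi == 0.0:
        return mu
    return phi * math.log(phi / mu) - phi + mu

def inf_convolution(phi, mu1, mu2, steps=20000):
    """min over phi1 + phi2 = phi of rate(phi1, mu1) + rate(phi2, mu2),
    computed on a grid of splits."""
    best = float("inf")
    for i in range(1, steps):
        phi1 = phi * i / steps
        best = min(best, rate(phi1, mu1) + rate(phi - phi1, mu2))
    return best

mu1, mu2, phi = 1.0, 3.0, 5.0
print(inf_convolution(phi, mu1, mu2), rate(phi, mu1 + mu2))
```

The minimising split allocates mass in proportion to the component intensities, and the minimum matches the rate function of the pooled process.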

B. Mixture processes
The Cramér rate function for mixture distributions is a novel result presented in Appendix B, Lemma B.2, based on Theorem A.4 in Appendix A. The following Lemma shows the analogous result for a mixture of point processes.
Lemma IV.2 (Legendre-Fenchel transform of a mixture of point processes). Suppose that the distribution λ is a mixture of distributions with Laplace functionals L_{λ_1}(f), ..., L_{λ_n}(f), i.e.
L_λ(f) = w_{λ_1} L_{λ_1}(f) + ⋯ + w_{λ_n} L_{λ_n}(f),
where w_{λ_1} + ⋯ + w_{λ_n} = 1. Then the Legendre-Fenchel transform of the cumulant generating functional is given by
W*_λ(φ) = min Σ_{i=1..n} α_i ( W*_{λ_i}(φ_i) + log( α_i / w_{λ_i} ) ),
where the minimum is taken over weights α_1 + ⋯ + α_n = 1, α_i ≥ 0, and intensities φ_1, ..., φ_n with α_1 φ_1 + ⋯ + α_n φ_n = φ.

Corollary IV.2 (Minimum Kullback-Leibler divergence for mixture processes). The minimum Kullback-Leibler divergence between a mixture λ of n processes, λ_1, ..., λ_n, and a process with mean φ is given by
min Σ_{i=1..n} α_i ( Γ_{λ_i}(φ_i) + log( α_i / w_{λ_i} ) ),
where the term Σ_i α_i log(α_i / w_{λ_i}) is the Kullback-Leibler divergence between the weights of the mixtures, the process is described with a mixture of Laplace functionals, i.e. of the form
L_µ(f) = w_{µ_1} L_{µ_1}(f) + ⋯ + w_{µ_n} L_{µ_n}(f),
where w_{µ_1} + ⋯ + w_{µ_n} = 1, and the mean of the process is given by the weighted sum of the component means. This corollary gives an expression for the minimum Kullback-Leibler divergence for mixtures of n point processes or random measures for a given intensity measure.
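A scalar sketch of the mixture result (illustrative; the weights and Poisson means are hypothetical test values). It compares a brute-force Legendre-Fenchel transform of the logarithm of a two-component mixture of Poisson moment generating functions with the constrained minimisation over auxiliary weights α and component means, whose weight term is the divergence between α and the mixture weights.

```python
import math

w1, w2 = 0.3, 0.7      # hypothetical mixture weights
mu1, mu2 = 1.0, 4.0    # hypothetical Poisson means

def cgf_component(z, mu):       # Lambda_i(z) for a Poisson variable
    return mu * (math.exp(z) - 1.0)

def rate_component(phi, mu):    # Lambda_i*(phi), the Poisson rate function
    return phi * math.log(phi / mu) - phi + mu

def cgf_mixture(z):
    return math.log(w1 * math.exp(cgf_component(z, mu1))
                    + w2 * math.exp(cgf_component(z, mu2)))

def conjugate_direct(phi):
    """Brute-force transform sup_z { z*phi - Lambda(z) } on a grid."""
    zs = [i / 1000.0 for i in range(-5000, 5001)]
    return max(z * phi - cgf_mixture(z) for z in zs)

def conjugate_formula(phi):
    """min over alpha, phi1 of alpha*(L1*(phi1) + log(alpha/w1))
    + (1-alpha)*(L2*(phi2) + log((1-alpha)/w2)),
    with alpha*phi1 + (1-alpha)*phi2 = phi."""
    best = float("inf")
    for k in range(1, 200):
        a = k / 200.0
        for j in range(1, 500):
            phi1 = (phi / a) * j / 500.0
            phi2 = (phi - a * phi1) / (1.0 - a)
            if phi2 <= 0.0:
                continue
            val = (a * (rate_component(phi1, mu1) + math.log(a / w1))
                   + (1.0 - a) * (rate_component(phi2, mu2)
                                  + math.log((1.0 - a) / w2)))
            best = min(best, val)
    return best

phi = 2.0
print(conjugate_direct(phi), conjugate_formula(phi))
```

The grid minimisation always sits above the true value and the grid supremum below it, so agreement to within grid tolerance supports the formula.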
V. CRAMÉR'S RATE FUNCTIONAL FOR DIFFERENT PARAMETRISATIONS

This section derives the rate functional for some common parametrisations.
Example V.1 (Bernoulli point process). The Bernoulli point process extends the Bernoulli distribution to the case with intensity ψ(x), e.g. [39, p530]. The Laplace functional of a Bernoulli process is given by, e.g. [19],
L(f) = 1 − ∫ ψ(x) dx + ∫ e^{−f(x)} ψ(x) dx.
Cramér's rate functional is given by
Γ(φ) = ∫ φ(x) log( φ(x)/ψ(x) ) dx + (1 − ∫ φ(x) dx) log( (1 − ∫ φ(x) dx) / (1 − ∫ ψ(x) dx) ).
Proof. The cumulant generating functional is W(f) = log L(f), and the Legendre-Fenchel transform becomes
Γ(φ) = sup_f { −∫ f(y) φ(y) dy − W(f) }.
As above, we take the functional derivative and set it to zero, which gives
φ(y) L(f) = e^{−f(y)} ψ(y).
Substituting back into the definition of the Laplace functional, we get
L(f) = (1 − ∫ ψ(y) dy) / (1 − ∫ φ(y) dy),
and thus, rearranging for −f(y),
−f(y) = log( φ(y)/ψ(y) ) + log( (1 − ∫ ψ(y) dy) / (1 − ∫ φ(y) dy) ).
Finally, substitution of −f into the Legendre-Fenchel transform gives the result.
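The Bernoulli rate functional can be checked on a two-point discretisation of the state space, where the supremum over test functions f reduces to a two-dimensional grid search. A sketch (illustrative, with hypothetical intensity masses):

```python
import math

# Two-point discretisation: psi and phi are Bernoulli intensity masses on {x1, x2}.
q = (0.3, 0.2)   # hypothetical reference intensity masses, total mass 0.5
p = (0.2, 0.1)   # hypothetical argument of the rate functional, total mass 0.3

def W(f1, f2):
    """Cumulant generating functional W(f) = log L(f) of the Bernoulli process."""
    return math.log(1.0 - sum(q) + q[0] * math.exp(-f1) + q[1] * math.exp(-f2))

def rate_numeric():
    """Brute-force Legendre transform sup_f { -<f, phi> - W(f) } on a grid."""
    grid = [i / 50.0 for i in range(-150, 151)]   # f values in [-3, 3]
    return max(-f1 * p[0] - f2 * p[1] - W(f1, f2)
               for f1 in grid for f2 in grid)

def rate_closed():
    """phi log(phi/psi) plus the 'no point' term for the missing mass."""
    head = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return head + (1.0 - sum(p)) * math.log((1.0 - sum(p)) / (1.0 - sum(q)))

print(rate_numeric(), rate_closed())
```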
Example V.2 (Panjer point process). The Laplace functional of a Panjer point process [44] is given by Cramér's rate function is given by dx.
Example V.3 (Binomial point process). Following the extension of a Bernoulli distribution to a binomial distribution, the Laplace functional of a binomial point process with n trials is given by
L(f) = ( 1 − ∫ ψ(x) dx + ∫ e^{−f(x)} ψ(x) dx )^n.
Cramér's rate functional is given by
Γ(φ) = n Γ_b(φ/n),
where Γ_b is the rate functional of the Bernoulli point process. Proof. The cumulant generating functional becomes n times that of the Bernoulli process. The result follows by using the result for the Bernoulli point process and the scaling property for Legendre transforms, i.e. (nΛ)*(φ) = nΛ*(φ/n).

Example V.4 (Poisson-binomial process). Consider the Laplace functional of the Poisson-binomial process, which is composed of n independent Bernoulli processes [45], i.e.
L(f) = L_{b_1}(f) × ⋯ × L_{b_n}(f).
Cramér's rate functional is given by the infimal convolution of the Bernoulli rate functionals,
Γ(φ) = min_{φ_1 + ⋯ + φ_n = φ} ( Γ_{b_1}(φ_1) + ⋯ + Γ_{b_n}(φ_n) ).
Proof. The proof follows from Lemma IV.1 and Theorem A.1.
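The scaling property of Legendre transforms used for the binomial process, (nΛ)*(φ) = nΛ*(φ/n), can be checked numerically in the scalar Bernoulli case (an illustrative sketch with hypothetical parameters):

```python
import math

n, p = 5, 0.3   # hypothetical binomial parameters

def cgf(z):
    """Cumulant generating function of a Bernoulli(p) variable."""
    return math.log(1.0 - p + p * math.exp(z))

def conj_numeric(g, phi):
    """Grid Legendre-Fenchel transform sup_z { z*phi - g(z) }."""
    zs = [i / 1000.0 for i in range(-10000, 10001)]
    return max(z * phi - g(z) for z in zs)

def bernoulli_rate(phi):
    """Closed-form conjugate of cgf: the binary KL divergence."""
    return (phi * math.log(phi / p)
            + (1.0 - phi) * math.log((1.0 - phi) / (1.0 - p)))

phi = 2.0                                      # target mean count, phi/n = 0.4
lhs = conj_numeric(lambda z: n * cgf(z), phi)  # conjugate of n*Lambda at phi
rhs = n * bernoulli_rate(phi / n)              # n * Lambda*(phi/n)
print(lhs, rhs)
```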

VI. THE CRAMÉR RAO BOUND FOR A MULTI-TARGET TRACKING MODEL

A. Analysis of the Poisson multi-Bernoulli mixture process
In this section we analyse a particular model used for multi-target tracking applications, known as the Poisson multi-Bernoulli mixture process, introduced by Williams [38], which is a mixture process where each term in the mixture is the superposition of a Poisson process and a Poisson-binomial process. This model is chosen since it involves both superpositions and a mixture, so we can demonstrate the application of the results introduced in the previous sections. The probability generating functional is a weighted mixture of component functionals G^(j), where each G^(j) is the product of the Poisson and Poisson-binomial probability generating functionals respectively.
The related Laplace functionals have the same structure.
The cumulant generating functional is then given by the logarithm of this mixture. It is straightforward to show that the intensity of the whole process is given by the weighted sum of the component intensities. The following lemma computes the Cramér rate functional for this model.
Lemma VI.1 (Cramér rate functional for the Poisson multi-Bernoulli mixture process). The rate functional for the process described above takes the form of a minimisation over mixture weights and component intensities, whose terms are calculated with the rate functionals for the Poisson and Bernoulli processes.
Proof. The proof follows by first applying Lemma IV.2 to determine the rate functional for the mixture process, then Lemma IV.1 for the superposition in each mixture component, and finally using the rate functionals for the Poisson and Bernoulli processes in Examples III.1 and V.1 respectively.
The following Lemma shows that the Kullback-Leibler divergence between two Poisson multi-Bernoulli mixture processes with the same structure has the same form as the rate functional.
Lemma VI.2 (Kullback-Leibler divergence between two Poisson multi-Bernoulli mixture processes of the same structure). Consider a second process whose probability generating functional is of the same form, with components G^(j). The Kullback-Leibler divergence between this process and the previously defined Poisson multi-Bernoulli process is given by the divergence between the mixture weights plus the weighted sum of the component divergences Γ^(j) defined in Lemma VI.1.
The first term is the Kullback-Leibler divergence between the weights of the mixtures. Each component j is composed of the superposition of an independent Poisson process and independent Bernoulli processes, hence the divergence is additive. Finally, note that the divergence between two Poisson processes, or between two Bernoulli processes, is equal to the respective rate functional.
The following lemma determines the covariance of the process.
Lemma VI.3 (Covariance of the Poisson multi-Bernoulli mixture process). The covariance of the process follows from the law of total covariance over the mixture components, where the j-th covariance and intensities, cov^(j)(ξ(A), ξ(B)) and µ^(j)(A), µ^(j)(B), are found from the covariances and intensities of the individual components. Proof. The first equality for the covariance follows from the law of total covariance. The calculation of cov^(j)(ξ(A), ξ(B)) and µ^(j)(A), µ^(j)(B) follows from the independence (due to superposition) of the terms for each j. The remaining terms are the mean and covariance of a Bernoulli process [13].
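The law of total covariance used in the proof can be illustrated with a two-component mixture of homogeneous Poisson processes on [0, 1] (an illustrative Monte Carlo sketch; the weights, rates, and regions are hypothetical):

```python
import random

random.seed(1)

w = (0.4, 0.6)     # hypothetical mixture weights
lam = (2.0, 6.0)   # hypothetical component rates
A = (0.0, 0.6)
B = (0.4, 1.0)

def sample_counts():
    """Pick a component, draw its Poisson process, count points in A and B."""
    j = 0 if random.random() < w[0] else 1
    pts, t = [], random.expovariate(lam[j])
    while t < 1.0:
        pts.append(t)
        t += random.expovariate(lam[j])
    na = sum(A[0] <= x < A[1] for x in pts)
    nb = sum(B[0] <= x < B[1] for x in pts)
    return na, nb

def mc_cov(n=100000):
    data = [sample_counts() for _ in range(n)]
    ma = sum(a for a, _ in data) / n
    mb = sum(b for _, b in data) / n
    return sum((a - ma) * (b - mb) for a, b in data) / n

# Law of total covariance: E[cov | j] + cov of the conditional means over j.
lenA, lenB = A[1] - A[0], B[1] - B[0]
overlap = max(0.0, min(A[1], B[1]) - max(A[0], B[0]))
mA = [l * lenA for l in lam]                              # E[N_A | j]
mB = [l * lenB for l in lam]                              # E[N_B | j]
within = sum(wj * l * overlap for wj, l in zip(w, lam))   # E[cov^(j)]
EA = sum(wj * m for wj, m in zip(w, mA))
EB = sum(wj * m for wj, m in zip(w, mB))
between = sum(wj * a * b for wj, a, b in zip(w, mA, mB)) - EA * EB
print(mc_cov(), within + between)
```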
In the following Lemma we shall consider the case in Lemma VI.2 where α^(j) = w^(j), since it relates to the Gaussian prediction model in the following section. When this is not the case, we have to invert (67), which involves a minimisation over w^(j).
Lemma VI.4 (The Cramér Rao bound for a particular Poisson multi-Bernoulli mixture process). Consider the case in Lemma VI.2 where α^(j) = w^(j), so that the mixture weights are unchanged. The Cramér Rao bound is then given by the weighted combination of the component bounds. Proof. Since α^(j) = w^(j), the divergence between the mixture weights vanishes. Using the scaling property of Legendre transforms, (aΛ)*(φ) = aΛ*(φ/a), from Theorem A.1, and the law of total differentiation, the Cramér Rao bound follows.
From the previous Lemma, it is interesting to note that the Cramér Rao bound is lower than the covariance of the process calculated in Lemma VI.3. This is due to the fact that the hypothesis weights do not change in the prediction model.

B. Gaussian prediction model
Let us consider a scenario where the posterior probability generating functional is of the form of a Poisson multi-Bernoulli mixture as described in subsection A. Suppose that each target evolves according to a branching model with a Bernoulli survival process, where p_s is the probability of survival and f_{k|k−1}(x|y) = N(x; Fy, Q) is a Gaussian Markov transition.
Suppose that the intensities are Gaussian mixtures of the form (cf. [38]).
Then the Bernoulli and Poisson rate functionals can be evaluated in closed form. Proof. Consider the first term in the rate functional for the Bernoulli process, φ_b(x). We can separate the parts related to the constant and to the Gaussian due to the additivity of the logarithm.
The result follows from the Kullback-Leibler divergence for Gaussians [47, p189]. The result for the Poisson case uses the chain rule for the Kullback-Leibler divergence.
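The closed-form Kullback-Leibler divergence between univariate Gaussians can be checked against numerical integration (an illustrative sketch, not from the paper):

```python
import math

def kl_gauss(m1, s1, m2, s2):
    """Closed-form KL divergence between N(m1, s1^2) and N(m2, s2^2)."""
    return (math.log(s2 / s1)
            + (s1 ** 2 + (m1 - m2) ** 2) / (2.0 * s2 ** 2) - 0.5)

def kl_numeric(m1, s1, m2, s2, lo=-20.0, hi=20.0, n=100000):
    """Midpoint-rule evaluation of the same divergence."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        p = math.exp(-(x - m1) ** 2 / (2 * s1 ** 2)) / (s1 * math.sqrt(2 * math.pi))
        q = math.exp(-(x - m2) ** 2 / (2 * s2 ** 2)) / (s2 * math.sqrt(2 * math.pi))
        if p > 1e-300:          # skip negligible tail mass
            total += p * math.log(p / q) * dx
    return total

print(kl_gauss(0.0, 1.0, 1.0, 2.0), kl_numeric(0.0, 1.0, 1.0, 2.0))
```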
Corollary VI.1 (The variance and the Cramér Rao bound). Following Lemmas VI.3 and VI.4, the variance for the terms in this model over the state space X, var(ξ(X)), and the corresponding bound can be evaluated in closed form.
The calculations above relate to the Kullback-Leibler divergence and the Cramér Rao bound for a particular multi-target prediction model where it is possible to calculate the expressions analytically. This approach is also applicable to multi-target scenarios where there is control over the population of targets [48]. The possibility of target births was omitted, though the variance of the birth model could be added to the expression for the Cramér Rao bound to handle this scenario.
To consider the update model, we could either calculate the expressions, e.g. the Kullback-Leibler divergence, for a particular measurement set, or average over all possible measurement sets. The former case brings complications, since the structure of the posterior is not generally the same as that of the prior: additional terms are introduced by the different data association configurations. In the latter case, we can determine the mutual information, which is the expectation of the Kullback-Leibler divergence between the prior and the posterior. See [21] for a systematic approach to calculating the mutual information in multi-sensor scenarios.

APPENDIX A
THE LEGENDRE-FENCHEL TRANSFORM

A. Fundamentals
The Legendre transform [49] is an involutive transformation on real-valued convex functions. It is used in physics, for example, to relate the Hamiltonian to the Lagrangian in classical mechanics. Fenchel proposed a generalisation of the Legendre transform to non-convex functions, known as the Legendre-Fenchel transform or convex conjugate [50], which is a fundamental tool in convex analysis. The Legendre-Fenchel transform is used in large deviations theory [51]-[53], through Cramér's rate function [23]-[25], [32], to calculate the probabilities of rare events.
The Legendre-Fenchel transform is useful when considering composite functions, as shown in the note by Hiriart-Urruty [35]. In particular, we shall make use of the following theorem, which expresses the Legendre-Fenchel transform of a sum of convex functions as the infimal convolution of their transforms.
B. The Legendre-Fenchel transform for the logarithm of mixtures of functions

Random variables are often defined in terms of weighted mixtures of moment generating functions. Hence, their cumulant generating function is defined by the logarithm of this weighted mixture. We can use the operations discussed above to determine the Legendre transform with the following theorem.
Theorem A.4 (Legendre transform of the logarithm of a mixture function). Let g : R^n → R be defined as
g(z) = w_1 e^{Λ_1(z)} + ⋯ + w_n e^{Λ_n(z)},
where w_1 + ⋯ + w_n = 1. Then
(log g)*(φ) = min Σ_{i=1..n} α_i ( Λ_i*(y_i) + log( α_i / w_i ) ),
where the minimum is taken over weights α_1 + ⋯ + α_n = 1, α_i ≥ 0, and arguments y_1, ..., y_n with Σ_i α_i y_i = φ.
Proof. The result follows from the Legendre transform of the log-sum-exp function in Theorem A.2 (from [37, p482]), Theorem A.3 (Corollary 4 from [35]), and the Legendre-Fenchel transform of the sum of a function and a constant. Specifically, suppose that c is a constant; then the Legendre-Fenchel transform of the sum of a function f with the constant c is given by (f + c)*(φ) = f*(φ) − c. We then have (log g)*(φ) = (θ ∘ (Λ_1 + log w_1, ..., Λ_n + log w_n))*(φ), where θ is the log-sum-exp function.
Note that the constituent mixture components Λ_i are typically described on the same space. This can be determined by setting all of the arguments to be the same, i.e. y_1 = ⋯ = y_n.
Though this is a direct application of Corollary 4 from [35], the author believes that this is the first statement of this particular result for mixture distributions.

APPENDIX B
CRAMÉR'S RATE FUNCTION FOR RANDOM VARIABLES
In this section, we review Cramér's rate function for random variables and how to determine the Legendre transform with some simple parametrisations.
A. The cumulant generating functions and the Legendre-Fenchel transform

Definition B.1 (Moment generating function). The moment generating function of a random variable ξ is given by M(z) = E[e^{zξ}]. When ξ is a non-negative integer-valued random variable, the moment generating function becomes M(z) = Σ_{k≥0} e^{zk} p(k), and when ξ is a continuous-valued random variable, the moment generating function becomes M(z) = ∫ e^{zξ} p(ξ) dξ.
B. Cramér's rate function for sums of random variables

The following Lemma was recently presented by Zajkowski [42] for determining Cramér's rate function for sums of random variables. This uses Theorem A.1, the result of Hiriart-Urruty for calculating the Legendre-Fenchel transform of a summation [35]. We first present the Lemma, and then illustrate it for a Poisson-binomial distribution [45], which is formed from the sum of independent Bernoulli distributions.
Lemma B.1 (Cramér's rate function for sums of random variables). Suppose that the distribution is described with a superposition of independent distributions with moment generating functions M_1(z), ..., M_n(z), i.e.
M(z) = M_1(z) × ⋯ × M_n(z).
Then the Legendre-Fenchel transform of the cumulant generating function Λ(z) = log M(z) is given by
Λ*(φ) = min_{φ_1 + ⋯ + φ_n = φ} ( Λ_1*(φ_1) + ⋯ + Λ_n*(φ_n) ),
where Λ_i*(φ_i) is the Legendre-Fenchel transform of the i-th cumulant generating function.
C. Cramér's rate function for mixture distributions

The following Lemma determines Cramér's rate function for arbitrary mixtures of random variables. This provides a lower bound on the Kullback-Leibler divergence from a mixture distribution to a distribution with a given mean.
Lemma B.2 (Legendre-Fenchel transform of a mixture of distributions). Suppose that the distribution is described with a mixture of moment generating functions M_1(z), ..., M_n(z), i.e.
M(z) = w_1 M_1(z) + ⋯ + w_n M_n(z),
where w_1 + ⋯ + w_n = 1. Then
Λ*(φ) = min Σ_{i=1..n} α_i ( Λ_i*(φ_i) + log( α_i / w_i ) ),
where the minimum is taken over weights α_1 + ⋯ + α_n = 1, α_i ≥ 0, and values φ_1, ..., φ_n with Σ_i α_i φ_i = φ, and where Λ_i* is the Legendre-Fenchel transform of the cumulant generating function Λ_i(z) = log M_i(z).
Proof. Note that Λ(z) = log Σ_{i=1..n} w_i e^{Λ_i(z)} and that w_i M_i(z) = exp(Λ_i(z) + log w_i). The proof follows from Theorem A.4.