Adaptive Kalman Filter for Linear Systems with Additive and Multiplicative Noises

—In this paper, a novel adaptive Kalman ﬁlter is proposed to solve the adaptive Kalman ﬁltering problem for linear systems with dynamic multiplicative and additive noises. The noise covariances are considered unknown and a varaitional Bayesian (VB) approach is proposed. We use inverse Wishart priors for noise covariances and Student’s t-distribution to represent the likelihood functions. Estimation algorithms for recursive noise covariance matrices and the dynamic state are proposed following VB process. A target tracking example is provided to validate the effectiveness of the proposed algorithm.


I. INTRODUCTION
The Kalman filter is an optimal state estimator in the sense of minimum variance for linear systems with Gaussian measurement and process noises [1]. It is widely used in communication, signal processing and aerospace engineering due to its real-time, optimal, and stable characteristics. The usefulness of the Kalman filter is limited by priori information: the noise statistics and system model parameters. Also, the statistics property of the noises are usually assumed to be known and time-invariant. The unknown priori information may lead to lose efficiency. On the other hand, in many practical applications, e.g., navigation and positioning systems [2]- [5], the prior information on noise statistics and model parameters may be unknown, part-known and even timevarying and missing. The classical methods to solve such a problem are so-called the adaptive Kalman filter, and can be further classified by four branches including maximum likelihood, Bayesian, covariance matching and correlation approaches [2]. The typical implements of adaptive Kalman filters include innovation-based adaptive filter [6], interacting multiple model filter [6], [7] and the Sage-Husa adaptive filter [8]. In particular, innovation-based adaptive filter uses the innovation sequence and a maximum likelihood criterion to estimate the noise covariance matrices. Interacting multiple model filter is a Bayesian approach where other methods can be seen as its approximations. Sage-Husa adaptive filter uses the maximum-a-posterior criterion to estimate the noise statistics recursively and it is a covariance matching method.
This class of adaptive Kalman filter may suffer from issues of the convergence, practical applications, and computation burden [9], [10].
More recently, variational Bayesian (VB) approach has received much attention since it can be used to perform approximate posterior inference and to estimate uncertain hidden parameters or state variables [11], [12] in broad application including machine learning, visual tracking, signal processing and etc [13]- [16]. In fact, the existing VB approaches are also seen as an approximation of the Bayesian inference, which are proposed to estimate the additive measurement noise covariance matrix (AMNCM) by using an appropriate conjugate prior probability density function (PDF) or distribution [9], [17], [18] . Due to its strong adaptability, a large number of research results are obtained. In particular, a theoretical framework for VB-based filter is presented in [11], [19] and a variant of filters are proposed including VBbased extended Kalman filter [20], unscented Kalman filter [21], cubature information filter [22] and particle filter [23], [24]. Existing filters usually model additive Gaussian process and measurement noises with Student's t-distribution in a heavy-tailed form and model AMNCM or additive process noise covariance matrix (APNCM) with Wishart distribution, inverse Wishart distribution or inverse Gamma distribution. However, to the best of our knowledge, the study on the multiplicative noise problem under a VB approach is still open and how to design an adaptive Kalman filter is a challenge. Motivated by the fact that multiplicative nose can be used to describe a wider class of physical systems, we will address the estimation problem for systems with unknown multiplicative measurement noise covariance matrix (MMNCM) using a VB approach.
Compared with additive noise, the mean of multiplicative noise is always non-zero and this results in additional difficulties for state signals [25], [26], [26]- [29]. In engineering practice, the mean and convariance of the multiplicative noise are usually difficult to be obtained. In this paper, we present a recursive adaptive Kalman filter to estimate the unknown noise covariances of multiplicative and additive noises (including MMNCM, AMNCM and APNCM). First, we use inverse Wishart priors for process noise covariance and observation noise covariance, and the one-step predicted error covariance and choose Gaussian priors or non-Gaussian Student's t priors for likelihood functions. Then, the dynamic states together with noise covariances are introduced in a recursive way based on a VB-based approach. For comparison, the proposed filter and existing stat-of-the-art filters are used to a target tracking example with unknown noise covariances. Simulation results demonstrate the superiority of the proposed filter than these existing filters.
The main contributions of this paper are two-fold. First, compared with the filter for linear systems with additive noises, we propose an algorithm that is applicable to address both additive and multiplicative noises. To solve the technical issue raised by the multiplicative noise, a variational Bayesian approach is proposed. Moreover, the proposed approach is able to approach the multiplicative and additive measurement noise covariances as a whole, while the algorithms proposed in [25], [27], [28] often operate in a separate way. Second, in contrast with [25], [27], [28], where the covariance of the multiplicative noise is assumed to be fixed and known, we focus on the situation that the covariances of both additive and multiplicative noises are time-varying and unknown. Towards this end, a novel adaptive Kalman filter is presented to jointly approach the covariances of multiplicative and additive noises based on projection formula and a VB-based approach.
The reminder parts of this paper are organized as follows. Section 1 introduces the background. Sections II and III state the considered problem and the proposed algorithms respectively. Then, Section IV evaluates the performance analysis and Section V gives a numerical simulation example. Finally, Section VI concludes the paper.
Notation. E[·] and log[·] denote the expectation operation and the logarithmic function, respectively. tr(·) denotes trace operation. (p q) denotes the distance between p and q. · denotes 2-Euclid-norm of a matrix or vector. | · | denotes determinant of a matrix or vector.

II. PROBLEM FORMULATION
We consider the following discrete-time linear state-space model where k is the time index, X k ∈ R n is the state vector, Y k ∈ R m is the measured vector, W k ∈ R n and V k ∈ R m are mutually uncorrelated zero-mean Gaussian additive noises with covariance matrices Q k and R k , F k ∈ R n×n and H k ∈ R m×n are the state transition matrix and the observation equation matrix, respectively, and m k ∈ R is the Gaussian multiplicative noise with meanm k and covariance σ k . Assume the initial state X 0 be Gaussian with meanX 0|0 and covariance P 0|0 . Moreover,m k ,X 0|0 and P 0|0 are given. The standard Kalman filter employs measurements z 1:k to estimate the state in measurement update step. The Kalman filter is optimal for systems without multiplicative noise, while Kalman filter for model (1)-(2) is optimal in terms of linear minimum variance with given Q k , R k and σ k . However, in general situation, Q k , R k and σ k are unknown or inaccurate. Hence a novel VB-based filter is proposed in this paper that is suitable for unknown covariances of both multiplicative noise and additive noise.
Related work. In [30], [31], multiplicative noise m k is considered as a Gaussian process with known mean and covariance matrices. In [27], [28], [32], m k is also considered as Gaussian process but time-correlated. In our previous work [25], [33], we use the Monte Carlo sampling and variatonal approach to solve the issue of unknown and non-Gaussian multiplicative noise. However, the proposed method is approximation one with enough number of particles and the guarantee of sampling accuracy. The existence of multiplicative noise will lead to the complexity of sampling, and then lead to inaccurate estimation. Therefore, this paper will adopt a more accurate iterative method. For simplicity, we take Gaussian distribution example and the cases for other fixed distributed can be similarly solved.
Before proceeding on, we first introduce several definitions. Definition 1: According to Mellin-Barnes integral representation in the complex plane, Meijer G-function is defined as: where a i and b i are real or complex parameters; m (0 ≤ m ≤ q) and n (0 ≤ n ≤ p) are integers; L is the integration path; i denotes the complex plane; Γ denotes the gamma function and is defined as:

Definition 2:
The Student's t-distribution of variable t is defined as: where Γ is the gamma function and ν is the number of degrees of freedom (dof). Definition 3: The Cauchy distribution of variable x is defined as: where x 0 is the location parameter and γ is the scale parameter. Definition 4: For a d×d symmetric positive definite random matrix B, the inverse Wishart distribution is defined as: where λ is the dof parameter, Ψ is a d × d dimensional symmetric positive definite inverse scale matrix, and Γ d (·) is the d-variate gamma function [34]. Definition 5: For discrete probability distributions p and q defined on the same probability space X , the KullbackLeibler divergence (KLD) from q to p is defined to be: Consider the linear system (1) and (2). The objective of this paper is to calculate state x k together with Q k , R k , and σ k using a iterative approach. In other words, the objective is to compute the joint posterior distribution p(x k , Q k , σ k , R k |y 1:k ).

III. PROPOSED ALGORITHMS
This section will propose the filters from the two aspects: one-step prediction and measurement update. The measurement update step contain choices of prior distributions, choice of the likelihood function, choices of covariance matrices and variational approximations of posterior PDFs.

A. One-step prediction
Since the multiplicative noise doesn't affect the state equation (1), so the one-step prediction p(x k |z 1:k−1 ) is Gaussian: wherex k|k−1 and P k|k−1 are respectively the one-step predicted state vector and corresponding one-step predicted error covariance matrix and N(·; µ, Σ) denotes the Gaussian distribution with mean vector µ and covariance matrix Σ. Then, it follows that x k−1|k−1 and P k−1|k−1 denote the k − 1 filtering state and corresponding estimation error covariance matrix, respectively. Since Q k−1 is unknown and is a linear correlation with P k|k−1 . So we choose to estimate P k|k−1 instead of Q k−1 . Remark 1: From (5), P k|k−1 is a linear function of Q k−1 and therefore estimation of Q k−1 can be replaced by the estimation of P k|k−1 .
Remark 2: The one-step prediction is assumed to be Gaussian no matter with or without multiplicative noise [28]. For other distributions, for example, in the view of Student's tlikelihood function described in the following, (3) can be updated by x k|k−1 and P k|k−1 can be obtained by Student's t-distribution ((4)-(5) is the special case).

B. Measurement update
Since the presence of multiplicative noise in measurement equation affect the distribution of likelihood function and then affect the derivation of measurement update step, we will first discuss the choice of the likelihood, then discuss the approximation of noise covariances, and finally induce the estimation results in the following.
1) Characterization and selection of likelihood function: We first consider the likelihood PDF p(z k |x k ). From (2), we can see that m k changes the distribution of p(z k |x k ), because the PDFs of products of independent random Gaussian variables are not necessarily Gaussian functions, but shown to be Meijer G-functions in general [35], [36]. This means that the PDF of m k ×x k in (2) is a Meijer G-function, and therefore the likelihood p(z k |x k ) is non-Gaussian.
Remark 3: The convolution and the product of Gaussian probability density functions (PDFs) are also Gaussian functions, while this fact is limited to different PDFs of the same random variable [37].
The integrations in Definition 1 for the general solution are usually not analytically tractable. Also, an explicit solution for the shape parameters (a i , b j ) of Meijer G-function cannot be obtained analytically and the majority of the established special distributions can be used to represent the Meijer Gfunction. On the other hand, it is well known that an infinite mixture of Gaussian functions can approach arbitrary distribution [38]. According to Definition 2, a Student's t-distribution can be obtained by an infinite number of Gaussian mixture and therefore can be used to represent arbitrary distribution since an infinite mixture of Gaussian functions can deduce arbitrary distribution (approximation of a Meijer G-function) [38]: Here, µ is the mean, λ is the covariance, and ν is the dof. In particular, when ν = 1, the Student's t-distribution reduces to the Cauchy distribution (Definition 3). In addition, and as ν → ∞, it becomes a Gaussian distribution N(x; µ, λ −1 ) with mean µ and covariance λ.
Then, we use a Student's t-distribution to formulate the likelihood PDF p(y k |x k ) [39]: where G(·; α, β) is the Gamma function (Definition 1) with an auxiliary stochastic variable λ k , the shape parameter α and the rate parameter β.
Then, µ and λ in (6) for likelihood (2) are organized as (7), p (y k |x k ) can be reformulated as the hierarchical Gaussian distribution form: From (8), we notice that the auxiliary random variable λ k adjusts the likelihood PDF of p(z k |x k ). When λ k = 1, (8) is simplified into Gaussian likelihood: where This means that p(z k |x k ) is always a Gaussian cluster or Gaussian truncated λ-PDF [40].
2) Choices of Covariance Matrices : Note that objective is to estimate state x k together with P k|k−1 , R k and σ k , where R k and σ k are AMNCM and MMNCM, respectively. Generally speaking, there are two classes of solutions including direct method and indirect one. The direct method is to compute multiplicative and additive measurement noise covariance matrices p(σ k , R k |y 1:k ) separately while indirect one is to compute p(σ k , R k |y 1:k ) as a whole measurement noise covariance matrix p(R M k |y 1:k ). Without loss of generality, we only introduce the indirect method.
In engineering practice, as shown in (2), we cannot distinguish between additive noise and multiplicative noise, since both are regarded as observation noise. Therefore, we choose to estimate them as a whole with a unified covariance matrix R M k . We next specify how to infer state x k together with P k|k−1 , R M k . In order to guarantee that the posterior PDF is consistent with the prior PDF, we need to select the conjugate prior distributions with inaccurate PECM P k|k−1 , MMNCM σ k and AMNCM R k . In Bayesian analysis [41], we usually choose the inverse Wishart distribution as the conjugate prior for covariance matrix [34].
Remark 4: Both inverse Gamma (IG) distribution and inverse Wishart (IW) distribution can be chosen as the conjugate prior for the covariance matrix [5], [9]. IG is the special case of IW.
According to [5], the priorô k|k−1 can be chosen as: where n is the dimension of states and τ 0 is a tuning parameter. To get the prior P k|k−1 , P k|k−1 should be set as the nominal one-step prediction of error covariance matrix (PECM)P k|k−1 whereQ k−1 is the nominal APNCM. Then, For R M k , we choose [5]: where m is the dimension of measurements and ρ ∈ (0 1] denotes a time-fluctuations forgetting factor. The initial value isÛ 3) Variational Approximations of Posterior PDFs: Our objective here is to obtain the joint posterior distribution p(x k , P k|k−1 , R M k |y 1:k ). Since the solution for this joint posterior update is not analytically tractable, motivated by [12,19,38] a VB-based approach ( [12], [19], [38]) is used to obtain an approximate distribution for p(x k , P k|k−1 , R M k |y 1:k ) as follows: where q(·) is the approximation distribution of p(·). q(·) can be calculated by by minimizing the KLD [12], [19], [38] as where KLD[·]is defined in Definition 5. According to [12], the solution for (18) is : where, θ is one element of Ω, and −θ represents the complementary set of θ in Ω, and C θ denotes the constant. Since (19) cannot be solved directly, we apply the fixed-point iteration method to solve it, where q(θ) is updated after the (i + 1)th iteration step result q i+1 (θ) [12], [19], [38]. We next specify likelihood function (8) to obtain the proposed filter (Algorithm 1 in Appendix), and then specify likelihood function as its special Gaussian form (9) to obtain another proposed filter (Algorithm 2 in Appendix).
We also consider a special likelihood function p (y k |x k ) when (9) is Gaussian (ν → ∞). The algorithm can be obtained similarly to that of non-Gaussian case and then organized as Algorithm 2 in Appendix.

IV. PERFORMANCE ANALYSIS
We analyze the stability of the proposed filter because the effect of the parameters τ , ρ,Q k andR M 0 . First, we discuss the influence of τ on the proposed filter. Substituting (39) (48) According to (12), (14) and (29)- (30), (48) can be rewritten asP From (49), we know that τ is a tradeoff parameter to balancẽ P k|k−1 and A (i) k . The larger τ corresponds to much prior uncertainties about the nominal APNCM, and leads to worse performance of filter. The smaller τ means that much information is lost, thus it also leads the worse accuracy of filter. We have experimental and observational evidence concerning the choice, τ can be within the range τ ∈ [10, 800], since this scope can accommodate the effects of multiplicative noise.
Third, we discuss the effect ofQ k andR M 0 . The initial values are Set ζ k = η(k,ρ) η(k,ρ)+1 , C k = B (N −1) k η(k,ρ)+1 , where 0 < ζ(k) < 1 and C k ≥ 0, (55) can be rewritten aŝ From (58) and (60), we know that the initial APNCM . When k increases,R M 0 proportionally affects onR M k . To ensure thatP since the VB inference can only guarantee local convergence. Therefore, these parameters need to be selected near the true value and we rely on the experimental experience, that we selectsQ k and Finally, we discuss the influence of initialQ k andR M 0 on stability of the proposed filter. Based on (58) and (60),Q k > 0,R M 0 > 0, 0 < ζ(k) < 1 and C k ≥ 0, we obtaiñ Note that (28) and (32) yields Then, substituting (61)-(62) into (48) and (55) yieldŝ This means that PECMp are positive definite and indicates that the proposed filter is stable.
Remark 6: The higher the number of iterations N , the more run time will need and the more estimation accuracy. We need to balance precision and time complexity according to the dimensions of noises. Different from system with additive noises, we suggest that system with multiplicative noise should choose a larger N to guarantee that algorithms converge to a local optimum. Meanwhile, the parameters discussed above have wider selection range and more flexible selection degree as compared with system with additive noises, since the corresponding system changes more dramatically.

V. SIMULATIONS
In this section, we verify the proposed adaptive filter by a target tracking example. In 2-D space, the target moves with a constant velocity and we can measure its position. Process equation and observation equation are where the state has four dimension x k = [x k y kẋkẏk ] T ; Components x k , y k ,ẋ k andẏ k denote the positions and corresponding velocities.
The parameters are set as where ∆t = 1s is the sampling interval and I 2 is a 2 × 2 identity matrix. Similar to [5], [42], the true additive process noise covariance is slowly time-varying and additive observation noise covariance is alos slowly time-varying, which are given by and nominal covariance matrices (Q k andR M 0 ) (denoted as KFNCM/MKFNCM) and Kalman filter without/with multiplicative noise and true covariance matrices (Q k and R M k ) (denoted as KFTCM/MKFTCM) [28], the existing VBAKF [5], the existing particle filter (PF) [43], VBPF [33], and the proposed VB-based filter (λ k in Algorithm 1 is set to be 1) are tested. For existing KFNCM/MKFTCM and VBAKF and the presented filter, we set ρ = 1 − exp(4), τ = 200, α = 1, β = 100, and N = 100. The simulation is performed in a computer with Intel Core i7-5600 CPU at 2.60 GHz and MATLAB R2019a software.
To evaluate the estimation accuracy of the state and the predicted error covariance matrix (PECM) and measurement noise covariance matrix (MNCM), the RMSEs of positions and velocities and the square roots of normalized Frobenius norm (SRNFN) are chosen as performance metrics and are defined by stable RMSEs than those of KFNCM, VBAKF, and KFTCM. It can be also seen from the figures, the existing filters diverge eventually, while the proposed filter has robust convergence. We can see from the figures, that although the multiplicative disturbance enlarges or weakens the true signal, it cannot affect the proposed filter but has a huge impact on the existing filters. Fig. 7 demonstrates the RMSEs of positions and velocities from the existing filters and the proposed filters when τ = 2, 10, 100, 200, 600, 800. As we can see from Fig. 7, the presented adaptive filter has wider tuning range than existing filters (all existing filters diverge eventually) do and has better estimation and essentially consistent estimation performance than existing filters do. Fig. 8 shows results of the RMSEs when ρ= 0.4, 0.6, 0.8, 0.9, 0.98, 1. From Fig. 8, the proposed adaptive filter is more flexible and has wider selection range within ρ ∈ [0.4, 1] than existing filters within ρ ∈ [0.9, 1] and has higher estimation accuracy and essentially consistent estimation performance than those existing filters, which diverge eventually. Fig. 9 draws the RMSEs from the proposed filters, the existing particle filter (PF) [43] and VBPF [33] with 1000 particles, and τ = 400, ρ = 0.98. We can an easily get that the presented adaptive filter can estimate the adaptive noise covariance and can also achieve better performance than these state-of-the-art nonlinear filters when system are affected by multiplicative noise.

VI. CONCLUSIONS
In this paper, we mainly focus on estimation problem of linear state-space systems with unknown additive and multiplicative noise covariances. A novel VB-based adaptive filter is proposed, where the states together with noise covariances were estimated, approached and inferred by choosing inverse Wishart priors as the unknown covariance and Student'st distribution as the likelihood function. Simulation results illustrate that the proposed VB-based adaptive filter is more robust to eliminate strong noise (multiplicative) effects and to resist the uncertainties and disturbance with unknown noise covariances.