L2 and L1Trend Filtering: A Kalman Filter Approach a

$\ell_2$ and $\ell_1$ trend filtering are two of the most popular denoising algorithms that are widely used in science, engineering, and statistical signal and image processing applications. They are typically treated as separate entities, with the former as a linear time invariant (LTI) filter which is commonly used for smoothing the noisy data and detrending the time-series signals while the latter is a nonlinear filtering method suited for the estimation of piecewise-polynomial signals (\eg, piecewise-constant, piecewise-linear, piecewise-quadratic and \etc) observed in additive white Gaussian noise. In this article, we propose a Kalman filtering approach to design and implement $\ell_2$ and $\ell_1$ trend filtering % (QV and TV regularization) with the aim of teaching these two approaches and explaining their differences and similarities. Hopefully the framework presented in this article will provide a straightforward and unifying platform for understanding the basis of these two approaches. In addition, the material may be useful in lecture courses in signal and image processing, or indeed, it could be useful to introduce our colleagues in signal processing to the application of Kalman filtering in the design of $\ell_2$ and $\ell_1$ trend filtering.


Relevance
Trend filtering is the process of smoothing a time series by filtering out the noise.Assuming that a time series y k , k = 1, • • • , L, consists of an underlying slowly time varying trend x k and a more rapidly varying random component v k (i.e., y k = x k + v k ), the goal of trend filtering is to reconstruct the trend component x k or, equivalently, estimate and remove the random component v k = y k − x k from the time series y k .It can be considered as an optimization problem with the objective of maximizing the smoothness of x k and minimizing the residual error between the actual and smooth series.The process of estimating x k is known as "filtering" or "smoothing" in some contexts.The term "filtering" is used in analogy with trend filtering.It is actually a batch method for estimating the trend component taking all measurements (past, present and future samples) into account, so implemented non-causally in the literature.
Trend filtering has a broad range of applications in science, engineering, statistical signal and image processing, financial time series analysis, geophysics, and biological and medical sciences.Among many trend filtering methods (e.g., moving average filtering, band-pass filtering, smoothing splines, Gaussian process regression, and many other popular methods), 2 trend filtering and its variation 1 trend filtering provide significant advances for estimating signals that exhibit varying degrees of smoothness across the input domain [1,2].In the following, a brief discussion on these two methods is given.

trend filtering
The 2 trend filtering also known as quadratic variation (QV) regularization is a linear time invariant (LTI) filter which is commonly used for smoothing the noisy data and detrending the time-series signals.There are a lot of connections in the literature to 2 trend filtering.Some of them are QV regularization and smoothness priors; the ill-posed inverse problems and problems of statistical Tikhonov regularization; the problem of estimating a smooth trend embedded in white noise addressed by Whittaker and Bohlmann's work who studied the application of regularized method for time series smoothing; Hodrick-Prescott (H-P) filtering also known as Hodrick-Prescott decomposition to remove the cyclical component from time-series data and estimate the trend signal (see [3,4,5] and the references therein).
, where x k and xk are respectively the original and estimated signal [3]) is also reported.The results show that 2 trend filtering is able to effectively recover the desired signal.Note that the original signal is highly smooth.Therefore, the 2 trend estimate becomes close to the original signal as the order of the method increases.The reason is that 2 trend filtering has a strong connection to LTI filtering and smoothing.Specially, it is a smoothing method defined by a problem of optimization with a smoothing constraint, in which the weight controlling the constraint is directly related to cutoff frequencies and the smoothing order is directly controlled with the order of derivatives [3,4,5].Therefore, it is able to effectively separate the slow time varying signal and the rapidly time varying noise.However, in some signal processing applications, the signals are more complex.For instance, consider a situation in which a discrete event phenomenon is observed in the presence of a smooth (polynomial) signal (i.e., the signal derivative is discontinuous).Such signals are known as sparse-derivative (step or piecewise-polynomial) signal.In such cases, the time scale for changes in the noise and the signal are similar, so the signal and the noise overlap in frequency domain considerably.As a result, for such signals, 2 trend filtering has some limitations that make it inefficient to reconstruct the signal of interest.In other word, 2 trend filtering is unable to separate signal and noises that overlap in frequency domain.As an example, a piecewiseconstant signal and its noisy signal are depicted in Figure 2(a) and 2(d), respectively.Such noisy step-like time series is ob- served in molecular machines which are corrupted by molecular noise [6].We need to remove the noise and reconstruct the desired time series.The signal obtained using the first order of absolute values (i.e., an 1 -norm) for the sum of squares used in 2 trend filtering to penalize variations in the estimated trend [7] and later popularized by Kim et al. as a variation of H-P filtering in applied mathematics literature [2].The term TV ( 1 -norm) was used as a regularization approach in a wide range of applications in image processing, including image restoration, denoising, deconvolution, interpolation and compressed sensing, etc.The 1 trend filtering is similar to 2 trend filtering but produces a piecewise-polynomial estimate of the trend makes it suitable for analyzing the time series with an underlying piecewise-polynomial trend.The 1 trend filtering is challenging due to the non-differentiability of the 1 -norm while 2 trend filtering is not since the 2 -norm is differentiable.
In this lecture note, we present a unified framework based on Kalman filter (KF) and Kalman smoother (KS) to implement these two approaches.The proposed framework provides a unifying platform for understanding the basis of these two approaches and explaining their differences and similarities.It is shown that the 2 trend filtering is a special case of the 1 trend filtering.As mentioned before, the 2 and 1 trend filtering approaches are implemented non-causally in the literature.Based on the proposed framework, it is possible to causally implement these trend filters: while KS can be used to implement the traditional non-causal trend filters, the KF can be used to convert them to a causal filter design scheme.

Prerequisites
This lecture note requires basic knowledge of system theory, state-space model, estimation theory and Kalman filter.

Problem Statement
The problem is to estimate an unknown signal x k from the observation y k in the model: where v k is the unwanted additive noise signal, which we can use the different 2 trend filtering (QV regularization) or 1 trend filtering (TV regularization) for rejecting the noise and preserving the smoothness of the estimated x k , by solving the following optimization problems [3,7] at2 2 trend filtering: argmin 1 trend filtering: argmin for all k = 1, • • • , L, where the regularization parameter λ controls the degree of smoothing, ∇x j = x j − x j−1 is the first order difference and ∇ n x j = ∇(∇ n−1 )x j is the n-th order difference which is precisely defined by For instance, the first and second order difference are defined as Consequently the first and second order 2 trend filtering are respectively defined as while in 1 trend filtering, they are defined as In the derivation that follows, we consider the problem of signal smoothing using 2 and 1 trend filtering.We derive a KF model for implementing these two approaches and pointing out their differences and similarities.

Solution
In this section, we derive a KF model for 2 trend filtering and 1 trend filtering, respectively.First, we derive a dynamic model in state-space form to represent each approach and combine it with the Kalman filtering framework to estimate the desired signal.

2 trend filtering
The unconstrained optimization problem (2a) can be represented in the form of the optimal Wiener smoothing filter.
See [4, section 4] for more details.The linear state-space model corresponding to it can be represented as [4, equation ] where v k is known as the observation noise, w k is an additive zero-mean random term and known as the process (model) noise.The optimal value of λ in (2a) is related to the process noise power σ 2 w and the observation noise power σ 2 v (see [4, equation (20)] and [8, equation (30)]): In other word, the effect of process noise power and observation noise power are hidden in the regularization parameter.Therefore, constrained optimization parameter, λ, plays the role of balancing the optimization between the two noise terms power.In other word, λ can be chosen based on the ratio of variance of the two noises.As mentioned before, the 2 trend filtering is a low-pass smoothing filter and the regularization parameter is related to the cutoff frequency [3, equation (23)]: So, for extracting a signal within a predetermined frequency band, the regularization parameter can be chosen based on the value of cutoff frequency (8).It is more convenient to express (6) as where α i = (−1) i+1 n i .In order to estimate the smooth trend signal x k using KF structure, we need to consider the canonical observable representation of (9).A standard dynamical model of ( 9) that can be used in the framework of KF is as follows: where For instance, for the first and second order 2 trend filtering, the matrix A n , the vectors x k , b n and h n are respectively defined as 1-order 2 trend filtering: 2-order 2 trend filtering:

1 trend filtering
In 1 trend filtering (or TV regularization), the unknown piecewise-polynomial signal x is estimated by solving (2b) which formulates the sparsity based denoising as the problem of minimizing the 1 norm of the derivative of x subject to the data fidelity constraint.Unfortunately, the second regularization term is non-differentiable.That is why the optimization problem (2b) is known as a difficult minimization problem with no explicit solution.One way to deal with is to replace it with a sequence of simpler ones.This procedure is known as majorization-minimization (MM) method (also known as bound optimization or surrogate optimization method).An overview of MM algorithms in signal processing, machine learning and communications is presented in [9].
As an application of MM approach, it can be used to convert the optimization problem (2b) to a simpler one.To this purpose, the MM approach proposes the following majorizer for the |x k | [9, Example 6]: where x(m) k denotes the estimated signal after m iterations (with an initialization, e.g., x(0) k = y k ).We leave the derivation of the majorizer in (12) as an exercise to the reader.The idea of using the MM approach to 1 trend filtering (TV regularization) was first proposed by Figueiredo et al. [10].With it, a majorizer for the 1 norm is defined as where and does not depend on x j .Therefore, in order to solve (2b), one can solve the following iterative optimization problem, instead: (14) argmin Note that the second regularization term is now differentiable and x(m) k is considered as a constant with respect to x k .In the next section, we derive a KF model for 1 trend filtering.( 14) can be expressed as argmin As shown in the appendix, the linear state-space model corresponding to (15) can be represented as which can also be rewritten as where φ where α i is defined before.( 18) can be rewritten in the following state space dynamical model: where A n , x k and h n are defined before and b For instance, for the first and second order 1 trend filtering, the matrix A n , the vectors x k , b n and h n are respectively defined as First order 1 trend filtering: Second order 1 trend filtering:

Kalman filter implementation
The state space models derived in the previous sections can be coupled with the Kalman filtering framework in order to estimate the desired signal.The KF for (10) and ( 19) are given as follows [11]: Measurement update: where is the a priori estimate of the state vector x k in the k-th stage using the observation y 1 to y k−1 , and xk|k E{x k |y k , • • • , y 1 } is the a posteriori estimate of the state vector after using the k-th observation y k .The matrices are also defined as the prior and posterior state covariance matrices, while K k is the Kalman gain.
A KS is usually employed after KF to produce a smoother results.It consists of a forward KF stage followed by a backward recursive smoothing stage.Comparing the KF equations for 1 and 2 trend filtering, the only difference is that the system noise gain matrix in 1 trend filtering is time variant It is worth to mention that against the KF implementation of 2 trend filtering that we run KF only one time over all k, in 1 trend filtering, one should run KF over all k, for a given m; and then this has to run multiple times for m = 1, 2, 3, • • •.Finally, 2 trend filtering is a special case of 1 trend filtering when we set φ (0) k to 1 (i.e., the initial estimation, x(0) k , is chosen such that ∇ n x(0) k = 1).In other word, the initial estimation is chosen as a polynomial of order n.
In the context of estimation theory, it is proved that for stationary processes, the KF converges to the optimal Wiener filter in steady state.It is also shown in appendix that the optimization problems (2b) and its particular case (2a) converge to the optimal Wiener smoothing filter.The time of convergence depends on the covariances of the process and measurement noises, namely q k and r k [or to their ratio, as seen in equations ( 22) and ( 32)].Therefore, in the Kalman filtering framework, we set q k and r k such that

Illustrative Example
The motivation behind 1 trend filtering was discussed in the previous sections.It was shown that 2 trend filtering is a special case of 1 trend filtering.An example of piecewiseconstant signal smoothing was given in Section and the first order 1 trend filtering was used to reconstruct the desired signal.We saw that the first order 1 trend filtering can be used to effectively separate the piecewise-constant signal and the random noise even if they are overlap in the frequency domain.The key point is that for fitting a piecewise-polynomial of degree n − 1 to the data, a weighted 1 -norm of the n-th order difference of the signal can be used as the penalty term in 1 trend filtering formula.In this section, we mainly focus on 1 trend filtering as 2 trend filtering is a special case of it.
Let us consider again the piecewise-constant signal given in Example 2. In Figure 2, we saw the result of 2 trend filtering using KS (trend smoothing in the Kalman filtering).The result of applying the first order 1 trend filtering implemented by KF and KS to the noisy piecewise-constant signal is shown in Figure 3.The KF is a causal algorithm and can be used for real time applications.The KS algorithm preserves the edges better than KF.This is because the KS algorithm uses information brought by "future" observations, therefore, provides better estimates of the current states than KF.However, it can only be used for offline applications.Now, we consider the signals that have more complex shape.We give an example from electrocardiogram (ECG) signal smoothing.ECG is a time-varying signal which provides some information of the cardiac health status.It consists of different complex waveforms, classically labeled as P-wave (linked to atrial depolarization), QRS complex (corresponds to ventricular depolarization) and T-wave (corresponds to ventricular repolarization).In the following, we use 2 and 1 trend filtering for ECG signal smoothing.We use the MIT-BIH Atrial Fibrilation Database [12].As an example, Figure 4(a) shows a specific case record 08378m from this database.Its noisy data is plotted in Figure 4(b).The noisy data is the result of contaminating the original signal by a random noise with SNR = 10.The ECG smoothing for this specific case using 2 and 1 trend filtering are reported in panels 4(c) and 4(d), respectively.For both 2 and 1 trend filtering algorithm, we use n = 3, which corresponds to modeling the ECG components segment as having a sparse order-3 derivative (i.e., approximately piecewise polynomial with polynomial segments of order 2).As can be seen, 2 trend filtering leads to the distortion of QRS complexes, while 1 trend filtering does not, as highlighted using arrows in panels 4(c) and 4(d).It is possible to solve the problem of 2 trend filtering by decreasing the regularization parameter (i.e., increasing the cutoff frequency).In that case, it follows the shape of QRS complexes but follows the noises in low-frequency components as highlighted using arrows in Figure 5.

What we have learned
Based on this article, using KF and KS, readers could implement the 2 and 1 trend filtering.One of the major advantages of the proposed framework is that it brings  x k , where θ n,k is and δ k is the Dirac delta function, (15) is expressed as: (24) argmin where * denotes the convolution operator.In the following, for simplifying the notation, we will drop the superscript (m) and we will simply refer to φ (m) as φ.The optimal solution of (24) is found by setting its derivative with respect to x k to zero.Doing it and considering that ∂/∂x k (θ n,k * x k ) 2 = 2θ n,−k * θ n,k * x k [4, Lemma 2], the optimal solution using 1 trend filtering is obtained as On the other hand, the state-space model (17) can be expressed as The optimal mean-square error estimator that estimates x k from y k is the optimal Wiener smoothing filter, whose transfer function is given by where S xx (z) and S vv (z) denote the power spectral densities Taking the inverse transform of (30) and after some manipulation, we find Comparing ( 31) with (25), we conclude that the 1 trend filtering is a special case of Wiener smoothing filter when the regularization parameter is chosen as In a particular case, when φ k = 1, (31) becomes the optimal solution of (2a): It confirms that the 2 trend filtering is a special case of Wiener smoothing filter [4, section 4].
figure.The signal obtained using 2 trend filtering (with different orders, n = 1, 2, 3) are shown in Figures1(d)-1(f).For quantitative comparison, the result of NSR which is a classical ratio between the power of the reconstruction error and the power of the signal (i.e., NSR = k (x k − xk ) 2 / k x 2 k , where x k and xk are respectively the original and estimated