Asymptotic Reverse Waterfilling Algorithm of NRDF for Certain Classes of Vector Gauss-Markov Processes

In this paper, we revisit the asymptotic reverse-waterfilling characterization of the nonanticipative rate distortion function (NRDF) derived for time-invariant multidimensional Gauss-Markov processes with mean-squared error (MSE) distortion in [1]. We show that for certain classes of time-invariant multidimensional Gauss-Markov processes, this characterization reduces to a reverse-waterfilling algorithm obtained in matrix form, ensuring that the numerical approach of [1, Algorithm 1] is optimal. In addition, we give an equivalent characterization that utilizes only the eigenvalues of the involved matrices, reminiscent of the well-known reverse-waterfilling algorithm in information theory. For the latter, we also propose a novel numerical approach that solves the algorithm optimally. The efficacy of our proposed iterative scheme compared to similar existing schemes is demonstrated via experiments. Finally, we use our new results to derive an analytical solution of the asymptotic NRDF for a correlated time-invariant two-dimensional Gauss-Markov process.

I. INTRODUCTION

Tatikonda et al. in [2] studied the fundamental limitations of a multidimensional closed-loop control system in which a communication link intervenes between a stochastic linear fully observable time-invariant plant driven by a Gaussian noise process and a controller, where the performance criterion is the classical linear quadratic cost. First, the authors of [2] gave sufficient conditions for their problem that ensure the control-theoretic separation principle between the feedback controller and the state estimator when the channel is either noiseless digital or noisy (and memoryless). Then, provided these conditions hold, they showed that the optimal cost of control can be cast as two terms: a full-knowledge cost due to the fully observable state and a communication cost that can be computed via a state estimation problem. To compute the communication cost, they introduced a Gaussian sequential rate distortion function (SRDF) defined by minimizing a variant of directed information [3] subject to a pointwise mean squared error (MSE) distortion constraint, a definition that is attributed to Gorbunov and Pinsker [4]. When solved, this information measure yields a lower bound on the communication cost irrespective of whether the channel is noiseless or noisy. For time-invariant scalar-valued Gauss-Markov processes, the authors of [2] found an explicit expression of a lower bound on the optimal communication cost by computing in closed form the asymptotic Gaussian SRDF (with an unbounded distortion budget for unstable processes). The same expression (with a bounded distortion budget) was already derived for stationary Gauss-Markov processes in [5, Eq. (1.43)] (see also the more recent work in [6, Theorem 3]). For vector-valued Gauss-Markov processes, [2] showed that the lower bound on the optimal communication cost can be obtained via a reverse-waterfilling algorithm. Unfortunately, as shown in [7], [8], the reverse-waterfilling algorithm proposed in [2] gives a suboptimal solution to the SRDF. Fortunately, a way to compute the lower bound on the communication cost optimally for vector-valued Gauss-Markov processes was recently proposed by Tanaka et al. in [9]. Specifically, the authors of [9] proved that the SRDF for multidimensional Gauss-Markov processes with MSE distortion is semidefinite representable and thus readily computable. Although this numerical approach is an important step, it lacks the insight of a parametric reverse-waterfilling solution that may lead to analytical solutions for this problem beyond scalar-valued processes. Moreover, semidefinite programming (SDP) is computationally expensive to implement in modern delay-constrained and computationally-limited devices, and it does not adapt well to high-dimensional systems. Therefore, identifying the correct reverse-waterfilling characterization that completely solves the multidimensional case of [2] remains an important open task. Toward this end, Stavrou et al.
in [10] characterized the SRDF, under the name nonanticipative rate distortion function (NRDF), for a time-varying scalar-valued Gauss-Markov process subject to a total (or average in time) MSE distortion function and characterized it via the solution of a dynamic reverse-waterfilling problem similar to the well-known reverse-waterfilling of the classical RDF [11, Theorem 10.3.3]. The authors of [12] characterized the NRDF for time-varying vector-valued Gauss-Markov processes subject to total and pointwise MSE distortion functions via the solution of a dynamic reverse-waterfilling problem and gave an iterative approach that solves the problem optimally only when all dimensions of the multi-input multi-output (MIMO) system are active. This result was restricted to time-invariant processes in [1].

A. Contributions
In this paper, we use the asymptotic reverse-waterfilling characterization of [1, Theorem 3] to obtain the following new results.
(1) We show that for certain classes of time-invariant vector-valued Gauss-Markov processes, [1, Theorem 3] simplifies to a reverse-waterfilling algorithm obtained in matrix form (see Theorem 2). Moreover, this reverse-waterfilling algorithm has an equivalent form that depends only on the eigenvalues of the involved matrices (we refer to this algorithm as the "eigenvalue form") and is similar to the classical reverse-waterfilling algorithm in information theory (see Corollary 1); (2) We construct an iterative scheme that computes the reverse-waterfilling algorithm in eigenvalue form optimally (see Algorithm 1); (3) We derive a closed-form expression of the asymptotic NRDF for a two-dimensional correlated Gauss-Markov process subject to an MSE distortion (see §IV).
Discussion of the results. (1) The two equivalent forms of the reverse-waterfilling algorithm, i.e., the matrix form and the eigenvalue form, are obtained because, for the specific classes of time-invariant Gauss-Markov processes, we prove that all matrices commute by pairs, hence they are simultaneously diagonalizable by an orthogonal matrix (an information lossless operation). (2) The iterative scheme proposed herein is shown to execute much faster than similar numerical approaches that solve the same problem, e.g., [9], [1], and adapts extremely well to high-dimensional systems. (3) To the best of the authors' knowledge, this is the first non-trivial example of computing the asymptotic NRDF beyond scalar-valued Gauss-Markov processes.
The rest of the paper is structured as follows. In §II, we give preliminaries and known results. In §III, we derive our new results, and in §IV we apply them to compute an example analytically. We draw conclusions and future directions in §V.
II. PRELIMINARIES AND KNOWN RESULTS

In rate distortion theory one is interested in reproducing sequences of symbols X^n = x^n generated by a source by their reproduction symbols Y^n = y^n, subject to a fidelity criterion. The distribution of the source is fixed and given by

P(dx^n) = ⊗_{t=0}^{n} P(dx_t | x_{t−1}),   (1)

with the convention that at t = 0, P(dx_0 | x_{−1}) = P(dx_0). Following [4], the channel that is used to reproduce X^n = x^n by the reproduction symbols Y^n = y^n is described by the sequence of conditional distributions

P(dy^n || x^n) = ⊗_{t=0}^{n} P(dy_t | y^{t−1}, x^t),   (2)

and is either subject to a design or found by an optimization problem. Here the convention is P(dy_0 | y^{−1}, x^0) = P(dy_0 | x^0). By (1) and (2), the joint distribution is P(dx^n, dy^n) ≜ P(dx^n) ⊗ P(dy^n || x^n), while P(dy_t | y^{t−1}) is induced by the joint distribution P(dx^n, dy^n).
The NRDF of the source distribution (1) is defined through the mutual information between X^n and Y^n, i.e., I(X^n; Y^n), subject to a distortion or fidelity criterion on reproducing X^n = x^n by Y^n = y^n based on (2). Given (1) and (2), the mutual information is defined by

I(X^n; Y^n) = ∑_{t=0}^{n} E{ log ( dP(· | Y^{t−1}, X^t) / dP(· | Y^{t−1}) (Y_t) ) }.   (3)

Next, we introduce the finite-time horizon NRDF and its per unit time limit, henceforth called the asymptotic NRDF, of a time-invariant vector-valued Gauss-Markov process subject to an MSE distortion function. This definition was introduced in [4] and further analyzed in [2].
Definition 1: (Asymptotic Gaussian NRDF with MSE distortion) Let X_t be the time-invariant R^p-valued Gauss-Markov process

X_{t+1} = A X_t + W_t,  t ∈ N_0,   (4)

where A ∈ R^{p×p} is a deterministic matrix, X_0 ∼ N(0; Σ_{X_0}) is the initial state with Σ_{X_0} ≻ 0, and W_t ∈ R^p ∼ N(0; Σ_W), Σ_W ≻ 0, is a white Gaussian noise process independent of X_0. The finite-time horizon NRDF is defined by

R_{0,n}(D) ≜ inf (1/(n+1)) I(X^n; Y^n)
s.t. (1/(n+1)) ∑_{t=0}^{n} E{ ||X_t − Y_t||² } ≤ D,   (5)

where the infimum is over the test channels (2), assuming it exists and is finite. The per unit time limit of (5), called the asymptotic NRDF, is defined by

R^{na}(D) ≜ lim_{n→∞} R_{0,n}(D),   (6)

provided both the infimum and the limit exist and are finite.
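To make the source model of Definition 1 concrete, the following minimal sketch simulates the recursion X_{t+1} = A X_t + W_t for a two-dimensional example; the particular values of A and Σ_W (here diagonal, for simplicity) are hypothetical and chosen only for illustration.

```python
import random

def simulate_gauss_markov(A, sigma_w, x0, n):
    """Simulate X_{t+1} = A X_t + W_t with W_t ~ N(0, diag(sigma_w^2)).

    A is a p x p matrix (list of lists), sigma_w a list of per-dimension
    noise standard deviations, x0 the initial state.  Returns the n+1
    states X_0, ..., X_n."""
    p = len(x0)
    xs = [list(x0)]
    x = list(x0)
    for _ in range(n):
        # A @ x plus independent Gaussian noise in each coordinate
        x = [sum(A[i][j] * x[j] for j in range(p)) + random.gauss(0.0, sigma_w[i])
             for i in range(p)]
        xs.append(x)
    return xs

if __name__ == "__main__":
    random.seed(0)
    A = [[0.5, 0.0], [0.0, 0.8]]  # hypothetical stable dynamics
    xs = simulate_gauss_markov(A, [1.0, 1.0], [0.0, 0.0], 20000)
    # For a stable scalar recursion x <- a x + w, the stationary variance
    # is sigma_w^2 / (1 - a^2); check the first coordinate empirically.
    var0 = sum(x[0] ** 2 for x in xs) / len(xs)
    print(var0)  # should be near 1/(1 - 0.25) = 4/3
```

For stable A (spectral radius below one), the empirical second moment of each coordinate settles near its stationary value, which is the regime in which the asymptotic NRDF is most naturally interpreted.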
An upper bound on R^{na}(D), denoted by (7), is obtained by optimizing over infinite-horizon sequences of test channels before taking the per unit time limit, provided the infimum and the limit exist and are finite; here P(dy^∞ || x^∞) denotes the sequence of conditional probability distributions P(dy_t | y^{t−1}, x^t), t ∈ N_0. It should be mentioned that for stationary or time-invariant Gauss-Markov processes, it is shown in [4, Theorem 4] (see also [8, Theorem 1]) that the bound holds with equality. It is well known that the optimization problem of (7) is convex with respect to the set of test channels P(dy^n || x^n) that satisfy the average (over time) MSE distortion, for D ∈ (0, D_max) ⊆ (0, ∞), and that there exists an optimal solution characterizing it under general source distributions and distortion functions (see, e.g., [14]). By [12], the optimal "test channel" corresponding to R_{0,n}(D) is conditionally Gaussian but not necessarily time-invariant, while the corresponding joint process {(X_t, Y_t) : t ∈ N_0} is not necessarily stationary. Further, by [15] the joint process {(X_t, Y_t) : t ∈ N_0} is jointly Gaussian (this is also shown in [2]). Moreover, [12] showed that for multidimensional Gauss-Markov processes with an average (over time) total MSE distortion constraint, the conditionally Gaussian optimal minimizer of a jointly Gaussian process has conditional mean linear in the current source symbol and the past reproductions. A similar result for per-letter MSE distortion is derived in [5, Theorem 5].
The next result is derived in [16, Theorem 3].
Lemma 1: Suppose that every P*(dy_t | y^{t−1}, x_t) is time-invariant and that its output distribution P*(dy_t | y^{t−1}) is also time-invariant with a unique invariant distribution. Then, the following hold.
(1) The asymptotic NRDF admits the characterization

R^{na}(D) = min_{0 ≺ ∆ ⪯ Λ, trace(∆) ≤ D} (1/2) log ( |Λ| / |∆| ),   (11)

where ∆ ≜ lim_{t→∞} E{(X_t − Y_t)(X_t − Y_t)^T} is the steady-state error covariance and

Λ ≜ A ∆ A^T + Σ_W.   (14)

(2) The asymptotic optimal reproduction distribution is realized by a time-invariant linear recursion driven by the current source symbol, the previous reproduction, and an independent Gaussian noise (see [16, Theorem 3] for the explicit realization).
Next, we state the solution of (11) obtained by solving the KKT conditions [17]. This result is derived in [1, Theorem 3] and is a consequence of the derivation in [12, Appendix].
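For intuition, in the scalar case (p = 1) the optimization underlying Lemma 1 reduces to minimizing (1/2) log(λ/δ) with λ = a²δ + σ_w² over 0 < δ ≤ min(D, λ); since the objective is decreasing in δ, the optimum sits at the distortion budget whenever D is below the stationary variance. The sketch below (our notation; a hedged illustration, not the paper's code) checks the known scalar closed form from the literature (cf. [2], [5]) against a brute-force grid search.

```python
import math

def scalar_nrdf(a, sigma_w2, D):
    """Closed-form asymptotic NRDF of a scalar Gauss-Markov source
    x_{t+1} = a x_t + w_t, w_t ~ N(0, sigma_w2), under MSE distortion D
    (cf. [2], [5]): R(D) = max(0, 0.5*log((a^2 D + sigma_w2)/D))."""
    return max(0.0, 0.5 * math.log((a * a * D + sigma_w2) / D))

def scalar_nrdf_grid(a, sigma_w2, D, steps=20000):
    """Brute-force minimization of 0.5*log(lambda/delta) with
    lambda = a^2*delta + sigma_w2, over the grid 0 < delta <= D."""
    best = float("inf")
    for k in range(1, steps + 1):
        delta = D * k / steps
        lam = a * a * delta + sigma_w2
        if delta <= lam:  # feasibility: delta <= lambda
            best = min(best, max(0.0, 0.5 * math.log(lam / delta)))
    return best

if __name__ == "__main__":
    a, sigma_w2, D = 0.9, 1.0, 0.5
    print(scalar_nrdf(a, sigma_w2, D), scalar_nrdf_grid(a, sigma_w2, D))
```

The grid search attains its minimum at δ = D, matching the closed form, which is the scalar shadow of the matrix problem in (11).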
Theorem 1: (Reverse-waterfilling characterization) The parametric solution of (11), subject to a reverse-waterfilling algorithm, is the following.
such that ∆*_1 ⪰ 0 is the solution of the algebraic Riccati equation (17) and ∆*_2 ⪰ 0 is the solution of the algebraic Riccati equation (19), with θ > 0.
Remark 1: (Comments on Theorem 1) The characterization of Theorem 1 is derived based on the algebraic Riccati equations (17) and (19). Equation (17) allocates the distortion when there is no reverse-waterfilling across dimensions, i.e., when ∆ ≺ Λ. On the contrary, (19) takes care of the distortion allocation at each dimension when some or all dimensions are inactive; this corresponds to ∆ ⪯ Λ with equality in the inactive dimensions. Unfortunately, finding F_2 in (19) is very challenging in general. Therefore, in [1, Algorithm 1], the authors proposed a numerical approach to compute (17) optimally, i.e., when ∆ ≺ Λ.
In this paper, we show that for certain classes of multivariate Gauss-Markov processes, the characterization in Theorem 1 is independent of (19) and consequently of the complementary slackness condition involving F_2(∆ − Λ). Hence, for these classes of sources, the numerical approach derived in [1, Algorithm 1] is optimal.

III. NEW RESULTS
In this section, we derive our main results. First, we prove the following proposition.
Proposition 1: (Matrix structure of (A, Σ_W)) Suppose that in (4) the matrix structure of the pair (A, Σ_W) satisfies one of the following cases: (i) A = αI_p (scalar matrix) and Σ_W ≻ 0; (ii) A is symmetric and Σ_W = σ_w² I_p (scalar matrix), where σ_w² > 0. Then, in (11), (A, ∆, Σ_W) commute by pairs, which implies that Λ and ∆ also commute.
Proof: See Appendix B.
Proposition 1 essentially reveals structural conditions on (4) such that all matrices of the optimization problem in (11) commute by pairs. Using this result, we now prove the following main result.
Theorem 2: (Asymptotic reverse-waterfilling algorithm: matrix form) Suppose that the conditions of Proposition 1 hold. Then, the characterization of Theorem 1 simplifies as follows.
R^{na}(D) = (1/2) log ( |Λ| / |∆| ),   (22)

such that Λ = A ∆ A^T + Σ_W, where ∆* ⪰ 0 is the solution of an algebraic Riccati equation with θ > 0 chosen such that trace(∆) = D.
Proof: Under the conditions of Proposition 1, we can reformulate Theorem 1 in per-eigenvalue form, with μ_{B,i} ≜ μ_{A²,i} μ_{Σ_W,i}, θ > 0, and multipliers μ_{F_2,i} ≥ 0. This equivalent characterization of Theorem 1 follows because all matrices (A, ∆, Σ_W, Λ, F_2) commute by pairs and hence are simultaneously diagonalizable by an orthogonal matrix.⁴ Clearly, from (25) and the third constraint of (26), whenever μ_{Λ,i} ≤ μ_{∆,i} for some i, the corresponding rate R^{na}_i = 0. This also means that the second condition of (26) is always satisfied once the other two conditions are imposed in (26). Hence, without any loss of optimality, we can remove the second condition of (26) and consequently (28). This also means that the complementary slackness condition μ_{F_2,i}(μ_{∆,i} − μ_{Λ,i}) = 0, ∀i, is always satisfied and can be removed. The resulting asymptotic reverse-waterfilling characterization of (25)-(29) without (28) can then be written as an equivalent asymptotic reverse-waterfilling characterization in matrix form, given by (22)-(24). This completes the derivation.
A way to solve Theorem 2 optimally has already been proposed in [1, Algorithm 1]. That numerical scheme is shown to operate faster than the SDP algorithm proposed in [9, Eq. (27)] because it only schedules the rate-distortion allocation based on the already solved optimization problem in (11) (obtained in the form of an algebraic Riccati equation), whereas the SDP solver uses interior point methods to solve the optimization problem (11), which are computationally much more expensive.

⁴ This follows from Theorem 3 in Appendix A.
Although the characterization of Theorem 2 is attractive, one may argue that it does not have the classical form of a reverse-waterfilling algorithm; see, e.g., [11, Theorem 10.3.3]. For this reason, we reformulate the asymptotic reverse-waterfilling algorithm of Theorem 2 into an equivalent form that resembles the classical reverse-waterfilling algorithm in information theory. We refer to this form of the reverse-waterfilling algorithm as the eigenvalue form.
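For comparison, the classical reverse-waterfilling solution for p independent Gaussian sources with variances σ_i² [11, Theorem 10.3.3] assigns per-dimension distortions D_i = min(λ, σ_i²), with the water level λ chosen so that ∑_i D_i = D. A minimal sketch (illustrative; not code from [11]) using bisection on λ:

```python
import math

def classical_reverse_waterfilling(variances, D, tol=1e-12):
    """Reverse-waterfilling for parallel Gaussian sources:
    D_i = min(lam, sigma_i^2), with lam chosen so sum(D_i) = D.
    Returns (per-dimension distortions, total rate in nats)."""
    assert 0 < D <= sum(variances)
    lo, hi = 0.0, max(variances)
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        if sum(min(lam, v) for v in variances) < D:
            lo = lam  # water level too low: allocate more distortion
        else:
            hi = lam
    dist = [min(hi, v) for v in variances]
    rate = sum(0.5 * math.log(v / d) for v, d in zip(variances, dist))
    return dist, rate

if __name__ == "__main__":
    dist, rate = classical_reverse_waterfilling([4.0, 1.0, 0.25], 1.5)
    print(dist, rate)
```

Dimensions whose variance falls below the water level receive zero rate, exactly the role played by the inactive dimensions in the NRDF setting.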
Corollary 1: (Asymptotic reverse-waterfilling algorithm: eigenvalue form) Under the conditions of Proposition 1, the parametric solution of (11), subject to an asymptotic reverse-waterfilling algorithm, is the following.
R^{na}(D) = (1/2) ∑_{i=1}^{p} log ( μ_{Λ,i} / μ_{∆,i} ),   (30)

where μ_{Λ,i} = μ_{A²,i} μ_{∆,i} + μ_{Σ_W,i}, and μ_{∆,i} is computed by the reverse-waterfilling algorithm

μ_{∆,i} = min { μ_{∆*,i}, μ_{Λ,i} },   (31)

where μ_{∆*,i} > 0 is given by the positive root

μ_{∆*,i} = ( −θ μ_{Σ_W,i} + sqrt( θ² μ_{Σ_W,i}² + 2θ μ_{B,i} ) ) / ( 2θ μ_{A²,i} ),   (32)

where μ_{B,i} ≜ μ_{A²,i} μ_{Σ_W,i} and θ > 0 is chosen such that ∑_{i=1}^{p} μ_{∆,i} = D.
Proof: The proof is immediate from the derivation of Theorem 2. Note that (32) is the positive solution of the quadratic equation (27).
In Algorithm 1 we propose an iterative scheme to solve the reverse-waterfilling algorithm of Corollary 1 optimally. This approach is similar to the dynamic reverse-waterfilling algorithm proposed in [18, Algorithm 1] (see also [10, Algorithm 1]), but instead of the time index we now have the spatial (or dimension) index.
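A compact numerical sketch of the eigenvalue-form reverse-waterfilling follows; it bisects on θ until the total distortion meets D. The per-dimension update mirrors Corollary 1 for stable dimensions (|μ_{A,i}| < 1), with each candidate clipped at the rate-zero level μ_{Σ_W,i}/(1 − μ_{A²,i}); the parameter values are hypothetical, and the code is an illustration of the scheme, not the authors' implementation.

```python
import math

def waterfill_eigen(mu_A2, mu_SW, D, tol=1e-9):
    """Asymptotic reverse-waterfilling in eigenvalue form (sketch).

    mu_A2[i] : i-th eigenvalue of A^2 (assumed < 1: stable dimension)
    mu_SW[i] : i-th eigenvalue of Sigma_W
    Bisects on the Lagrange parameter theta until sum_i mu_Delta_i = D.
    Returns (per-dimension distortions, total rate in nats)."""
    p = len(mu_A2)
    D_max = sum(s / (1.0 - a2) for a2, s in zip(mu_A2, mu_SW))
    assert 0 < D < D_max, "distortion budget must satisfy 0 < D < D_max"

    def alloc(theta):
        out = []
        for a2, s in zip(mu_A2, mu_SW):
            if a2 > 0:
                # positive root of 2*theta*a2*x^2 + 2*theta*s*x - s = 0
                x = (-theta * s + math.sqrt(theta**2 * s**2
                                            + 2.0 * theta * a2 * s)) / (2.0 * theta * a2)
            else:
                x = 1.0 / (2.0 * theta)  # classical water level
            out.append(min(x, s / (1.0 - a2)))  # clip at the rate-zero level
        return out

    lo, hi = tol, p / (2.0 * D)  # theta_max = p/(2D), cf. Algorithm 1
    while sum(alloc(lo)) < D:    # enlarge the bracket if needed
        lo *= 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(alloc(mid)) > D:
            lo = mid             # too much distortion: raise theta
        else:
            hi = mid
    dist = alloc(hi)
    rate = sum(0.5 * math.log((a2 * d + s) / d)
               for a2, s, d in zip(mu_A2, mu_SW, dist))
    return dist, rate

if __name__ == "__main__":
    # hypothetical 2-D example: eigenvalues of A are 0.5 and 0.8, Sigma_W = I
    print(waterfill_eigen([0.25, 0.64], [1.0, 1.0], D=0.6))
```

In the scalar case the bisection simply returns μ_∆ = D, recovering the known scalar closed form, which is a convenient sanity check for the scheme.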
Comparison of existing algorithms that solve (11) optimally: In what follows, we present a simulation experiment in which we consider matrix structures for (A, Σ_W) within the two classes of vector-valued time-invariant Gauss-Markov processes introduced in Proposition 1 and a given distortion level D. Then, we solve (11) using the eigenvalue form of Algorithm 1, the equivalent matrix form that can be solved using [1, Algorithm 1], and the SDP approach of [9, Equation (27)]. We run three distinct experiments, changing the number of dimensions of the system in each. Our aim is twofold: first, to study whether Algorithm 1 is faster than [1, Algorithm 1], which in turn is known to be much faster than [9, Equation (27)]; second, to study its scalability as the dimensions of the system increase.
To illustrate this point, in Table I we provide, for the first two experiments, the mean and the standard deviation of the computational time needed for each of the three optimal numerical methods to execute over a sample of 1000 instances. All algorithms operate with an error tolerance ε = 10⁻⁹. For the last experiment we could not extract conclusive results for SDP, because executing it over a sample of 1000 instances takes days. For the first experiment we assume 10 × 10 symmetric matrices and for the second 50 × 50; in the second experiment Algorithm 1 is also more than 140000 times faster than SDP. By assuming a relatively high-dimensional system in the third experiment (i.e., 150 × 150 symmetric matrices), we observe that Algorithm 1 is significantly faster than [1, Algorithm 1] (more than 2000 times).

Algorithm 1 Reverse-waterfilling algorithm of Corollary 1
Initialize: number of spatial components p; distortion level D; error tolerance ε; nominal minimum and maximum values θ_min = 0 and θ_max = p/(2D); pick an initial variance for μ_{Λ,1}; pick the matrix structure of (A, Σ_W) in (4) and their corresponding eigenvalues {(μ_{A,i}, μ_{Σ_W,i}) : i ∈ N_1^p}. Set θ = p/(2D); flag = 0.
while flag = 0 do
  Compute μ_{∆,i} ∀i as follows:
  for i = 1 : p do
    Compute μ_{∆*,i} according to (32).
    Compute μ_{∆,i} according to (31).
  end for
  if |∑_{i=1}^{p} μ_{∆,i} − D| ≤ ε then
    flag = 1
  else
    update θ by bisection on [θ_min, θ_max]
  end if
end while
Output: {μ_{∆,i} : i ∈ N_1^p} and θ.
Overall, the previous experiments show that Algorithm 1 is an elegant and preferable choice for implementation in modern delay-constrained and computationally-limited devices, such as the architectures of networked control systems, compared to SDP or [1, Algorithm 1]. It is very fast and adapts extremely well to high-dimensional systems (it is scalable).
Note that both Algorithm 1 and [1, Algorithm 1] allow for different tolerance levels (and thus become faster or slower accordingly). The choice ε = 10⁻⁹ is made to achieve the same accuracy as the SDP solver.

IV. EXAMPLE WITH ANALYTICAL SOLUTION
In this section, we give an example in which we derive an analytical solution of R^{na}(D) for a correlated time-invariant R²-valued Gauss-Markov process. This is feasible because we choose (A, Σ_W) in the class of matrices that satisfy the conditions of Proposition 1. We compare our analytical solution to the optimal numerical solution of [9, Equation (27)].
We consider an R²-valued time-invariant Gauss-Markov process with parameters (A, Σ_W) given in (33). Clearly, the matrix structure in (33) is in the class of matrices that satisfy the conditions of Proposition 1. We will use Theorem 1 to derive a closed-form solution of R^{na}(D) for this specific correlated time-invariant Gauss-Markov source. We solve the full-rank case, i.e., when ∆ ≺ Λ, which corresponds to the algebraic Riccati equation (17), and the rank-deficient case, i.e., when ∆ ⪯ Λ, which corresponds to the algebraic Riccati equation (19). We will show that (19) is never needed to obtain the optimal solution, something expected from Theorem 2. We note that the last case in (16) requires zero rate, hence for this case the problem's solution is trivial.
Full-rank solution (∆ ≺ Λ): First, we use the spectral representation (eigenvalue decomposition) of the real symmetric matrix ∆*_1 ≡ ∆ of (17), which means that for an orthogonal matrix U ∈ R^{p×p} we have U^T ∆ U = diag(μ_{∆,i}). This transforms (17) into the matrix equation (34). Observe that (34) is satisfied if and only if we can obtain an orthogonal matrix U of the form (35). From (34), we observe that U is the orthogonal matrix that simultaneously diagonalizes (A, Σ_W, ∆) and also Λ (this is expected from Proposition 1). Clearly, (34) can be reformulated in the equivalent eigenvalue form of (27), as expected from the derivation of Theorem 2 (see Corollary 1). In this example, we keep the eigenvalues in matrix form for better illustration of our results. By substituting (35) and (36) into (34) we obtain the quadratic equations (38a), (38b), where one of the two solutions of each is rejected because it is negative; the resulting matrix ∆ is given in (39). In order to find Λ we use (14), which requires the eigenvalue decomposition of the matrix A.
This gives (40). Using (39) and (40) in (14) we obtain (41). Now, we use the left-hand side (LHS) equation in (21) to obtain an additional equation, which is used to find θ. By substituting (39) into (21) we obtain (42). Equation (42) gives two positive solutions. Using the Lagrange duality theorem [19], we choose the one that results in the greater rate and discard the one that gives the lower rate. The chosen θ is given in (43), with its constituent terms defined in (44). Using (44) we can compute Λ in (41), whose entries (45a)-(45b) are explicit functions of D; e.g., (45a) takes the form μ_{Λ,1} = (sqrt((D + 4)(9D + 4)) − (D + 4))/8 + 1. Next, we compute the individual rates and the total sum rate over both dimensions.
Moreover, even if one assumes that μ_{Λ,1} = μ_{∆,1}, this would imply μ_{∆,1} < 0, which is a contradiction. Hence, Case 2 can never be observed in this example.
Discussion. The analytical solution of R^{na}(D) is a small step toward more general analytical solutions within the classes of time-invariant multidimensional Gauss-Markov processes considered in this note. From the steps involved in computing R^{na}(D) for this specific vector source, it is clear that (19) in the characterization of Theorem 1 is not needed (the matrix F_2 is never used). This is expected from our Theorem 2.
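Although the specific parameters in (33) are fixed in the paper, the observation that the rank-deficient branch is never needed can be checked numerically for any 2-D source satisfying Proposition 1. A hedged sketch (hypothetical symmetric A with eigenvalues a1, a2 and Σ_W = σ² I; not the paper's (33)) sweeps D and reports whether both dimensions remain strictly active, i.e., μ_{∆,i} < μ_{Λ,i}:

```python
import math

def two_dim_active_check(a1, a2, sigma2, Ds, tol=1e-10):
    """For eigenvalues a1, a2 of a stable symmetric A and Sigma_W = sigma2*I,
    run the eigenvalue-form reverse-waterfilling for each D in Ds and report
    whether both dimensions stay strictly active (mu_Delta_i < mu_Lambda_i)."""
    mu_A2 = [a1 * a1, a2 * a2]

    def alloc(theta):
        out = []
        for a2i in mu_A2:
            s = sigma2
            # positive root of 2*theta*a2i*x^2 + 2*theta*s*x - s = 0
            x = (-theta * s + math.sqrt(theta**2 * s**2 + 2 * theta * a2i * s)) \
                / (2 * theta * a2i)
            out.append(min(x, s / (1.0 - a2i)))  # clip at the rate-zero level
        return out

    results = []
    for D in Ds:
        lo, hi = 1e-12, 1.0 / D  # theta_max = p/(2D) with p = 2
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if sum(alloc(mid)) > D else (lo, mid)
        d = alloc(hi)
        lam = [mu_A2[i] * d[i] + sigma2 for i in range(2)]
        results.append(all(d[i] < lam[i] - 1e-9 for i in range(2)))
    return results

if __name__ == "__main__":
    # hypothetical stable correlated example: eigenvalues 0.3 and 0.7 of A
    print(two_dim_active_check(0.3, 0.7, 1.0, [0.2, 0.5, 1.0]))
```

For small and moderate D both dimensions are active (the full-rank branch applies), while for D near the maximum budget one dimension saturates at its rate-zero level, which is exactly where the rank-deficient branch would take over in the general theory.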

V. CONCLUSIONS AND FUTURE DIRECTIONS
In this paper, we considered certain classes of time-invariant multidimensional Gauss-Markov processes which ensure that all matrices in the optimization problem (11) commute by pairs and are thus simultaneously diagonalizable. As a result, we showed that the asymptotic reverse-waterfilling characterization of [1] can be simplified considerably and is equivalent to a reverse-waterfilling algorithm expressed only through the eigenvalues of the involved matrices. For the latter algorithm, we proposed an iterative approach that solves the optimization problem optimally. We showed via experiments that this algorithm is extremely fast and much more scalable than the other existing algorithms in the literature that solve the same problem. Finally, we used our new results to obtain a closed-form expression of the NRDF for a correlated time-invariant R²-valued Gauss-Markov source.
Our ongoing research will focus on deriving more general analytical expressions than the one derived in this paper and on studying similar algorithms for the more general case of partially observable Gauss-Markov processes with an MSE distortion function. A possible extension of Proposition 1 to time-varying systems is also under consideration.

APPENDIX A USEFUL DEFINITIONS AND THEOREMS
In what follows, we state a few important definitions and a theorem that we use throughout the paper.
P. A. Stavrou and M. Skoglund received funding by the KAW Foundation and the Swedish Foundation for Strategic Research.P. A. Stavrou and M. Skoglund are with the Department of Information Science and Engineering, KTH Royal Institute of Technology, Sweden, email: {fstavrou,skoglund}@kth.se.

Fig. IV.1: Comparison of SDP with the closed-form expression of R^{na}(D).

Definition 2: (Commuting matrices) [20, p. 5] Two p × p matrices A, B commute if AB = BA. More generally, the collection of p × p matrices (A_1, ..., A_k) commutes by pairs if A_i A_j = A_j A_i for j > i, i = 1, 2, ..., k.
Definition 3: (Product of symmetric matrices) [20, Section 1.3 (6)] The product AB of two square symmetric matrices A, B is itself symmetric if and only if A and B commute.
Definition 4: (Orthogonal matrix) [20, Section 8.4 (a)] A square (say p × p) matrix U is said to be orthogonal if U^T U = U U^T = I_p.
Theorem 3: (Commuting matrices are simultaneously diagonalizable) [20, Theorem 21.13.1] If a collection of p × p symmetric matrices (A_1, ..., A_k) commutes by pairs, then they can be simultaneously diagonalized by an orthogonal matrix U ∈ R^{p×p}; namely, there exists an orthogonal matrix U and diagonal matrices D_1, ..., D_k such that A_i = U D_i U^T for i = 1, ..., k.
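Theorem 3 is easy to see numerically: for commuting symmetric matrices, the orthogonal eigenbasis of one (with distinct eigenvalues) also diagonalizes the other. A small sketch for 2 × 2 symmetric matrices (illustrative values, chosen so that AB = BA):

```python
import math

def eig2_sym(M):
    """Orthogonal eigenbasis of a symmetric 2x2 matrix M = [[a, b], [b, c]]:
    returns the rotation U whose columns are unit eigenvectors of M."""
    a, b, c = M[0][0], M[0][1], M[1][1]
    theta = 0.5 * math.atan2(2 * b, a - c)  # Jacobi rotation angle
    ct, st = math.cos(theta), math.sin(theta)
    return [[ct, -st], [st, ct]]

def congruence(U, M):
    """Compute U^T M U for 2x2 matrices."""
    MU = [[sum(M[i][k] * U[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    return [[sum(U[k][i] * MU[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

if __name__ == "__main__":
    A = [[2.0, 1.0], [1.0, 2.0]]    # symmetric
    B = [[3.0, -1.0], [-1.0, 3.0]]  # symmetric, and A B = B A
    U = eig2_sym(A)
    print(congruence(U, A))  # diagonal
    print(congruence(U, B))  # also diagonal, as Theorem 3 predicts
```

The same single rotation U renders both matrices diagonal, which is the mechanism exploited in the proof of Theorem 2.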

TABLE I: Mean and standard deviation of the computational time of the three optimal numerical methods. For the first experiment (10 × 10 symmetric matrices), our results demonstrate that Algorithm 1 is extremely fast, operating around 1300 times faster than SDP, and is significantly faster than [1, Algorithm 1] (around 60 times). By increasing the number of dimensions in the optimization problem (we assume 50 × 50 symmetric matrices), the results are even more impressive: Algorithm 1 is still extremely fast, operating around 1000 times faster than [1, Algorithm 1].