Indirect NRDF for Partially Observable Gauss-Markov Processes with MSE Distortion: Complete Characterizations and Optimal Solutions

In this paper we study the problem of characterizing and computing the nonanticipative rate distortion function (NRDF) for partially observable multivariate Gauss-Markov processes with hard mean squared error (MSE) distortion constraints. For the finite time horizon case, we first derive the complete characterization of this problem and its corresponding optimal realization, which is shown to be a linear functional of the current-time sufficient statistic of the past and current observation signals. We show that when the problem is strictly feasible, it can be computed via a semidefinite programming (SDP) algorithm. For time-varying scalar processes with an average total MSE distortion constraint, we derive an optimal closed-form expression by means of a dynamic reverse-waterfilling solution that we also implement via an iterative scheme that converges linearly in finite time, and a closed-form solution under a pointwise MSE distortion constraint. For the infinite time horizon, we give necessary and sufficient conditions ensuring that the sufficient statistic process of the observation signals asymptotically achieves a steady-state solution for the corresponding covariance matrices, and we impose conditions that allow existence of a time-invariant solution. Then, we show that when a finite solution exists in the asymptotic limit, it can be computed via an SDP algorithm. We also give strong structural properties on the characterization of the problem in the asymptotic limit that allow for an optimal solution via a reverse-waterfilling algorithm, which we implement via an iterative scheme that converges linearly for a finite number of spatial components. Subsequently, we compare the computational time needed to execute both the SDP and reverse-waterfilling algorithms on the same problem, showing that the latter is a scalable optimization technique. Our results are corroborated with various simulation studies and are compared with existing results in the literature.


Photios A. Stavrou and Mikael Skoglund

I. INTRODUCTION
Nonanticipatory $\varepsilon$-entropy was introduced in [1], [2], motivated by real-time communication with minimal encoding and decoding delays. This quantity is shown to be a tight lower bound on causal codes for scalar processes [3], whereas for vector processes it provides a tight lower bound at high rates on causal codes and on the average length of all causal prefix-free codes [4] (also termed zero-delay coding).
Inspired by the usefulness of nonanticipatory $\varepsilon$-entropy in real-time communication, Tatikonda et al. in [5] reinvented the same measure under the name sequential rate distortion function (RDF) to study a linear fully observable Gaussian closed-loop control system over a memoryless communication channel subject to rate constraints. In particular, the authors of [5] used the sequential RDF subject to a pointwise MSE distortion constraint to describe a lower bound on the minimum cost of control for scalar-valued Gaussian processes, and a suboptimal lower bound for the multivariate case obtained by means of a reverse-waterfilling algorithm [7, Sec. 10.3.3]. Tanaka et al. in [10] revisited the estimation/communication part of the problem introduced by Tatikonda et al. and showed that the specific description of the sequential RDF is semidefinite representable. Around the same time, Stavrou et al. in [11] solved the general KKT conditions that correspond to the rate distortion characterization of the optimal estimation problem in [5] and proposed a dynamic reverse-waterfilling characterization (for both pointwise and total MSE distortions) that solves the KKT conditions optimally as long as all dimensions of the multidimensional setup are active, which is the case in the high-rate regime. In addition, in [11] they found the optimal linear coding policies (by means of a linear forward test-channel realization) that achieve the specific rate distortion characterization, thus filling a gap created in [1, Theorem 5]. Recently, the optimal realization therein was used as a benchmark in [12] to derive bounds on a zero-delay multiple description source coding problem with feedback.
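For intuition, the classical (static) reverse-waterfilling rule of [7, Sec. 10.3.3], which the dynamic schemes discussed above generalize, can be sketched in a few lines. This is a generic sketch for independent Gaussian components; the function name and the bisection tolerance are our own choices, not taken from [5] or [11]:

```python
import math

def reverse_waterfilling(variances, D, tol=1e-12):
    """Classical reverse-waterfilling for independent Gaussian components.

    Finds the water level theta such that sum_i min(theta, sigma_i^2) = D,
    then returns (theta, rate) with the rate in bits:
        R = 0.5 * sum_i max(0, log2(sigma_i^2 / theta)).
    """
    assert 0 < D <= sum(variances), "need 0 < D <= sum of variances"
    lo, hi = 0.0, max(variances)
    # sum_i min(theta, sigma_i^2) is nondecreasing in theta: bisect on theta
    while hi - lo > tol:
        theta = 0.5 * (lo + hi)
        if sum(min(theta, v) for v in variances) < D:
            lo = theta
        else:
            hi = theta
    theta = 0.5 * (lo + hi)
    rate = 0.5 * sum(max(0.0, math.log2(v / theta)) for v in variances)
    return theta, rate
```

For example, with component variances (4, 1) and total distortion 0.5, both components are active, the water level is 0.25, and the rate is 3 bits.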
Kostina and Hassibi in [8] revisited the framework of [5] and derived bounds on the optimal rate-cost tradeoffs in control for time-invariant fully observable multivariate Markov processes under the assumption of uniform cost (or distortion) allocation. Recently, Charalambous et al. in [13] used a state augmentation technique to extend the characterization of the Gaussian nonanticipatory $\varepsilon$-entropy derived in [2] to nonstationary multivariate Gaussian autoregressive models of any finite order.
The extension of the framework of [5] to stochastic linear partially observable Gaussian control systems under noisy or noiseless communication channels was initially studied in [14], whereas a variation of the uncontrolled problem is studied in [15]. Particularly, Tanaka in [15] considered the estimation/communication part of the problem and derived performance limitations by minimizing a sequential RDF with soft weighted pointwise MSE distortion constraints. To deal with this problem, he first reduced the time-varying partially observable Gaussian system into a fully observable one by employing a pre-Kalman filtering (pre-KF) algorithm. Then, he assumed a priori a structural result on its observations process to ensure the invertibility of the pre-KF algorithm and hence to guarantee that the a posteriori state estimate between the state process and the observations process computed by the pre-KF is an information lossless operation of the true observations process at each instant of time. Armed with this result and a modified MSE distortion constraint, he then showed that the resulting problem can be equivalently reformulated as a fully observable multi-letter optimization, for which a cascade realization was proposed via the connection of a pre-KF, a covariance scheduling semidefinite programming (SDP) algorithm, an additive white Gaussian noise (AWGN) channel, and a post-KF algorithm. The stationary case of the specific optimization problem is also briefly discussed. Tanaka et al. in [14, Section VII, Eq. (38)] considered a multi-letter optimization problem via directed information [16] between the observations process and the controlled process and an average total hard constraint obtained via the classical LQ cost. Again, the major result therein is a modified fully observable multi-letter optimization problem for controlled processes achieved by a cascade realization in the spirit of [15], only for the finite time horizon problem.
Despite the interesting analysis of [14], [15], there are several important open questions still unanswered, even for the estimation/communication problem. For instance, in both [14], [15] it is not clear what complete characterization needs to be solved, in contrast to what is already known, for example, when the input data are modeled via a linear fully observable multidimensional system driven by additive white Gaussian noise (see, e.g., [11, Eq. (5.22)]). Moreover, the complete (minimum) realization of the optimal test-channel distribution, including the identification of the reverse-waterfilling parameters that achieve the specific characterization, is also missing. Another important question has to do with the conditions that are needed to ensure (strict) feasibility of the optimization problem in both finite and infinite time horizon. Equally important questions include the derivation of optimal or suboptimal (numerical or analytical) solutions for this problem for scalar as well as multivariate processes, and the analysis of the problem for high dimensional systems, which necessitates scalable optimization algorithms (an issue already known from the analysis of [17]).
Kostina and Hassibi in [8] addressed some of the previous questions and derived analytical bounds on the exact solutions of the estimation and control problems for time-invariant multivariate jointly Gaussian processes, again under the assumption of uniform distortion allocation. Hence, a natural open question related to the bounds in [8] is their tightness for multidimensional systems. This question is also related to the fact that no insightful examples have appeared in the literature so far that compute optimally partially observable multivariate Gauss-Markov processes and compare with the closed-form bounds obtained in [8].

A. Contributions
In this work we study the problem of characterizing and computing the NRDF (hereinafter termed indirect NRDF) for partially observable multivariate Gauss-Markov processes under hard MSE distortion constraints in both finite and infinite time horizon. We obtain the following major results.
(R1) We derive the complete characterization of the indirect NRDF for a partially observable time-varying Gauss-Markov process with an average total or pointwise MSE distortion constraint, and we completely specify the corresponding optimal test-channel realization, which is a linear functional of the current sufficient statistic of the past and present observation signals (see Theorem 1);
(R2) We give sufficient conditions to ensure existence of a finite solution for any fixed finite time horizon (see Remark 5) and show that the problem for time-varying multivariate Gaussian processes is semidefinite representable (see Theorem 2);
(R3) For time-varying scalar processes under average total MSE distortion constraints we derive the optimal closed-form solution via a dynamic reverse-waterfilling algorithm (see Theorem 3), which we implement in Algorithm 1, whereas for pointwise MSE distortion constraints we derive the optimal closed-form solution (see Corollary 2);
(R4) For the infinite time horizon, we restrict our problem to time-invariant processes and identify necessary and sufficient conditions (i.e., detectability and stabilizability of an appropriate pair of matrices) to ensure a steady-state solution of the error covariance matrices of the sufficient statistic process (see Lemma 5), and then we give conditions that allow for a time-invariant characterization in the asymptotic limit (see Theorem 4);
(R5) For the infinite time horizon, we show that when a finite solution exists the problem is semidefinite representable (see Corollary 3), and under certain strong structural properties on the asymptotic characterization of the problem (see Proposition 3) we derive an optimal scalable reverse-waterfilling solution (see Theorem 5) with its algorithmic embodiment (see Algorithm 2);
(R6) We supplement our major results with numerical validations, including connections with [8] (see Section VI).
Additional results and comparison to prior art. To be able to prove the major results (R1)-(R5), we first prove the exact expression of the lower bound that needs to be studied when the low-delay source coding system is modeled by a partially observable Gauss-Markov process (see Definition 1). Then, we show via a modification of the distortion constraint that the specific information measure can be reduced to the classical NRDF [1], [11]; hence it has similar functional and topological properties, i.e., convexity, lower semicontinuity, etc. For jointly Gaussian processes we use a pre-KF algorithm (similar to [15]) to prove structural properties via a sufficient statistic approach, together with a data processing inequality (see Lemma 3), that result in the same expression of the multi-letter optimization that first appeared in [15, Eq. (17)], with hard average total MSE distortion constraints instead of the soft pointwise MSE distortion constraints that were assumed in [15], for both finite and infinite time horizon (see Definition 2). Then, we apply [11, Theorem 4.1] and prove that the multi-letter optimization problem of Definition 2 can be simplified to a single-letter sequential optimization problem in which we only need the current sufficient statistic of the past and present observation symbols (see Proposition 2). The computational complexity of the SDP algorithm in finite time horizon is discussed in Remark 6, and this analysis also includes the single-stage case. The computational time complexity and convergence of Algorithm 1 are analyzed in Remark 8 and those of Algorithm 2 in Remark 11.
Note that Theorem 5 and its algorithmic embodiment, Algorithm 2, are extremely important for two reasons: first, we can gain better insight into the problem in the infinite time horizon (for instance, it paves the way to derive optimal closed-form solutions beyond scalar processes, thus generalizing similar results obtained for the special case of fully observable time-invariant multivariate Gauss-Markov processes studied recently in [17, Section IV]); second, Algorithm 2, as Table I suggests, can operate much faster than the SDP algorithm in high dimensional systems (it is scalable). Our numerical simulation in Example 2, apart from verifying numerically that Corollary 3 and Theorem 5 coincide under certain structural properties, also shows that the corresponding analytical lower bound obtained for partially observable time-invariant multidimensional Gauss-Markov processes via [8, Corollary 1, Theorem 9] is not tight in general, although fairly tight (yet not exact) performance can be observed at very low distortion. Consequently, its utility for controlled processes in [8, Theorem 5] should be seen under this consideration. Example 3 shows the utility of Algorithm 1 when we restrict our system to time-invariant scalar processes; namely, under certain necessary and sufficient conditions on the pre-KF algorithm it recovers the steady-state solution of Corollary 4. Finally, for every result in this paper we recover, or explain how to recover, as a special case the corresponding results obtained for fully observable Gauss-Markov processes.
Notation. We let $\mathbb{R}=(-\infty,\infty)$, $\mathbb{Z}=\{\ldots,-1,0,1,\ldots\}$, $\mathbb{N}_0=\{0,1,\ldots\}$, and $\mathbb{N}_0^n=\{0,1,\ldots,n\}$, $n\in\mathbb{N}_0$. Let $\mathcal{X}$ be a finite dimensional Euclidean space and $\mathcal{B}(\mathcal{X})$ the Borel $\sigma$-field of $\mathcal{X}$. A random variable (RV) defined on some probability space $(\Omega,\mathcal{F},\mathbb{P})$ is a map $x:\Omega\longrightarrow\mathcal{X}$, where $(\mathcal{X},\mathcal{B}(\mathcal{X}))$ is a measurable space. We denote a sequence of RVs by $x_r^t\triangleq(x_r,x_{r+1},\ldots,x_t)$, $(r,t)\in\mathbb{Z}\times\mathbb{Z}$, $t\geq r$, and their realizations by $x_r^t\in\mathcal{X}_r^t\triangleq\times_{k=r}^t\mathcal{X}_k$, for simplicity. If $r=-\infty$ and $t=-1$, we use the notation $x_{-\infty}^{-1}=x^{-1}$, and if $r=0$, we use the notation $x_0^t=x^t$. The distribution of the RV $x$ on $\mathcal{X}$ is denoted by $\mathbf{P}(dx)$. The conditional distribution of a RV $y$ given $x=x$ is denoted by $\mathbf{P}(dy|x)$. The transpose and covariance of a random vector $x$ are denoted by $x^{\mathrm{T}}$ and $\Sigma_x$. We denote the determinant, trace, rank, diagonal, diagonal elements, and eigenvalues of a square matrix $S\in\mathbb{R}^{p\times p}$ by $|S|$, $\mathrm{trace}(S)$, $\mathrm{rank}(S)$, $\mathrm{diag}(S)$, $[\cdot]_{ii}$, and $\{\mu_{S,i}\}_{i=1}^p$, respectively. We denote the transpose and the pseudo-inverse of a real (rectangular) matrix $F\in\mathbb{R}^{p\times m}$ by $F^{\mathrm{T}}$ and $F^\dagger$. The notation $\Sigma\succ 0$ (resp. $\Sigma\succeq 0$) denotes a positive definite (resp. positive semi-definite) matrix. The notation $A\succeq B$ (resp. $A\succ B$) means $A-B\succeq 0$ (resp. $A-B\succ 0$). We denote a $p\times p$ identity matrix by $I_p$. $R^G(D)$ denotes the Gaussian version of the RDF. The expectation operator is denoted by $\mathbf{E}\{\cdot\}$; $\|\cdot\|$ denotes the Euclidean norm; $[\cdot]^+\triangleq\max\{0,\cdot\}$. We denote by $\mathrm{abs}(|\cdot|)$ the absolute value of a determinant.

II. PROBLEM STATEMENT
We consider the causal source coding setup of Fig. 1. In this setting, the "hidden" $\mathbb{R}^p$-valued source is modeled by a discrete-time time-varying partially observable Gauss-Markov process as follows:
$$x_{t+1}=A_t x_t+w_t,\quad t\in\mathbb{N}_0^n, \qquad (1)$$
$$z_t=C_t x_t+n_t, \qquad (2)$$
where $A_t\in\mathbb{R}^{p\times p}$ is a square non-random matrix, $C_t\in\mathbb{R}^{m\times p}$ is a rectangular non-random matrix with $m\leq p$, and $\{w_t\}$, $\{n_t\}$ are independent Gaussian noise processes, mutually independent of the initial state $x_0$.
System's operation: At every time instant $t$, the encoder observes the noisy measurement $z_t$ (provided $z^{t-1}$ are already observed) and generates the data packet $m_t\in\mathcal{M}_t\subset\{0,1\}^*$ of instantaneous expected rate
$$R_t=\mathbf{E}\{|m_t|\}, \qquad (3)$$
where $|m_t|$ denotes the length of the binary sequence $m_t$. At time $t$, $m_t$ is transmitted across a noiseless channel with rate $R_t$. Upon receiving $m_t$, a minimum MSE (MMSE) decoder immediately produces an estimate $y_t$ of the source sample $x_t$, under the assumption that $y^{t-1}$ are already reproduced. We assume that at time $t=0$ there is no prior information, whereas the clocks of the encoder and the decoder are synchronized.
Fig. 1: Causal coding of partially observable Gauss-Markov process.
Distortion constraint: The distortion constraint is described
by the average total MSE distortion
$$\frac{1}{n+1}\sum_{t=0}^{n}\mathbf{E}\{\|x_t-y_t\|^2\}\leq D, \qquad (4)$$
and its asymptotic limit by
$$\limsup_{n\to\infty}\frac{1}{n+1}\sum_{t=0}^{n}\mathbf{E}\{\|x_t-y_t\|^2\}\leq D. \qquad (5)$$
Performance: The performance of the multi-input multi-output (MIMO) system in Fig. 1 after some finite $n$ can be cast as the following optimization problem:
$$R_{[0,n]}^{op}(D)\triangleq\inf\;\frac{1}{n+1}\sum_{t=0}^{n}R_t\quad\text{subject to (4)}. \qquad (6)$$
The asymptotic limit of (6) is given as follows:
$$R^{op}(D)\triangleq\limsup_{n\to\infty}R_{[0,n]}^{op}(D)\quad\text{subject to (5)}. \qquad (7)$$

III. THE EXACT LOWER BOUND ON (6)

In this section, we first prove the exact lower bound that corresponds to the operational rates given in (6), (7), because its proof or construction analysis is, to the best of the authors' knowledge, not included in [8], [14], [15] or elsewhere.
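For concreteness, a minimal scalar simulation of the source model (1), (2) can be sketched as follows; all parameter values here are hypothetical, chosen only for illustration:

```python
import random

def simulate_source(a=0.9, c=1.0, sigma_w=1.0, sigma_n=0.5, n=100, seed=0):
    """Simulate a scalar instance of the partially observable Gauss-Markov
    source:  x_{t+1} = a*x_t + w_t  (hidden state, eq. (1)),
             z_t     = c*x_t + n_t  (noisy observation, eq. (2)).
    Returns the hidden trajectory and the observation trajectory."""
    rng = random.Random(seed)
    x, xs, zs = 0.0, [], []
    for _ in range(n + 1):
        z = c * x + rng.gauss(0.0, sigma_n)   # observation of the hidden state
        xs.append(x)
        zs.append(z)
        x = a * x + rng.gauss(0.0, sigma_w)   # state transition
    return xs, zs
```

Setting `sigma_n=0` recovers the fully observable special case in which the encoder sees the state exactly.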
We start by writing the data processing of information for the MIMO system of Fig. 1 in terms of its joint distribution. In particular, the joint distribution induced by the joint process $\{(z_t,m_t,y_t):t\in\mathbb{N}_0^n\}$ admits the following decomposition:
$$\mathbf{P}(dy^n,dm^n,dz^n)=\otimes_{t=0}^{n}\mathbf{P}(dy_t,dm_t,dz_t|y^{t-1},m^{t-1},z^{t-1})\stackrel{(a)}{=}\otimes_{t=0}^{n}\mathbf{P}(dy_t|y^{t-1},z^t,m^t)\otimes\mathbf{P}(dm_t|m^{t-1},z^t,y^{t-1})\otimes\mathbf{P}(dz_t|z^{t-1}), \qquad (8)$$
where (a) stems from the fact that we assume in our system natural conditional independence constraints consistent with the causal operation of the encoder and the decoder.
Remark 1: (Trivial initial information) To be consistent with the setup of Fig. 1, in (8) we assume that the joint distribution $\mathbf{P}(dz^{-1},dm^{-1},dy^{-1})$ generates trivial information.
The following data processing result provides the appropriate information measure that can be used to compute a lower bound on (6).
Lemma 1: (Data processing inequalities) Under the decomposition of the joint distribution in (8), the communication system in Fig. 1 admits a chain of data processing inequalities that lower bound the operational rate in (6) by the information measure defined in (12).
Proof: The proof follows precisely similar steps to the proof of [18, Theorem 1]; thus we omit it.
Next, we show how to formally construct the information measure (12).
Observations process: The observations process $\{z_t:t\in\mathbb{N}_0^n\}$ induces the sequence of conditional distributions $\mathbf{P}(dz_t|z^{t-1})$, $t\in\mathbb{N}_0$. At $t=0$ we assume that $\mathbf{P}(dz_0|z^{-1})=\mathbf{P}(dz_0)$, and by Bayes' rule we obtain
$$\mathbf{P}(dz^n)=\otimes_{t=0}^{n}\mathbf{P}(dz_t|z^{t-1}). \qquad (13)$$
It should be noted that for the system model (1), (2), at each instant of time, the conditional distribution $\mathbf{P}(dz_t|z^{t-1})$ depends on the posterior distribution of the hidden data $x_t$ given all the past observation symbols $z^{t-1}$, via the conditional distribution
$$\mathbf{P}(dx_t|z^{t-1}). \qquad (14)$$
Reproduction or "test-channel": The reproduction process $y_t$ parametrized by $\mathcal{Y}^{t-1}\times\mathcal{Z}^t$ induces the sequence of conditional distributions, known as test-channels, $\mathbf{P}(dy_t|y^{t-1},z^t)$, $t\in\mathbb{N}_0^n$. At $t=0$, no initial state information is assumed, hence $\mathbf{P}(dy_0|y^{-1},z^0)=\mathbf{P}(dy_0|z_0)$. The sequence of conditional distributions $\{\mathbf{P}(dy_t|y^{t-1},z^t):t\in\mathbb{N}_0\}$ uniquely defines the family of conditional distributions on $\mathcal{Y}^n$ parametrized by $z^n\in\mathcal{Z}^n$, given by
$$Q(dy^n|z^n)\triangleq\otimes_{t=0}^{n}\mathbf{P}(dy_t|y^{t-1},z^t), \qquad (15)$$
and vice-versa. From (13) and (15), we can uniquely define the joint distribution of $\{(z_t,y_t):t\in\mathbb{N}_0^n\}$ by
$$\mathbf{P}(dy^n,dz^n)=\mathbf{P}(dz^n)\otimes Q(dy^n|z^n). \qquad (16)$$
In addition, from (16), we can define the $\mathcal{Y}^n$-marginal distribution $\mathbf{P}(dy^n)\triangleq\otimes_{t=0}^{n}\mathbf{P}(dy_t|y^{t-1})$. (17)
Given the above construction of distributions, we obtain the following variant of directed information [16]:
$$I(z^n\to y^n)\stackrel{(a)}{=}\sum_{t=0}^{n}\mathbf{E}\left\{\log\frac{d\mathbf{P}(\cdot|y^{t-1},z^t)}{d\mathbf{P}(\cdot|y^{t-1})}(y_t)\right\}\stackrel{(b)}{=}\sum_{t=0}^{n}I(z^t;y_t|y^{t-1}), \qquad (18)$$
where (a) is due to the chain rule of relative entropy using the Radon-Nikodym derivative [19]; (b) follows by definition.
Definition 1: (Exact lower bounds on (6), (7)) For a given observations process $\{z_t:t\in\mathbb{N}_0^n\}$ that induces the conditional distribution (14), the exact lower bound on (6), hereinafter called remote or indirect NRDF, subject to (4) is defined as follows:
$$R_{[0,n],in}(D)\triangleq\inf_{\{\mathbf{P}(dy_t|y^{t-1},z^t):\,t\in\mathbb{N}_0^n\}:\,(4)}\;\frac{1}{n+1}\,I(z^n\to y^n). \qquad (19)$$
Moreover, its asymptotic expression, which corresponds to a lower bound on (7), is given by
$$R_{in}(D)\triangleq\limsup_{n\to\infty}R_{[0,n],in}(D)\quad\text{subject to (5)}, \qquad (20)$$
provided the limit in (20) takes a finite value. Next, we further analyze the information measure introduced in Definition 1 and discuss some of its most important properties.
The name indirect or remote NRDF is adopted because the specific information measure can be seen as an extension to causal processes (with memory) of the remote or indirect RDF defined for i.i.d. memoryless processes $\{(x_t,z_t,y_t):t\in\mathbb{N}_0^n\}$ or random variables $(x,z,y)$ in the context of noncausal coding; see, e.g., [20], [21], [22, Chapters 3.5, 4.5], [23]. Following, for instance, the approach in [23], one can transform the indirect NRDF of (19) into a direct NRDF by creating a modified distortion constraint. For completeness, we next include the steps to obtain the amended distortion constraint.
The MSE distortion on $(x_t,y_t)$ can be expressed in terms of $(z^t,y_t)$ by conditioning on the observations, where the first step follows due to the conditional independence constraint $\mathbf{P}(dy_t|z^t,x^t)=\mathbf{P}(dy_t|z^t)$, for any $t=0,1,\ldots,n$, and the second step follows by the system model (1), (2). Hence, (19) can be equivalently reformulated as a minimization over the test-channels $\{\mathbf{P}(dy_t|y^{t-1},z^t):t\in\mathbb{N}_0^n\}$ subject to the amended distortion constraint, (23), which corresponds precisely to a direct NRDF. It is easy to show that (23) is convex with respect to the test-channels $\{\mathbf{P}(dy_t|y^{t-1},z^t):t\in\mathbb{N}_0^n\}$, following for instance [24]. In addition, it is also well known that $R_{[0,n],in}(D)$ achieves smaller rates if, in addition to $\{(x_t,z_t):t\in\mathbb{N}_0^n\}$ being a jointly Gaussian process with the linear evolution of (1), (2), the joint process $\{(x_t,z_t,y_t):t\in\mathbb{N}_0^n\}$ is also Gaussian, because then $I(z^n;y^n)\geq I^G(z^n;y^n)$ (that is, the Gaussian version of $I(z^n;y^n)$), which in turn implies the same ordering for the corresponding rate distortion characterizations.

In this section, we assume that the end-to-end system in Fig. 1 is jointly Gaussian, and we completely characterize for the first time the exact lower bound (23) in finite time and provide conditions to ensure a strictly feasible solution for the problem for any finite $n$.
To completely characterize the problem in finite time we use a two-step approach. As a first step, we employ a pre-KF to create the MMSE estimator of $x_t$ given the past and current observation symbols $z^t$ for any $t$. Then, using a sufficient statistic approach, we show that the MMSE estimator of the pre-KF is in fact, under certain conditions, a sufficient statistic at each time instant for the process $y_t$ parametrized by $\mathcal{Y}^{t-1}$, as it contains all the information about $z^t$.
Lemma 2: (Classical KF) For the jointly Gaussian system model of (1), (2), define the a priori and a posteriori state estimates as $\widehat{x}_{t|t-1}\triangleq\mathbf{E}\{x_t|z^{t-1}\}$ and $\widehat{x}_{t|t}\triangleq\mathbf{E}\{x_t|z^t\}$, respectively, and their corresponding error covariance matrices by $\Sigma_{t|t-1}$ and $\Sigma_{t|t}$. Then, $\{(\widehat{x}_{t|t},\Sigma_{t|t}):t\in\mathbb{N}_0^n\}$ are computed recursively forward in time as follows:
$$\widehat{x}_{t|t}=A_{t-1}\widehat{x}_{t-1|t-1}+k_t^z I_t^z,\qquad \Sigma_{t|t-1}=A_{t-1}\Sigma_{t-1|t-1}A_{t-1}^{\mathrm{T}}+\Sigma_{w_{t-1}},$$
$$k_t^z=\Sigma_{t|t-1}C_t^{\mathrm{T}}\big(C_t\Sigma_{t|t-1}C_t^{\mathrm{T}}+\Sigma_{n_t}\big)^{-1},\qquad \Sigma_{t|t}=(I_p-k_t^z C_t)\Sigma_{t|t-1}, \qquad (26)$$
where $I_t^z\triangleq z_t-C_t\widehat{x}_{t|t-1}$ is an orthogonal (innovations) process independent of the past observations. Proof: The proof follows from every standard textbook on state estimation and filtering theory, see, e.g., [26]-[29]; thus we omit it. Before we proceed to the next result, we state as a corollary the special case of a time-varying fully observable multivariate Gauss-Markov process. Corollary 1: (Special case of Lemma 2) Suppose that the system model in (1), (2) is simplified to a time-varying fully observable multivariate Gauss-Markov process, i.e., $C_t=I_p$ and $\Sigma_{n_t}=0$, $\forall t$. Then, the KF recursions of (26) simplify to $\widehat{x}_{t|t}=x_t$ and $\Sigma_{t|t}=0$, $\forall t$. Proof: The derivation is straightforward using properties of conditional expectation and the fact that $C_t=I_p$, $\forall t$, and $\Sigma_{n_t}=0$, $\forall t$.
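The covariance part of the pre-KF recursions in Lemma 2 can be sketched for the scalar case as follows (the initial prior variance is a hypothetical choice). For $a=c=\Sigma_w=\Sigma_n=1$, the a posteriori variance converges to $1/\varphi\approx 0.618$, where $\varphi$ is the golden ratio; this is a standard scalar-Riccati sanity check:

```python
def prekf_scalar(a, c, q, r, n):
    """Scalar pre-KF covariance recursions of Lemma 2:
        prediction  P_pred = a^2 * P_post + q
        gain        k      = P_pred * c / (c^2 * P_pred + r)
        update      P_post = (1 - k*c) * P_pred
    Returns the sequence of a posteriori error variances Sigma_{t|t}."""
    P = q  # Sigma_{0|-1}: hypothetical prior variance of x_0
    out = []
    for _ in range(n + 1):
        k = P * c / (c * c * P + r)      # Kalman gain k_t^z
        P_post = (1.0 - k * c) * P       # a posteriori variance Sigma_{t|t}
        out.append(P_post)
        P = a * a * P_post + q           # next a priori variance Sigma_{t+1|t}
    return out
```

The steady-state limit here anticipates the role of the Riccati fixed point in the infinite horizon analysis of Section V.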
Before we prove a main structural result, we prove an adaptation to causal processes of a result derived for random variables in [7, p. 35].
Proposition 1: (Data processing inequality) Consider the joint process $\{(x_t,z_t,y_t):t\in\mathbb{N}_0^n\}$. For each $t=0,1,\ldots,n$, let the statistic $\xi_t=f(z^t)$. Then, for any $n$, assuming $I(\xi_t;y_t|y^{t-1})<\infty$ and $I(z^t;y_t|y^{t-1})<\infty$, $\forall t$,
$$I(\xi_t;y_t|y^{t-1})\leq I(z^t;y_t|y^{t-1}),\quad\forall t. \qquad (29)$$
Moreover, the inequality in (29) holds with equality if
$$\mathbf{P}(dy_t|y^{t-1},z^t)=\mathbf{P}(dy_t|y^{t-1},\xi_t),\quad\forall t. \qquad (30)$$
In that case, $\xi_t$ is called a sufficient statistic for the process $y_t$ parametrized by $\mathcal{Y}^{t-1}$, because it contains all the information of $z^t$ about $y_t$ parametrized by $\mathcal{Y}^{t-1}$ at each instant of time.
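A finite-alphabet sanity check of the inequality (29) is easy to run; the joint pmf below is hypothetical, and the unconditional mutual information stands in for the conditional one at a single time step:

```python
from math import log2

def mutual_information(pxy):
    """I(X;Y) in bits from a joint pmf given as a dict {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in pxy.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in pxy.items() if p > 0)

# Toy joint pmf of (z, y) on {0,1,2} x {0,1} (hypothetical numbers).
pzy = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.2,
       (2, 0): 0.05, (2, 1): 0.35}
f = {0: 0, 1: 0, 2: 1}        # statistic xi = f(z), merges symbols 0 and 1
pxiy = {}
for (z, y), p in pzy.items():
    key = (f[z], y)
    pxiy[key] = pxiy.get(key, 0.0) + p

# Data processing: processing z into xi = f(z) cannot increase information.
assert mutual_information(pxiy) <= mutual_information(pzy) + 1e-12
```

Here the merge loses information (the two merged symbols of `z` have different conditional laws for `y`), so the inequality is strict, illustrating why the equality condition (30) is needed.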
Using Proposition 1 we prove structural sufficient conditions to ensure that (29) holds with equality for jointly Gaussian multivariate processes.
Lemma 3: (Structural sufficient conditions for equality in (29)) Suppose that $\{(x_t,z_t,y_t):t\in\mathbb{N}_0^n\}$ is a jointly Gaussian multivariate process. Moreover, let $\xi_t=\mathbf{E}\{x_t|z^t\}$ (the MMSE estimator of $x_t$ given $z^t$). Then, (29) holds with equality if $C_t\in\mathbb{R}^{m\times p}$ in (2) is full row rank at each $t$.
Proof: The proof is based on the structural properties of the optimal minimizer of the general problem in (23). Observe that (a) $\xi_t=\mathbf{E}\{x_t|z^t\}$ is consistent with the conditional independence $\mathbf{P}(dy_t|y^{t-1},\xi_t,z^t)=\mathbf{P}(dy_t|y^{t-1},z^t)$, $\forall t=0,1,\ldots,n$, of Proposition 1, and (b) from Lemma 2, the innovations process of the pre-KF allows $z^t$ to be recovered from $\xi^t$ (and vice versa) when $C_t\in\mathbb{R}^{m\times p}$ in (2) is full row rank at each $t$. Hence, following Proposition 1, we have proved that the conditional independence constraint (30) holds, and this in turn implies that for jointly Gaussian processes, $\xi_t$ is a sufficient statistic of $z^t$ for the process $y_t$ parametrized by $\mathcal{Y}^{t-1}$. This completes the proof.
Remark 2: (Connection to [15]) It should be emphasized that a result similar to Lemma 3 was obtained in [15, Lemma 2] by taking a priori the structure of matrix $C_t$ in (2) and claiming that the pre-KF in his cascade realization is causally invertible, thus an information lossless operation. In our case, we follow a structural approach reminiscent of the one proposed in [30, p. 20, Eq. (II.135)-(II.138)] by showing equality of the optimal minimizers via a sufficient statistic approach. Clearly, if the matrix $C_t\in\mathbb{R}^{m\times p}$ in (2) is not full row rank, then the inequality in (29) is strict.
Next, we study the structure of the amended distortion constraint in the convex optimization problem of (23) obtained for jointly Gaussian processes. Specifically,
$$\mathbf{E}\{\|x_t-y_t\|^2\}\stackrel{(i)}{=}\mathbf{E}\{\|x_t-\xi_t\|^2\}+\mathbf{E}\{\|\xi_t-y_t\|^2\}\stackrel{(ii)}{=}\mathrm{trace}(\Sigma_{t|t})+\mathbf{E}\{\|\xi_t-y_t\|^2\}, \qquad (32)$$
where (i) follows because for jointly Gaussian processes $\xi_t$ is the optimal MMSE estimator of $x_t$ given $z^t$, and from the orthogonality principle; (ii) follows by the definition of the a posteriori error covariance of the optimal MMSE estimator obtained from the KF recursions in Lemma 2. Finally, the amended distortion constraint that corresponds to the distortion constraint in (23) is obtained by taking the expectation with respect to the joint distribution of $\{(z_t,y_t):t\in\mathbb{N}_0^n\}$ in (32) and then the summation, which gives
$$\frac{1}{n+1}\sum_{t=0}^{n}\mathbf{E}\{\|\xi_t-y_t\|^2\}\leq D-\frac{1}{n+1}\sum_{t=0}^{n}\mathrm{trace}(\Sigma_{t|t}). \qquad (33)$$
Putting all the pieces together, we can reformulate (23) (and its asymptotic limit) as a convex problem for jointly Gaussian processes as follows.
Definition 2: (Indirect NRDF for partially observable Gaussian processes) Suppose that the process $\{(x_t,z_t,y_t):t\in\mathbb{N}_0^n\}$ is jointly Gaussian and the matrix $C_t\in\mathbb{R}^{m\times p}$ in (2) is full row rank. Then, the indirect or remote NRDF (19) and its asymptotic limit (provided it exists) can be reformulated as
$$R_{[0,n],in}(D)=\inf_{\{\mathbf{P}(dy_t|y^{t-1},\xi_t)\}:\,(33)}\;\frac{1}{n+1}\sum_{t=0}^{n}I(\xi_t;y_t|y^{t-1}),\quad D>D_{\min}, \qquad (34)$$
where $D_{\min}\triangleq\frac{1}{n+1}\sum_{t=0}^{n}\mathrm{trace}(\Sigma_{t|t})$. The lower bound (34) shows an interesting resemblance to the classical remote RDF obtained for i.i.d. memoryless Gaussian processes or random variables using non-causal coding [21]. In particular, similar to that case, the distortion constraint in (34) consists of two parts, of which only one affects the rates. As a result, the other part can be essentially subtracted from the given distortion level.
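The subtraction step behind Definition 2 is easy to sketch: only the $\mathbf{E}\{\|\xi_t-y_t\|^2\}$ part of (32) is rate-relevant, so the budget handed to the direct NRDF is $D-D_{\min}$. Function and variable names below are our own:

```python
def amended_distortion(D, post_variances):
    """Split the MSE budget according to the orthogonality decomposition
        E||x_t - y_t||^2 = trace(Sigma_{t|t}) + E||xi_t - y_t||^2.
    Only the second term affects the rate, so the effective budget is
        D' = D - D_min,  D_min = average of trace(Sigma_{t|t}).
    Raises ValueError when the problem is infeasible (D <= D_min)."""
    D_min = sum(post_variances) / len(post_variances)
    if D <= D_min:
        raise ValueError("infeasible: D must exceed D_min = %.6f" % D_min)
    return D - D_min
```

For instance, with per-stage filtering errors $(0.2, 0.3, 0.1)$ and $D=1$, we get $D_{\min}=0.2$ and an effective budget of $0.8$ for the direct problem.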
(4) Definition 2 is not the same as the definition obtained in [15, Eq. (16)]. Therein the author assumes a lower bound with a soft pointwise MSE distortion constraint, whereas we consider a lower bound subject to hard average total distortion constraints. (5) To the best of the authors' knowledge, [8] has never proved the information measure in Definition 2 for jointly Gaussian processes, either in finite time or in the asymptotic limit.
The following is a key structural property of our objective function in (34) for the development of our results.
Proposition 2: (Sequential characterization) The multi-letter optimization problem of Definition 2 can be simplified to a single-letter sequential optimization problem in which only the current sufficient statistic of the past and present observation symbols is needed.
Proof: Since the process $\{\xi_t:t\in\mathbb{N}_0^n\}$ admits the Markov realization in the KF recursions in (26), i.e., $\xi_t=A_{t-1}\xi_{t-1}+k_t^z I_t^z$, the convex optimization problem in (34) has implicit recursions similar to the ones described in [11, Theorem 4.1], obtained via dynamic programming backward in time [31]. The only difference is that the source distribution is simply replaced by the distribution $\{\mathbf{P}(d\xi_t|\xi_{t-1}):t\in\mathbb{N}_0^n\}$. This completes the derivation.
Using Proposition 2, we will shortly provide, for the first time, the complete characterization in finite time of the indirect NRDF for time-varying partially observable multidimensional jointly Gaussian processes.
As a first step, we need the following helpful lemma, which is a non-trivial generalization of both the classical KF algorithm and a similar result obtained in [11].
Lemma 4: (Realization of {P * (dy t |y t−1 , ξ t ) : t ∈ N n 0 }) For the system model in (1), (2), suppose that the joint process {(x t , y t , z t ) : t ∈ N n 0 } is jointly Gaussian with C t ∈ R m×p in (2) full row rank. Then, the following statements hold.
(1) Any {P * (dy t |y t−1 , ξ t ) : t ∈ N n 0 } is realized by (38), where ξ t|t−1 ≜ E{ξ t |y t−1 }, {v t ∈ R p ∼ N (0; Σ vt ) : t ∈ N n 0 } is an independent Gaussian process, independent of {(w t , n t ) : t ∈ N n 0 } and of x 0 , and {H t ∈ R p×p : t ∈ N n 0 } are time-varying deterministic matrices (to be designed). Moreover, the innovations process {I ξ t ∈ R p : t ∈ N n 0 } of (38) is the orthogonal process given by (39), where the conditional covariances {(Σ ξ t|t , Σ ξ t|t−1 ) : t ∈ N n 0 } satisfy the following generalized discrete-time forward KF recursions (40), with Σ ξ t|t ⪰ 0 and Σ ξ t|t−1 ⪰ 0. Proof: (1) Since {(x t , y t , z t ) : t ∈ N n 0 } is assumed to be jointly Gaussian, {P * (dy t |y t−1 , ξ t ) : t ∈ N n 0 } is conditionally Gaussian, and we can obtain the orthogonal realization (38). (2) This follows from the discrete-time KF equations. (3) The characterization that achieves (38) is obtained from (1), (2) and the definition of conditional mutual information I(ξ t ; y t |y t−1 ) at each time instant t. This completes the proof.
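For concreteness, a minimal sketch of the classical forward KF recursions of Lemma 2 (not the generalized recursions of Lemma 4) is given below; the function name and the matrix shapes are our own illustrative choices.

```python
import numpy as np

def kalman_filter(A, C, Sw, Sn, z_seq, x0_mean, P0):
    """Classical discrete-time KF producing xi_t = E{x_t | z^t}.

    A, C: system matrices; Sw, Sn: process/measurement noise covariances;
    z_seq: observations z_0, ..., z_n; x0_mean, P0: prior mean/covariance.
    Returns the a posteriori estimates and the final a priori covariance.
    """
    xi, P = x0_mean, P0                      # predicted stats for t = 0
    estimates = []
    for z in z_seq:
        S = C @ P @ C.T + Sn                 # innovation covariance
        K = P @ C.T @ np.linalg.inv(S)       # Kalman gain
        xi = xi + K @ (z - C @ xi)           # a posteriori mean  xi_{t|t}
        P = P - K @ C @ P                    # a posteriori covariance
        estimates.append(xi)
        xi, P = A @ xi, A @ P @ A.T + Sw     # time update (a priori, t+1)
    return np.array(estimates), P
```

On a stable scalar system the filtered estimate attains a strictly smaller MSE than the raw observation, consistent with ξ t being the MMSE sufficient statistic used throughout the paper.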
At this point we need to stress some important technical comments on Lemma 4 which is essentially an intermediate step towards the complete characterization of the problem in finite time (i.e., it is not involved in the final optimization problem).
Remark 4: (On Lemma 4) (1) If in the KF recursions of Lemma 4 the matrix Σ I ξ t ⪰ 0 is singular, i.e., rank(Σ I ξ t ) = l < p, then we can use a dimension reduction approach as follows. First, we find the pseudo-inverse of Σ I ξ t using singular value decomposition (SVD) (or eigendecomposition) [32]. For example, recall the eigendecomposition Σ I ξ t = U ΣU T , where the orthogonal matrix U = [U 1 U 2 ] with U 1 ∈ R p×l , U 2 ∈ R p×(p−l) , Σ is the diagonal matrix of eigenvalues, and Σ + is formed by taking the inverse of all non-zero elements of Σ. The next step is to exclude the degenerate rows and columns (i.e., the (p − l)-dimensional null space) from (U, Σ + ), hence obtaining the pseudo-inverse Σ † I ξ t = U l Σ + l U T l , where U l and Σ + l are the matrices with the p − l degenerate rows and columns deleted. In the end, the reproduction process admits a dimension reduction of p − l elements, that is, {y t ∈ R l : t ∈ N n 0 }; (2) The characterization in (41) is general and at this point we did not give conditions to ensure existence of a finite solution. Such conditions are pivotal and will be provided in the sequel; (3) Similar to Corollary 1, one can recover from Lemma 4 the special case of time-varying fully observable multivariate Gauss-Markov processes. In particular, for the system model in (27), (28), we obtain using Corollary 1 that in Lemma 4 ξ t = x t , ∀t, {P * (dy t |y t−1 , ξ t ) ≡ P * (dy t |y t−1 , x t ) : t ∈ N n 0 } and ξ t|t−1 = E{x t |y t−1 }. The analysis gives precisely the result first derived in [11, Lemma 5.2].
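The pseudo-inverse step of Remark 4 can be sketched numerically as follows; the rank-deficient covariance is a toy example (p = 3, l = 2) and `Sigma_I` stands in for Σ I ξ t .

```python
import numpy as np

# Rank-deficient covariance (p = 3, rank l = 2), as in Remark 4, (1).
U0 = np.linalg.qr(np.random.default_rng(2).normal(size=(3, 3)))[0]
Sigma_I = U0 @ np.diag([2.0, 0.5, 0.0]) @ U0.T   # one zero eigenvalue

# Eigendecomposition-based pseudo-inverse: invert nonzero eigenvalues only,
# discarding the (p - l)-dimensional null space.
mu, U = np.linalg.eigh(Sigma_I)
keep = mu > 1e-10                                 # l nonzero modes survive
Sigma_pinv = (U[:, keep] / mu[keep]) @ U[:, keep].T

# Agreement with numpy's SVD-based Moore-Penrose pseudo-inverse.
assert np.allclose(Sigma_pinv, np.linalg.pinv(Sigma_I), atol=1e-8)
assert np.allclose(Sigma_I @ Sigma_pinv @ Sigma_I, Sigma_I, atol=1e-8)
```

The retained columns `U[:, keep]` play the role of U l in the remark, so the reproduction effectively lives in R^l.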
The next theorem gives the complete finite dimensional characterization of (37) for partially observable Gaussian processes when the end-to-end system is jointly Gaussian subject to an average total MSE distortion constraint. It also reveals the optimal linear Gaussian test-channel distribution (forward test-channel realization) that corresponds to this problem.
Theorem 1: (Complete characterization of (37) for jointly Gaussian processes) The information measure in (37) corresponds to the characterization in (43). Moreover, if Σ I ξ t ≻ 0, the above characterization is achieved by a linear Gaussian "test channel" P * (dy t |y t−1 , ξ t ) of the form (44), where y t ∈ R p , with y −1 = 0, and the scalings given in (45). Otherwise, if Σ I ξ t ⪰ 0 is singular with rank(Σ I ξ t ) = l < p, the characterization in (43) corresponds to a linear Gaussian "test channel" P * (dy t |y t−1 , ξ t ) of the form (44) with reduced dimension y t ∈ R l (l < p), such that the scalings (45) hold for l < p.
Proof: From MSE estimation theory we know that the MSE inequality ∑ n t=0 E{||ξ t − y t || 2 } ≥ ∑ n t=0 E{||ξ t − ξ t|t || 2 } holds for all (H t , Σ vt ), t ∈ N n 0 , and it is achieved if and only if ξ t|t = y t . Sufficient conditions for the latter to hold are (i) ξ t|t−1 ≡ E{y t |y t−1 } and (ii) k ξ t = I p . Note that (i) holds by the general KF algorithm in Lemma 4. The choice of (45) satisfies (ii) as long as Σ I ξ t ≻ 0; hence a smaller distortion for a given rate is achieved and the Markov realization in (44) holds. If Σ I ξ t is singular, then, following Remark 4, we seek the pseudo-inverse matrix Σ † I ξ t = U l Σ + l U T l and exclude from the system the degenerate rows and columns of dimension p − l. Under this dimension reduction, the sufficient conditions (i), (ii) hold and the test-channel realization with the appropriate scalings follows. From Theorem 1 it can be easily checked via Corollary 1 that we can recover the complete characterization in finite time of the NRDF for time-varying fully observable Gauss-Markov processes with MSE distortion, first derived in [11].
Next, we give the optimal numerical solution of the characterization of Theorem 1 as long as the conditions of Remark 5 hold.
Theorem 2: (Optimal numerical solution of (43)) Compute forward in time via (26) the covariances {(Σ ξ t|t , Σ ξ t|t−1 ) : t ∈ N n 0 }, making sure that the conditions of Remark 5 hold. Then, the solution of (43) is semidefinite representable as follows: (1) Suppose A t ∈ R p×p is full rank and k z t+1 Σ I z t+1 k z T t+1 ≻ 0. Moreover, introduce the decision variable Γ 1 t ⪰ 0 and let B t B T t be a factorization of the singular matrix k z t Σ I z t k z T t . Next, we stress some technical comments on Theorem 2. As Table I shows, even for the single stage case at high-dimensional problems, the SDP algorithm (with the steady-state covariance matrices obtained by the pre-KF recursions) operates extremely slowly. Hence, finding alternative optimal or near-optimal algorithmic approaches with reasonable computational complexity, aligned with state-of-the-art large-scale networks that operate with computationally limited resources, remains an intriguing open problem.
where α t ∈ R and c t ∈ R \ {0} are non-random, x 0 ∈ R ∼ N (0; σ 2 x0 ) is the initial state, {w t ∈ R ∼ N (0; σ 2 wt )}, σ 2 wt > 0, is an independent sequence, and {n t ∈ R ∼ N (0; σ 2 nt )}, σ 2 nt ≥ 0, is an independent sequence, independent of {w t : t ∈ N n 0 }, whereas x 0 is independent of {(w t , n t ) : t ∈ N n 0 }. Before we proceed, we denote Σ x t|t ≡ σ 2 υt for any t ∈ N n 0 . Additionally, we rewrite the general characterization of Theorem 1 for scalar processes under the assumption that the total rates yield a finite solution, obtaining (49). In the next theorem, we give the optimal solution of (49) via a dynamic reverse-waterfilling algorithm.
Remark 8: (On Algorithm 1) (1) Algorithm 1 ensures linear convergence in finite time via a bisection method for a given error tolerance ε by picking as starting points an appropriate nominal range of values for θ (i.e., θ min and θ max ). The convergence of the bisection method implies that θ converges within the error tolerance ε. We note that the nominal values of θ min and θ max vary depending on the data of the system model (48). The most computationally expensive operations in Algorithm 1 are the for loop and the bisection method, which yield a time complexity of approximately O(n log(n)) (linearithmic time complexity). In Fig. 2 we illustrate a numerical simulation of the average running time needed for Algorithm 1 to execute versus the time horizon n when the error tolerance is ε = 10 −9 . For this simulation, each value of n is averaged over 10000 instances. (2) As Remark 7 suggests, for time-varying fully observable Gauss-Markov processes Algorithm 1 recovers the algorithm proposed in [33, Algorithm 1]. In the next corollary, we give, for the first time, a closed form expression for the system in (48) under pointwise MSE distortion constraints.
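A possible skeleton of the bisection step underlying Algorithm 1 is sketched below for a simplified, decoupled per-stage allocation d t = min(θ, s t ); the paper's dynamic recursion couples the stages through the KF, so this is only the generic reverse-waterfilling template, with illustrative inputs.

```python
import math

def reverse_waterfill(sigmas, D_total, eps=1e-9):
    """Find the water level theta by bisection so that the per-stage
    distortions d_t = min(theta, s_t) meet the total budget D_total.
    Assumes 0 < D_total < sum(sigmas), so the constraint is active."""
    lo, hi = 0.0, max(sigmas)
    theta = 0.5 * (lo + hi)
    while hi - lo > eps:                      # linear (bisection) convergence
        theta = 0.5 * (lo + hi)
        if sum(min(theta, s) for s in sigmas) < D_total:
            lo = theta                        # water level too low
        else:
            hi = theta                        # water level too high
    d = [min(theta, s) for s in sigmas]
    rate = 0.5 * sum(math.log(s / dt) for s, dt in zip(sigmas, d))
    return theta, d, rate
```

The loop runs O(log(1/ε)) times with an O(n) sum per iteration, which is consistent with the near-linearithmic complexity discussed in Remark 8.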
For the special case of time-varying fully-observable Gauss-Markov processes it can be easily seen following precisely Remark 7 that we can recover [34, Corollary 2].

V. COMPLETE CHARACTERIZATION AND OPTIMAL COMPUTATIONAL METHODS: INFINITE TIME HORIZON
In this section, we analyse the asymptotic limit of (37). To do so, we first restrict our system model (1), (2) to time-invariant processes, i.e., A t = A, Σ wt = Σ w , C t = C, Σ nt = Σ n , ∀t. Second, we apply known results for the convergence of the discrete-time Riccati equation (DRE) of Lemma 2. These results can be found for instance in [28, Chapter 7.3], [27, Appendix E] or [29]. Before we state the lemma, we note that in the sequel we adopt for simplicity the following notation. Lemma 5: [27], [28] (Necessary and sufficient conditions for convergence of the time-invariant DRE of Lemma 2 to a unique stabilizing solution) Let (A, Σ w , C, Σ n ) ∈ R p×p × R p×p × R m×p × R m×m . Then, the DRE that corresponds to Lemma 2 is given by (55), where Π 0 ≻ 0 (always positive definite), and the corresponding discrete-time algebraic Riccati equation (DARE) is (56). Then, the following statement holds. Let the pair (A, C) be detectable and the pair (A, Σ 1 2 w ) be stabilizable (or controllable on and outside the unit circle). Then, any solution {Π t : t ∈ N 0 } of (55) is such that lim t→∞ Π t = Π ⪰ 0 for any Π 0 ≻ 0, which corresponds to the maximal unique stabilizing solution of (56). This further means that the steady-state KF, i.e., the limiting expression of x t|t ≡ ξ t in (26), is asymptotically stable. Next, we provide an example applied to scalar processes to illustrate the concept of Lemma 5.
Example 1: (Convergence of the time-invariant DRE for scalar processes) Consider the time-invariant version of the system model in (48), i.e., α t = α ∈ R, σ 2 wt = σ 2 w > 0, c t = c ∈ R \ {0}, σ 2 nt = σ 2 n ≥ 0, ∀t. Then, the time-invariant scalar-valued DRE of (55) is (57), where Π 0 > 0, and the corresponding scalar-valued DARE of (56) is (58). Moreover, introduce the pairs (α, c) and (α, (σ 2 w ) 1 2 ). Then, by definition, the pair (α, c) is always detectable and the pair (α, (σ 2 w ) 1 2 ) is always stabilizable (because by definition σ 2 w > 0). Hence, from Lemma 5, any solution of (57) is such that lim t→∞ Π t = Π, with Π ≥ 0 corresponding to the unique stabilizing solution of (58). In what follows, we compute the closed form solution of Π ≥ 0. Note that (58) can be reformulated as a quadratic equation, which gives the two solutions in (59). Clearly, the negative solution of Π is rejected. Hence, from (59) we have that the unique stabilizing solution is not only non-negative but also positive; this follows by definition of our system model (Π 0 > 0 because σ 2 w > 0). Special cases: (i) Suppose that in (57) we let σ 2 n = 0. Then, from (58) we obtain Π = σ 2 w > 0. (ii) Suppose that in (57) we let α = 0. Then, from (58) we obtain Π = σ 2 w > 0. (iii) Suppose that in (57) we let α = 0, σ 2 n = 0. Then, from (57) we obtain Π = σ 2 w > 0. Remark 9: (On Lemma 5) In Lemma 5 we gave necessary and sufficient conditions for the DARE of the à priori error covariance in Lemma 2 to converge to its steady state. We observed via Example 1 that this value is always positive for scalar processes. Clearly, what we observe for scalar processes also holds for multidimensional processes because Σ w ≻ 0 with Π t ≻ 0, ∀t (from Lemma 2). Moreover, the steady state of the à priori error covariance in Lemma 2 implies the convergence of the à posteriori error covariance as well.
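The convergence claimed in Example 1 is easy to verify numerically, assuming the standard Kalman-filter form of the scalar DRE (57); the parameter values below are illustrative.

```python
def dre_step(pi, a, c, sw2, sn2):
    """One step of the scalar DRE (57), assuming the standard
    Kalman-filter form: a priori error variance update."""
    return a * a * pi + sw2 - (a * c * pi) ** 2 / (c * c * pi + sn2)

# Illustrative parameters (alpha, c, sigma_w^2, sigma_n^2).
a, c, sw2, sn2 = 0.9, 1.0, 0.1, 0.2
pi = 1.0                      # any Pi_0 > 0
for _ in range(200):
    pi = dre_step(pi, a, c, sw2, sn2)

# The fixed point of the iteration solves the DARE (58).
assert abs(dre_step(pi, a, c, sw2, sn2) - pi) < 1e-9
assert pi > 0                 # unique stabilizing solution is positive

# Special case sigma_n^2 = 0 gives Pi = sigma_w^2, as in Example 1, (i).
pi0 = dre_step(1.0, a, c, sw2, 0.0)
assert abs(pi0 - sw2) < 1e-12
```

The iteration contracts geometrically near the fixed point, matching the "any Π 0 > 0 converges" statement of Lemma 5 in the scalar case.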
That case, however, is slightly different because we can allow an initial condition Σ 0 ⪰ 0 (see Lemma 2). In fact, as Corollary 1 suggests, if Σ n = 0 and C = I p , then Σ x t|t = 0, ∀t, and as a result Σ = lim t→∞ Σ t = 0.
Next, we prove a theorem in which we give necessary and sufficient conditions for the pre-KF algorithm to converge to its steady-state values, as well as conditions that ensure a time-invariant solution of the characterization in (43).
Theorem 4: (Asymptotic characterization of (43)) Suppose that the system (1), (2) is restricted to time-invariant processes with the pair (A, C) detectable and the pair (A, Σ 1 2 w ) stabilizable. Moreover, restrict the test-channel distribution P(dy t |y t−1 , ξ t ) to be time-invariant and the output distribution P(dy t |y t−1 ) to be time-invariant with a unique invariant distribution. Then, the asymptotic characterization is given by (60), where Σ ξ ≻ 0 and Π ξ ≻ 0 are the time-invariant values of Σ ξ t|t and Σ ξ t|t−1 , respectively, with Π ξ given by (61) and the distortion threshold by (62). Finally, the above characterization is achieved by a time-invariant linear Gaussian "test channel" P * (dy t |y t−1 , ξ t ) of the form (63). Proof: See Appendix C. Remark 10: (On Theorem 4) (1) From Theorem 4 we can easily recover the known characterization in the infinite time horizon of time-invariant fully observable Gauss-Markov processes (see, e.g., [4, Theorem 3]). In particular, suppose that the system model (27), (28) is restricted to time-invariant processes, i.e., A t = A, Σ wt = Σ w , ∀t. Then, from Corollary 1 we obtain k z Σ I z k zT = Σ w and D min [0,∞] = 0, which when substituted in (60) recovers [4, eq. (26)].
In what follows, we give the optimal numerical solution of the problem in Theorem 4.
Corollary 3: (Optimal numerical solution of (60)) The solution of (60) is semidefinite representable as follows: (1) Suppose A ∈ R p×p is full rank and k z Σ I z k zT ≻ 0. Moreover, introduce the decision variable Γ 1 ⪰ 0 and let BB T be a factorization of the singular matrix Σ̄. Then, for D > D min [0,∞] we obtain the corresponding SDP. (2) Suppose Σ̄ ≻ 0. Moreover, introduce the decision variables analogously. Proof: The proof is a special case of Theorem 2; hence we omit it.
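The semidefinite representability in Theorem 2 and Corollary 3 leans on Schur-complement reformulations of nonlinear matrix constraints as LMIs. The toy check below illustrates the underlying equivalence, for a positive definite lower-right block, with illustrative matrices of our own choosing.

```python
import numpy as np

def is_psd(M, tol=1e-9):
    """Numerical PSD test via the smallest eigenvalue of the symmetric part."""
    return float(np.min(np.linalg.eigvalsh((M + M.T) / 2))) >= -tol

B = np.array([[0.5, 0.1], [0.0, 0.4]])    # off-diagonal coupling block
Cm = 1.5 * np.eye(2)                      # lower-right block, Cm > 0

for X in (2.0 * np.eye(2), 0.01 * np.eye(2)):
    block = np.block([[X, B], [B.T, Cm]])
    schur = X - B @ np.linalg.inv(Cm) @ B.T   # Schur complement of Cm
    # With Cm > 0:  [[X, B], [B^T, Cm]] >= 0  <=>  X - B Cm^{-1} B^T >= 0.
    assert is_psd(block) == is_psd(schur)
```

This is exactly the device that turns a constraint like Σ ξ ⪯ Π ξ (a nonlinear function of the decision variables) into a linear block-matrix constraint an SDP solver can handle.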
In what follows, we derive strong structural properties on (60) that allow for a simplified optimization problem which can be optimally solved via a reverse-waterfilling algorithm.
Proposition 3: (Strong structural properties on (60)) Suppose that in the characterization of (60) one of the following structures between (A, Σ̄) holds.
Proof: The derivation is done following similar arguments to [17, Proposition 1] thus we omit it.
Theorem 5: (Optimal numerical solution of (60)) Suppose that one of the conditions of Proposition 3 holds. Then, (66) follows. Moreover, the optimal parametric solution of (66) can be computed for µ Σ ξ ,i > 0 and any i as in (67), where θ * > 0, µ Υ,i ≜ 2µ A 2 ,i µ Σ̄,i > 0, and D > D min [0,∞] . Proof: When Σ̄ ≻ 0 the derivation follows using similar steps to the derivation of [17, Theorem 2] and we omit it. However, if, for instance, in Proposition 3, (i), Σ̄ is singular, we use a standard continuity argument: there exists a δ > 0 such that Σ̄ ε = Σ̄ + εI p is nonsingular for all ε ∈ (0, δ) (see, e.g., [36, Theorem 2.9]). In other words, we create a Σ̄ ε ≻ 0; then, following similar steps to the derivation of [17, Theorem 2] and taking the limit ε → 0 + in the computations, the result follows. An implementation of the reverse-waterfilling solution of Theorem 5 is given in Algorithm 2.
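A hedged sketch of the eigenmode allocation behind Algorithm 2 is given below. It uses the classical waterfilling form d_i = min(θ, µ_i) rather than the exact parametric map (67), so it only illustrates the shared skeleton: eigendecomposition followed by bisection on θ.

```python
import numpy as np

def spatial_waterfill(Sigma, D, eps=1e-9):
    """Allocate distortion across the eigenmodes of a covariance matrix by
    bisection on the water level theta (classical waterfilling form; the
    exact parametric map (67) of Theorem 5 differs but shares this
    skeleton). Assumes 0 < D < trace(Sigma)."""
    mu = np.linalg.eigvalsh(Sigma)        # O(p^3) step of the complexity count
    lo, hi = 0.0, float(mu.max())
    theta = 0.5 * (lo + hi)
    while hi - lo > eps:                  # linearly convergent bisection
        theta = 0.5 * (lo + hi)
        if float(np.minimum(theta, mu).sum()) < D:
            lo = theta
        else:
            hi = theta
    d = np.minimum(theta, mu)
    rate = 0.5 * float(np.log(mu / d).sum())
    return d, rate
```

The O(p^3) eigendecomposition plus the O(p log p)-flavoured bisection reflects the complexity accounting of Remark 11.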
Remark 11: (Complexity of Algorithm 2) Again, the convergence of Algorithm 2 is guaranteed for finite dimensional matrices due to the bisection method, similar to Algorithm 1. Computing µ Σ ξ ,i , ∀i, requires an eigendecomposition, while the bisection method has approximately linearithmic time complexity similar to Algorithm 1, i.e., O(p log(p)). Hence the overall time complexity is approximately O(p 3 + p log(p)). However, if we optimize matrix multiplication using, for example, current state-of-the-art computing approaches that allow time complexity of around O(p 2.37286 ) [37], the complexity can further reduce to O(p 2.37286 + p log(p)). In Table I we compare the general optimal solution obtained via SDP in Corollary 3 with the structural optimal solution obtained in Theorem 5 and implemented in Algorithm 2, for the same input data and distortion level. For low dimensional vector systems (i.e., p = 10) we compute the average computational time needed for 1000 instances using both computational methods for an error tolerance of ε = 10 −9 . We see that Algorithm 2 is approximately 550 times faster than SDP. For medium size vector systems (i.e., p = 100) we perform the same experiment for 100 instances with ε = 10 −7 . The results show that Algorithm 2 is approximately 17500 times faster than SDP. We note that obtaining a result from SDP for 1000 instances would require days; therefore we did not attempt such an experiment with the specific computer. In addition, it is likely that the result for both SDP and Algorithm 2 would not change much. For high dimensional vector systems (i.e., p = 500) the comparison is inconclusive because SDP would take many days to give a relatively fair result even for 100 instances. In contrast, Algorithm 2 operates fine, as illustrated in Table I. The results clearly demonstrate that Algorithm 2 is a much more appealing choice when solving problems with certain structure, or on systems with computationally limited resources, as opposed to the SDP algorithm.
We conclude this section by finding the optimal analytical solution for the time-invariant version of the system model (48).

VI. NUMERICAL SIMULATIONS
In this section, we provide two examples with numerical simulations for some of the major results of this paper.
and from (56) we obtain Π ≻ 0, which further implies the steady-state solution Σ ≻ 0, both given as follows. We recall, using [8, Corollary 1, Theorem 9], that the closed form solution of the sum-rate therein, under the assumption of uniform rate-distortion allocation, is given by (77), where ā ≜ abs(|A|) 1/p , Σ̄ = Π − Σ, with D > trace(Σ). In Fig. 3, we give the optimal numerical solution obtained via Corollary 3, (2) using the CVX platform [39] and the reverse-waterfilling solution of Theorem 5 using Algorithm 2 (because the input data in (73) satisfy the strong structural properties of Proposition 3, (i)). We compare the optimal sum-rate with the closed-form solution of (77). We observe that the latter is in general highly suboptimal with respect to the optimal numerical solution, with the maximum rate-loss (RL), approximately 1.05 bits/vector source for this example, observed at moderate to low rates. A good performance of (77), in the sense that it almost coincides with the exact optimal solution, can be observed at very high rates. This means that Corollary 3 and Theorem 5, which allow non-uniform distortion allocation, may achieve significant performance gains compared to (77), which only allows uniform distortion allocation.

VII. CONCLUSIONS AND ONGOING RESEARCH
In this paper we revisited the problem of characterizing and computing the indirect NRDF for partially observable multivariate Gauss-Markov processes with hard MSE distortion constraints. We derived the complete characterization and the corresponding optimal test channel realization, and gave conditions to ensure existence of a solution of the characterization in both the finite and infinite time horizon. Moreover, we obtained optimal numerical and closed form solutions for vector and scalar systems under either average total or pointwise MSE distortion constraints. One particularly interesting result is the construction of new scalable optimal iterative schemes for time-varying scalar processes and time-invariant multidimensional processes that operate much faster than standard semidefinite programming algorithms.
One particular question that we do not address herein but can be further analyzed from our results, is the relaxation of the Gaussian noise process that drives the state of the system model in (1), (2) to positive semidefinite covariance matrices. Another important question is the extension of Theorem 5 to time-varying processes which will require strong time-varying structural properties in the spirit of Proposition 3. Finally, the extension of this problem to controlled processes is also of major importance.

APPENDIX A PROOF OF THEOREM 2
(1) Under the conditions of the theorem we ensure that there exists an optimal solution for the general characterization of Theorem 1. Now, the objective function in (43) can be reformulated as in (78). Note that the first term in (78) is given, whereas the terminal time is decoupled from all previous time instants and can be optimized separately. Under the assumption that A t is full rank, the additive time-varying term can be reformulated as in (79), where (a) follows from the Weinstein-Aronszajn identity, with the additional LMI constraint 0 ≺ Γ 1 t and the constant term ∑ n t=0 log abs(|A t |). Note that the equality constraint Σ ξ n|n = Γ 1 n in (80) follows because when t = n, at the objective function we only optimize with respect to − log |Σ ξ n|n |, which has been decoupled from the previous time instants; therefore the additional "slack" variable Γ 1 n is not needed and we simply take the equality constraint. Now, using the Woodbury matrix identity [35, Theorem 18.2.8] in the additional LMI constraint, we obtain (81), where Σ ξ t+1|t = A t Σ ξ t|t A T t + B t+1 B T t+1 , which is equivalent (as a Schur complement) to the constraint block matrix in (46).
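Both identities invoked in this proof, Weinstein-Aronszajn for determinants and Woodbury for inverses, can be spot-checked numerically; the matrices below are arbitrary illustrative choices, scaled small enough that all inverses exist.

```python
import numpy as np

rng = np.random.default_rng(4)
A = 2.0 * np.eye(3)
U = 0.3 * rng.normal(size=(3, 2))
Cm = 0.5 * np.eye(2)
V = 0.3 * rng.normal(size=(2, 3))

# Weinstein-Aronszajn (determinant) identity: det(I + UV) = det(I + VU).
assert np.isclose(np.linalg.det(np.eye(3) + U @ V),
                  np.linalg.det(np.eye(2) + V @ U))

# Woodbury matrix identity for (A + U Cm V)^{-1}.
Ai = np.linalg.inv(A)
lhs = np.linalg.inv(A + U @ Cm @ V)
rhs = Ai - Ai @ U @ np.linalg.inv(np.linalg.inv(Cm) + V @ Ai @ U) @ V @ Ai
assert np.allclose(lhs, rhs, atol=1e-10)
```

The determinant identity is what lets the log-det objective swap a p×p argument for a lower-dimensional one, while Woodbury rewrites the LMI constraint into the Schur-complement form of (46).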
(2) This follows similarly to (1); hence we omit it. This completes the derivation.
APPENDIX B PROOF OF THEOREM 3 To solve the problem, we employ the Karush-Kuhn-Tucker (KKT) conditions [40, Chapter 5.5.3], which for the convex program in (49) are necessary and sufficient for global optimality. Similar to the proof of Theorem 2, we reformulate the objective in (49) to decouple the dependency of {σ 2 ξ t|t : t ∈ N n 0 } from previous time instants at each instant of time. Introduce the augmented Lagrangian J({σ 2 ξ t|t , λ t , φ t } n t=0 , θ) for the constraints ∑ n t=0 σ 2 ξ t|t ≤ (n + 1)(D − D min [0,n] ), σ 2 ξ t|t ≥ 0, σ 2 ξ t|t−1 ≥ σ 2 ξ t|t , where (84) is the first order partial derivative (stationarity) condition; (85), (86) are the complementary slackness conditions; (87) are the primal feasibility conditions; and (88) are the dual feasibility conditions. Next, we check the conditions. First observe that, by definition, λ t = λ * t = 0 because σ 2 ξ t|t > 0, ∀t. Moreover, φ t = φ * t = 0, ∀t, because only then do we have positive rates. In other words, if φ * t > 0 for some t, then the corresponding rate is zero and it can be excluded from the optimal solution of the total rates. Furthermore, θ = θ * > 0 because, by the convexity of the problem and the complementary slackness conditions (85), the distortion constraint holds with equality (the solution occurs on the boundary). Next, we solve (84), where (a) follows because Σ ξ t|t−1 = AΣ ξ t−1|t−1 A T + Σ t ; (b) follows because we restrict the numerator and denominator in (91) to have a time-invariant value (because we impose the optimal minimizer to be time-invariant and the corresponding output distribution to be time-invariant with a unique invariant distribution). Note that Π ξ is given by (61) and {Σ n : n ∈ N 0 } is a convergent sequence (by the conditions of the theorem) with steady-state (time-invariant) solution Σ̄ = lim n→∞ Σ n .
The constraint set in (60) is obtained because, via Remark 5, we ensure a finite solution to the optimization problem if we impose the strict LMI 0 ≺ Σ ξ ⪯ Π ξ , which implies that Σ ξ ≻ 0 and Π ξ ≻ 0. From the conditions of the theorem, we have a convergent sequence {Σ n : n ∈ N 0 }, i.e., lim n→∞ Σ n = Σ, which further means that {trace(Σ n ) : n ∈ N 0 } is also convergent. This in turn implies that (1/(n+1)) ∑ n t=0 trace(Σ t ) → trace(Σ) as n → ∞, which is precisely (62). This completes the characterization of (60). The optimal time-invariant test channel realization (63) follows easily from the conditions of the theorem. This completes the derivation.