The Role of Fidelity in Goal-Oriented Semantic Communication: A Rate Distortion Approach

Abstract-We study a variant of a robust description source coding framework, which is a relevant model for goal-oriented semantic information transmission, via its corresponding characterization. Considering two individual single-letter separable distortion constraints and input and output data acting as the intrinsic and extrinsic message, respectively, we first derive a lower bound on the optimal rates of the problem, as well as necessary and sufficient conditions for this bound to be tight. Subsequently, we prove a general result that provides in parametric form the optimal solution of the characterization of this problem. Capitalizing on these results, we examine the structure of the solution for one case study of general binary alphabets under Hamming distortions and solve in closed form a special case. We also solve another general binary alphabet case where a Hamming and an erasure distortion coexist, as a means to highlight the importance of selecting the type of the distortion constraint in goal-oriented semantic communication. Furthermore, we develop a goal-oriented Blahut-Arimoto (BA) algorithm, which can be used for the computation of any finite alphabet intrinsic or extrinsic message under individual distortion criteria. Finally, we revisit the problem for multidimensional independent and identically distributed (i.i.d.) jointly Gaussian processes with individual mean-square error (MSE) distortion constraints, providing new insights that have previously been overlooked. This work reveals the cardinal role of context-dependent fidelity criteria in goal-oriented semantic communication.

I. INTRODUCTION
SHANNON, in his seminal work [2], deliberately considered the semantic aspects and the effectiveness of transmitted messages as irrelevant to the communication problem [3]. Setting aside an issue which is otherwise confusing, this dichotomy between information content and its significance has been instrumental in achieving reliability and efficiency in information transmission over noisy channels. Nevertheless, in [4], Shannon indirectly provided a means to study semantic information sources, because the coding aspect determined by the probabilistic model of the source is dictated by a distortion constraint imposed on the system. Various endeavors have been made to incorporate semantics into Shannon's communication theory. Leaving aside epistemic and doxastic logic theories, the main efforts include probabilistic logic approaches [5], [6], [7], [8], complexity theory approaches [9], and semantic coding and communication games [10], [11]. The effectiveness problem has been considered using the concepts of pragmatic information [12] and value of information [13], [14], [15]. Nevertheless, Shannon's communication model has remained virtually unchallenged. None of the proposed extensions has ever been recognized as a general theory of semantic or pragmatic information. The aforementioned theories have remained at a conceptual level, failing to have any tangible practical applications to or impact on communication networks. The quest for a goal-oriented semantic communication theory has recently gained new impetus [16], [17], [18], [19], fueled by the emergence of networks of autonomous agents with advanced sensing, learning, and decision-making capabilities. In this work, leveraging rate distortion theory, we consider the problem of communicating a memoryless source whose semantic, remote or intrinsic information is not directly observable and is obtained based on noisy observations. Our objective is to investigate the impact of context-dependent fidelity criteria and distortion measures on goal-oriented information transmission and semantic source reconstruction. To this end, we revisit a lossy compression framework recently introduced in [20] and [21], considering both finite and continuous alphabets (i.e., i.i.d. Gaussian random variables (RVs)), and we study the effect of multiple individual distortion criteria in goal-oriented semantic information transmission. The objective of this work is twofold. First, we aim at complementing and extending the work in [21], which only considers continuous alphabet sources and mean-square error (MSE) distortion criteria, providing results and new insights that have been overlooked in this prior work. Second, we further highlight the role of context-dependent fidelity criteria in goal-oriented semantic communication by showing cases with new outcomes that do not appear through the analysis of [20] and [21].

A. Related Work

The work in [20] derives the corresponding information theoretic characterization and optimal closed-form expressions for i.i.d. scalar-valued Gaussian processes. The same authors in [21] relate the proposed system model to practical application examples where semantic coding can be considered, and proceed to derive suboptimal characterizations assuming linear state-observation models driven by Gaussian noise, providing ways to numerically compute their problem for vector RVs.
It should be noted that the rate distortion framework considered here and in [20] and [21] can be seen as a generalization of the robust description problem for two individual distortion criteria, which in turn is a special case of the two description coding problem [25]. Rate distortion with two individual distortion criteria has been studied in many papers under various contexts, see, e.g., [28], [29], [30], [31].
Another relevant yet different setup is the recently introduced rate-distortion-perception framework, see, e.g., [32], [33] (and references therein), in which perception quality, measured by some divergence between distributions, is included in addition to the classical distortion criterion. One major difference between rate-distortion-perception representations and the setup here is that in the former the characterizations are solved for various examples by treating each fidelity constraint separately, whereas in the latter one can study, from an optimization standpoint, the joint behavior of the two distortion penalties.
One of our results herein concerns the computation of the information rates of the studied generalized robust description problem for discrete alphabets. To do so, we employ an alternating minimization approach reminiscent of [34], which results in a generalization of the celebrated BA algorithm [35], [36]. The use of the alternating minimization approach is desirable in convex programming problems as it converges globally to an optimal point [34]. Moreover, for similar problems with multiple fidelity constraints (even beyond the class of single-letter distortions), this methodology can lead to algorithms (i.e., non-trivial extensions of the BA algorithm) with provable convergence guarantees. We note that the standard BA algorithm is guaranteed to converge, depending on the source input, either exponentially or at rate O(1/k), where k is the number of iterations [35], [37]. The literature on alternating minimization or maximization approaches that construct variants of the standard BA algorithm is vast. In particular, the BA algorithm finds applications in computing information theoretic characterizations, see, e.g., [38], [39], [40], [41], in computations involving quantum channels [42], [43], in encapsulating machine learning techniques [44], [45], [46], and in neuroscience [47].
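To make the alternating minimization machinery concrete before we generalize it in Section III-B, the following is a minimal sketch of the standard BA iteration for the classical single-distortion RDF; the function name, tolerance, and the binary example at the end are illustrative choices, not part of the referenced algorithms.

```python
import numpy as np

def blahut_arimoto(p_z, d, s, tol=1e-9, max_iter=100_000):
    """Classical Blahut-Arimoto iteration for R(D) (illustrative sketch).

    p_z : (m,) source pmf;  d : (m, n) distortion matrix d(z, zhat);
    s <= 0 : Lagrange multiplier (slope of the rate distortion curve).
    Returns the parametric point (D(s), R(s)), with R in nats.
    """
    A = np.exp(s * d)                            # A(z, zhat) = e^{s d(z, zhat)}
    nu = np.full(d.shape[1], 1.0 / d.shape[1])   # initial output marginal
    for _ in range(max_iter):
        q = A * nu                               # unnormalized test channel
        q /= q.sum(axis=1, keepdims=True)        # q(zhat | z)
        nu_next = p_z @ q                        # re-estimated output marginal
        if np.abs(nu_next - nu).max() < tol:
            nu = nu_next
            break
        nu = nu_next
    q = A * nu
    q /= q.sum(axis=1, keepdims=True)
    D = float(np.sum(p_z[:, None] * q * d))      # E[d(z, zhat)]
    R = float(np.sum(p_z[:, None] * q * np.log(q / nu[None, :] + 1e-300)))
    return D, R

# Example: Bernoulli(0.4) source under Hamming distortion.
D, R = blahut_arimoto(np.array([0.6, 0.4]), 1.0 - np.eye(2), s=-3.0)
```

Section III-B extends exactly this update to two simultaneous distortion constraints.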

B. Contributions
In this paper, we consider a variation of the robust lossy source coding model which captures goal-oriented semantic attributes and intrinsic representation of information (e.g., features, structural/qualitative properties, embeddings). By semantic information source, we mean some intrinsic information (i.e., "feature") embedded in the sensed extrinsic observation (i.e., "appearance"). Hence, for such a "semantic" information source, the task in coding is to encode the extrinsic observation, whereas the decoder acts as a goal-oriented mechanism that is primarily interested in recovering both the intrinsic and the extrinsic information, but depending on the individual fidelity constraints it may alter its decoding preference. For this setup and its corresponding characterization (see Lemma 1, eq. (6)), we derive the following new results.
• We obtain a lower bound on the semantic rate distortion function (cf. (6)) and identify necessary and sufficient conditions under which this bound is tight (see Lemma 2).
• We prove a general theorem, which gives parametrically the implicit solution of (6) for arbitrary finite alphabet sets with individual semantic and observable distortion criteria (see Theorem 1).
• We develop a goal-oriented Blahut-Arimoto algorithm (see Algorithm 1) by generalizing the classical alternating minimization procedure. This algorithmic approach allows the optimal computation of (6) for any finite alphabet set of intrinsic or extrinsic messages with individual single-letter distortion criteria, with provable convergence guarantees.
• We derive a new upper bound for i.i.d. continuous alphabet sources (see Theorem 6) and a novel lower bound for multidimensional jointly Gaussian RVs with individual MSE distortion constraints (see Theorem 7). Our results for i.i.d. jointly Gaussian processes exemplify the utility of our lower bound in Lemma 2.

The aforementioned results are not the sole contributions of this paper. Lemma 2 and Theorem 1 are applied to two examples (see Problems 1, 2) using specific setups with general binary alphabets and two types of distortion measures, namely Hamming and erasure distortions. For Problem 1, we derive structural properties of the optimal minimizer (test channel) consistent with Lemma 2 and characterize its solution (see Theorem 2). We strengthen this result by solving in closed form a special case to illustrate the rate distortion surface of the problem (see Example 1). For Problem 2, we characterize and solve in closed form the solution (see Theorem 3). An interesting observation that stems from Theorem 3 is that, depending on the distortion constraint, we can make the system choose which source (i.e., semantic or observation) to transmit. Simply put, in goal-oriented semantic communication, selecting the type of individual distortion measures or context-dependent fidelity criteria according to the application/task requirements can significantly affect the remote reconstruction of the semantic source.

II. PROBLEM STATEMENT AND NEW BOUNDS
In this section, we consider a goal-oriented semantic compression problem. Specifically, aligned with many emerging applications, such as machine-type communications and networked intelligent systems, we consider a scenario where the receiver may not be interested in the source sequence, but only in extracting a certain feature of it. For example, instead of reproducing an image, the receiver may be interested in certain statistical aspects of the image, or in the presence or absence of certain objects or people in the image. This models a goal-oriented compression scenario, in the sense that reconstructing the desired feature (semantic information) can represent a specific goal.
We consider a memoryless source described by the tuple (x, z) with probability distribution p(x, z) in the product alphabet space X × Z. The semantic or intrinsic information of the source is in x, which is not directly observable, whereas z is the noisy observation of the source at the encoder side. Our objective is to study how the distortion penalties can affect goal-oriented information transmission and source reconstruction using lossy compression.
Formally, the system model (without the distortion penalties) is illustrated in Fig. 1 and can be interpreted as follows. An information source is a sequence of n-length i.i.d. RVs (x^n, z^n). In this setup, we assume that we are given the sequence of n-length i.i.d. RVs x^n that induce the probability distribution p(x), whereas the observable sequence of n-length i.i.d. RVs z^n is obtained by knowing the transition probability distribution p(z|x). The noisy observations z^n are then received by an encoder (E) that generates the index $f^E(z^n) \in \mathcal{W}$, whereas at the decoder (D), depending on the distortion penalty, the system can either reconstruct an estimate of both input RVs, i.e., (x̂^n, ẑ^n), or of each one individually. It is exactly that specific "selective mechanism" of the decoder that creates, from an inference perspective, a goal-oriented semantic information processing. The encoder and decoder are modeled by the mappings

$$f^E : \mathcal{Z}^n \to \mathcal{W}, \qquad g_o^D : \mathcal{W} \to \hat{\mathcal{Z}}^n, \qquad g_s^D : \mathcal{W} \to \hat{\mathcal{X}}^n,$$

where the index set $\mathcal{W} = \{1, 2, \ldots, M\}$, with M being a positive integer, and $(g_o^D, g_s^D)$ denote the observations and semantic information decoders, respectively. We consider two per-letter distortion measures that penalize the semantic and the observations information source in Fig. 1, given by $d_s : \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty)$ and $d_o : \mathcal{Z} \times \hat{\mathcal{Z}} \to [0, \infty)$, respectively, and their corresponding expected per-symbol distortions by

$$d_s^n(x^n, \hat{x}^n) \triangleq \frac{1}{n}\sum_{i=1}^{n} d_s(x_i, \hat{x}_i), \qquad d_o^n(z^n, \hat{z}^n) \triangleq \frac{1}{n}\sum_{i=1}^{n} d_o(z_i, \hat{z}_i).$$

The encoding and decoding are done in blocks of length n, and the fidelity criterion for the semantic and observable information is the pair of average distortion constraints

$$\mathbb{E}\big[d_s^n(x^n, \hat{x}^n)\big] \le D_s, \qquad \mathbb{E}\big[d_o^n(z^n, \hat{z}^n)\big] \le D_o.$$

Next, we give the definitions of the achievable rates and the infimum of all achievable rates.
In the following sections, we demonstrate the impact of the fidelity criterion in a remote source coding problem with individual distortion measures.

A. Characterization of the Operational Rates
The information theoretic characterization of (5) is given by the following lemma.
Lemma 1: For a given p(x) and p(z|x), the semantic rate distortion function (SRDF) of the setup in Fig. 1 is characterized as

$$R(D_o, D_s) = \min_{q(\hat z, \hat x|z):\ \mathbb{E}[d_s(z,\hat x)] \le D_s,\ \mathbb{E}[d_o(z,\hat z)] \le D_o} I(p, q), \tag{6}$$

where $I(p, q) \triangleq I(z; \hat z, \hat x)$ demonstrates the functional dependence of the mutual information on {p(z), q(ẑ, x̂|z)}. A detailed proof of Lemma 1 is omitted because the achievability part follows from a special case of the achievability proof of the multiple description source coding problem, called robust description [25, Theorem 2], in view of the fact that one can modify the indirect to a direct rate distortion function (RDF) formulation using an amended version of the semantic distortion constraint, i.e., $\mathbb{E}[d_s(x^n, \hat x^n)] = \mathbb{E}[d_s(z^n, \hat x^n)]$ with $d_s(z, \hat x) \triangleq \sum_{x \in \mathcal{X}} p(x|z)\, d_s(x, \hat x)$ (see, e.g., [20, Theorem 1] and references therein). For the converse part, a complete proof is provided in [1, Lemma 1].
We conclude this subsection with certain functional and topological properties of (6).
Remark 1: The following functional properties of the SRDF can be obtained using standard arguments that stem from classical rate distortion theory, see, e.g., [26]: R(D_o, D_s) is non-increasing in each of its arguments and jointly convex in (D_o, D_s). We conclude this remark by pointing out that the constrained set in (6) is compact (for both finite and abstract alphabets) and the objective function in (6) is lower semi-continuous with respect to q(ẑ, x̂|z). As a result, from the Weierstrass extreme value theorem, we know that the infimum is attained by some q*(ẑ, x̂|z), and we can formally replace it with a minimum in the sequel.

B. A New Lower Bound and Conditions for Its Tightness
In what follows, we derive a new bound on (6), as well as information structures (i.e., conditional independence constraints) that allow this bound to be tight. The utility of this bound is twofold: first, it shows the best rates that (6) can possibly achieve; second, it can be utilized in the computation or the derivation of closed form expressions in cases where (6) cannot be obtained optimally, e.g., for i.i.d. non-Gaussian sources.
Lemma 2: The optimization problem in (6) admits the following lower bound:

$$R(D_o, D_s) \ge \max\{R(D_o), R(D_s)\} \triangleq R_L(D_o, D_s), \tag{8}$$

where (R(D_o), R(D_s)) represent the standard direct and indirect RDFs obtained via their individual distortion criteria, i.e.,

$$R(D_o) = \min_{q(\hat z|z):\ \mathbb{E}[d_o(z,\hat z)] \le D_o} I(z; \hat z), \qquad R(D_s) = \min_{q(\hat x|z):\ \mathbb{E}[d_s(z,\hat x)] \le D_s} I(z; \hat x). \tag{9}$$

The bound (8) corresponds to the best possibly achievable rates because (6) cannot be lower than the best rate achieved in either less constrained problem (the individual rate distortion problems in (9)). Next, we derive conditional independence constraints (i.e., information structures) which allow the lower bound on (6) to be tight. Recall that, by the chain rule of mutual information (7) (see, e.g., [27]), we have

$$I(z; \hat z, \hat x) = I(z; \hat x) + I(z; \hat z|\hat x) = I(z; \hat z) + I(z; \hat x|\hat z). \tag{11}$$

From (11), we obtain

$$I(z; \hat z, \hat x) \overset{(a)}{\ge} I(z; \hat x), \qquad I(z; \hat z, \hat x) \overset{(b)}{\ge} I(z; \hat z), \tag{12}$$

where (a) follows from the fact that I(z; ẑ|x̂) ≥ 0 and (b) follows from the fact that I(z; x̂|ẑ) ≥ 0. Clearly, the bounds in (12) are tight iff, for inequality (a), I*(z; ẑ|x̂) = 0, i.e., the Markov chain z − x̂ − ẑ holds, and, for inequality (b), I*(z; x̂|ẑ) = 0, i.e., the Markov chain z − ẑ − x̂ holds. In view of the previous simple observation we arrive at the following bounds on (6).
Case 1: $R(D_o, D_s) \ge R(D_s)$, which holds with equality iff the condition of inequality (12), (a), holds.
Case 2: $R(D_o, D_s) \ge R(D_o)$, which holds with equality iff the condition of inequality (12), (b), holds.
If both Case 1 and Case 2 are concurrently true, i.e., (10) holds, then, from the Lagrange duality theorem [48], we can write the individual unconstrained dual problems for R(D_s) and R(D_o) associated with their corresponding Lagrange multipliers, say (s_1, s_2), and choose the Lagrange multiplier that corresponds to the maximal rate between R(D_s) and R(D_o), which is precisely R_L(D_o, D_s) in (8). This completes the proof.
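The chain-rule inequalities in (12) are easy to sanity-check numerically. The following is a small sketch (with arbitrary, hypothetical alphabet sizes and source pmf) that verifies $I(z; \hat z, \hat x) \ge \max\{I(z; \hat z), I(z; \hat x)\}$ for randomly drawn test channels q(ẑ, x̂|z):

```python
import numpy as np
rng = np.random.default_rng(0)

def mi(p_joint):
    """Mutual information I(a; b) in nats from a joint pmf of shape (|A|, |B|)."""
    pa = p_joint.sum(axis=1, keepdims=True)
    pb = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0
    return float((p_joint[mask] * np.log(p_joint[mask] / (pa @ pb)[mask])).sum())

p_z = np.array([0.3, 0.7])                   # hypothetical observation pmf
for _ in range(1000):
    q = rng.random((2, 2, 2))                # q(zhat, xhat | z), axes (z, zhat, xhat)
    q /= q.sum(axis=(1, 2), keepdims=True)
    joint = p_z[:, None, None] * q           # p(z, zhat, xhat)
    I_full = mi(joint.reshape(2, 4))         # I(z; zhat, xhat)
    I_zhat = mi(joint.sum(axis=2))           # I(z; zhat)
    I_xhat = mi(joint.sum(axis=1))           # I(z; xhat)
    assert I_full >= max(I_zhat, I_xhat) - 1e-12
```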

III. RESULTS FOR FINITE ALPHABETS
In this section, we provide our new results for finite alphabet sets. Before giving our first result, we note that the constrained problem in Lemma 1 can be written as an unconstrained problem via the Lagrange duality theorem [48] as follows:

$$R(D_o, D_s) = \max_{s_1 \le 0,\, s_2 \le 0}\ \min_{q(\hat z, \hat x|z)} \Big\{ I(p, q) - s_1\big(\mathbb{E}[d_s(z, \hat x)] - D_s\big) - s_2\big(\mathbb{E}[d_o(z, \hat z)] - D_o\big) \Big\}, \tag{13}$$

where s_1 ≤ 0 and s_2 ≤ 0 are the Lagrange multipliers.
In view of (13) we can prove the following general result (it holds for general sources as well).
Theorem 1: Suppose that p(x) and p(z|x) are given. Then, the following parametric solutions for (6) may appear.
(i) If s_1 < 0 and s_2 < 0, the implicit optimal form of the minimizer that achieves the minimum in (6) is

$$q^*(\hat z, \hat x|z) = \frac{\nu^*(\hat z, \hat x)\, e^{s_1 d_s(z, \hat x) + s_2 d_o(z, \hat z)}}{\sum_{\hat z, \hat x} \nu^*(\hat z, \hat x)\, e^{s_1 d_s(z, \hat x) + s_2 d_o(z, \hat z)}}, \tag{14}$$

where (s_1, s_2) are the Lagrange multipliers associated with the individual distortion penalties and $\nu^*(\hat z, \hat x) = \sum_{z} q^*(\hat z, \hat x|z)\, p(z)$ is the $\hat{\mathcal{Z}} \times \hat{\mathcal{X}}$-marginal of the output process (ẑ^n, x̂^n). Moreover, the optimal parametric solution of (6) when R(D_o*, D_s*) > 0 is given by

$$R(D_o^*, D_s^*) = s_1 D_s^* + s_2 D_o^* - \sum_{z} p(z) \log\Big(\sum_{\hat z, \hat x} \nu^*(\hat z, \hat x)\, e^{s_1 d_s(z, \hat x) + s_2 d_o(z, \hat z)}\Big), \tag{15}$$

where D_s* is given by (16) and D_o* by the analogous expression.
where D_s* is given by (17). Theorem 1 is pivotal as it can be used in various ways, including the derivation of analytical expressions of (6) or the construction of generalizations of the BA algorithm [49], which can find parametrically the optimal solution of (6) for arbitrary finite alphabet sets and general bounded distortion functions. In the sequel, we study both of these directions.

A. Binary Alphabets With Individual Hamming Distortions
In what follows, we utilize both Theorem 1 and Lemma 2 to study the case of binary alphabets, i.e., $\mathcal{X} = \mathcal{Z} = \hat{\mathcal{X}} = \hat{\mathcal{Z}} = \{0, 1\}$, with individual probability of error distortion penalties.
Problem 1: Suppose that, in the setup of Fig. 1, the remote source x and the noisy channel of z given x are modeled by a binary source and a binary channel with parameters β and γ, respectively (cf. (19), (20)). If we assume that the model of the noisy channel in (20) becomes "deterministic", i.e., β = 1 and γ = 0, then it can easily be shown that z = x and the problem recovers the robust description setup for binary alphabets studied in [25, Section VII].
We now provide a major result of this paper.
Theorem 2: Consider the setup in Fig. 1 restricted to the given data of Problem 1. Then, the following hold: (i) the necessary and sufficient conditions in (10) hold; (ii) the SRDF in (6) coincides with the lower bound of Lemma 2, i.e., R(D_o, D_s) = max{R(D_o), R(D_s)}.
The general result of Theorem 2 shows that the rate-splitting bound in (8) is achievable for the specific class of input data under the probability of error distortions assumed in Problem 1.
In what follows, we study the structural solution derived in Theorem 2 in an application example.
When (10) holds, we need to compute the RDFs in (9). For the direct rate distortion problem with binary sources, it is relatively easy to see that the closed form solution is a straightforward generalization of the classical analytical solution, i.e., $R(D_o) = H_b(p(z = 0)) - H_b(D_o)$ for $D_o \le \min\{p(z = 0), 1 - p(z = 0)\}$, where p(z) is given by (47) and H_b(·) denotes the binary entropy function. We stress that an optimal closed form solution of the binary indirect RDF in (9) is not known in general, and only bounds exist in the literature, see, e.g., [50]. Nevertheless, one can always use straightforward generalizations of the classical BA iterative schemes to numerically compute the optimal solution.
In the sequel, we first provide an example where the semantic message x is modeled by an i.i.d. binary (Bernoulli) source whereas the transition matrix p(z|x) is doubly stochastic.
Based on the solution of R(D_o*, D_s*), we observe an interesting interplay between (c, D_o, D_s) regarding the choice of the maximum achievable rates. In particular, if $D_o > \frac{D_s - c}{1 - 2c}$, then the system benefits more by encoding, subject to a Hamming distortion, only the semantic information, and therefore the rate is R(D_s*); whereas if $D_o < \frac{D_s - c}{1 - 2c}$, the system benefits more by encoding, subject to its distortion, the observable message of the source, with rate R(D_o*). If $D_o = \frac{D_s - c}{1 - 2c}$, then encoding either the semantic information or the observations does not offer any advantage for any value of the active distortion region.
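The threshold $(D_s - c)/(1 - 2c)$ can be visualized with a short numerical check. The sketch below assumes the classical binary RDF form H_b(q) − H_b(D) for the direct branch and, as a further assumption for this symmetric special case (recall that the binary indirect RDF is not known in closed form in general), the amended form H_b(q) − H_b((D_s − c)/(1 − 2c)) for the semantic branch; all numerical values are illustrative.

```python
import numpy as np

def Hb(p):
    """Binary entropy in bits."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def rate_lower_bound(q, c, Do, Ds):
    """R_L(Do, Ds) of (8) for Example 1 under the assumptions above.

    q : P(z = 0); c : crossover of p(z|x); Do, Ds : distortion levels."""
    R_direct = max(Hb(q) - Hb(Do), 0.0)          # binary RDF of the observation z
    Ds_eff = (Ds - c) / (1 - 2 * c)              # effective semantic distortion level
    R_indirect = np.inf if Ds < c else max(Hb(q) - Hb(Ds_eff), 0.0)
    return max(R_direct, R_indirect)

# Do > (Ds - c)/(1 - 2c): the semantic rate R(Ds*) dominates, and vice versa.
print(rate_lower_bound(q=0.5, c=0.1, Do=0.3, Ds=0.2))   # indirect branch wins here
```

Since H_b(·) is increasing on [0, 1/2], comparing H_b(D_o) with H_b((D_s − c)/(1 − 2c)) reduces exactly to the threshold comparison described above.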
Next, we study an extreme scenario to highlight the importance of the distortion measure in the transmission (or not) of the semantic message. To do so, we consider two different individual distortion constraints (i.e., a standard erasure distortion [27, Exercise 10.7] and a Hamming distortion) to distinguish from Problem 1, where we have identical types of distortion constraints.
Problem 2: Suppose that, in Problem 1, the semantic distortion d_s(x, x̂) is replaced by the standard erasure distortion, i.e., for $\hat{\mathcal{X}} = \{0, 1, e\}$,

$$d_s(x, \hat x) = \begin{cases} 0, & \hat x = x, \\ 1, & \hat x = e, \\ \infty, & \hat x = 1 - x, \end{cases} \tag{24}$$

where e denotes the erasure symbol. Based on the given data of Problem 2, we derive the following solution for the SRDF characterization.
Theorem 3: Consider the setup in Fig. 1 restricted to Problem 2. Then, for the choice of the semantic distortion penalty in (24), the characterization in (6) reduces to R(D_o, D_s) = R(D_o), which can be explicitly computed via (21).
Proof: See Appendix C.
Interestingly, the choice of the erasure distortion measure in Theorem 3 demonstrates that the amended distortion of the semantic (remote) source allows only erasures to be sent, which in turn results in the zero rate of the indirect rate distortion problem. This result is a rather extreme instance of the general result of Theorem 2 and demonstrates the cardinal role of the distortion penalties in the solution.

B. Goal-Oriented Blahut-Arimoto Algorithm
Next, we propose a generalization of the celebrated BA algorithm to treat the case of arbitrary finite alphabet sets with individual distortions.
First, we restate an equivalent way to arrive at the parametric solution of Theorem 1 using the alternating minimization approach [49] instead. This result and the subsequent analysis form the basis of the proposed goal-oriented BA algorithm.
Lemma 3: For fixed s_1 ≤ 0 and s_2 ≤ 0, the unconstrained problem in (13) can be cast as the double minimization in (25) over the pair {q(ẑ, x̂|z), ν(ẑ, x̂)}, whose implicit solutions are given in parametric form.

Proof: (i) Due to the convexity and monotonicity of the optimization problem in (6) (see Remark 1), we can reformulate it as an unconstrained problem, cf. (26), where the expectation operator E[·] is taken with respect to the joint distribution p(z, ẑ, x̂). Using (26), we obtain the double minimization in (25), where the minimization is over an arbitrarily chosen output marginal distribution ν(ẑ, x̂) defined on $\hat{\mathcal{Z}} \times \hat{\mathcal{X}}$, and the minimization over ν(ẑ, x̂) follows from the condition for equality in [36, Theorem 5.2.6]. (ii), (iii) To optimize, we use Karush-Kuhn-Tucker (KKT) conditions similar to the ones utilized in the derivation of Theorem 1, hence we omit the details.

In Lemma 3, we consider the Lagrange multipliers (s_1, s_2) to be non-positive. One can easily obtain all the special cases discussed in Theorem 1 by choosing to have only one or none of the distortion constraints active. Clearly, if we consider optimizing jointly with respect to {q(ẑ, x̂|z), ν(ẑ, x̂)} in (25), then the result of Lemma 3 coincides with the general result of Theorem 1.
Next, we give two corollaries that are instrumental in the development of our algorithm because they give the implicit solution of {q(ẑ, x̂|z), ν(ẑ, x̂)} parameterized by (s_1, s_2).
We continue our analysis with a general theorem reminiscent of the one derived for the classical BA algorithm. This result demonstrates the convergence of the proposed algorithm to an optimal limit point. In the following theorem, we denote by p ≜ p(z), ν ≜ ν(ẑ, x̂), and q ≜ q(ẑ, x̂|z) the probability vectors of the corresponding distributions, e.g., for $\mathcal{Z} = \{0, 1, \ldots, M_1\}$, $M_1 \in \mathbb{Z}_+$, p = (p(z = 0), . . . , p(z = M_1)).
Theorem 4 (Alternating Minimization): Let the parameters s_1 ≤ 0 and s_2 ≤ 0 be given, denote $A(z, \hat z, \hat x) = e^{s_1 d_s(z, \hat x) + s_2 d_o(z, \hat z)}$, and let k ≥ 1 denote the number of iterations. Let ν^(0) be any initial marginal probability distribution on $\hat{\mathcal{X}} \times \hat{\mathcal{Z}}$ (given as a probability vector) with all components nonzero, and let ν^(k+1) be given in terms of ν^(k) as

$$\nu^{(k+1)}(\hat z, \hat x) = \sum_{z} p(z)\, \frac{\nu^{(k)}(\hat z, \hat x)\, A(z, \hat z, \hat x)}{\sum_{\hat z', \hat x'} \nu^{(k)}(\hat z', \hat x')\, A(z, \hat z', \hat x')}.$$

Then, the iterates converge to an optimal limit point of the double minimization in (25) as k → ∞.
Proof: The proof is omitted as it follows similar steps to the one derived in [36, Theorem 6.3.8].
In order to develop an algorithm to compute (6) for arbitrary finite alphabet sets, we also need a stopping criterion, for which a generalization of [36, Theorem 6.3.9] is required. To find one such termination criterion, we need the following lemma.
The next theorem gives bounds on the RDF, which allow us to estimate the residual error at each iteration of the algorithm. This theorem serves as the stopping criterion for our algorithm.
Theorem 5: Let the parameters s_1 ≤ 0, s_2 ≤ 0 be given and let $A(z, \hat z, \hat x) = e^{s_1 d_s(z, \hat x) + s_2 d_o(z, \hat z)}$. Suppose that ν is any output probability vector. Then, at the points (D_o, D_s) parameterized by (s_1, s_2), the SRDF is sandwiched between computable lower and upper bounds expressed in terms of ν, whose gap vanishes as ν approaches the optimal output marginal.
Proof: The proof follows similar steps to [36, Theorem 6.3.10], hence we omit it.
We are now ready to give the goal-oriented BA algorithm for the setup in Fig. 1. This is implemented in Algorithm 1, which is clearly a generalization of the celebrated BA algorithm [49]. By appropriately choosing the values of the Lagrange multipliers (s_1, s_2), one can recover the classical BA algorithm or its extension to the indirect rate distortion problem.
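To complement the description of Algorithm 1, the following is a minimal sketch of the goal-oriented BA iteration of Theorem 4 in Python. It is an illustrative implementation under the stated single-letter assumptions, not a verbatim transcription of Algorithm 1; in particular, it iterates to a fixed tolerance instead of using the residual-error bounds of Theorem 5.

```python
import numpy as np

def goal_oriented_ba(p_z, ds_tilde, do, s1, s2, tol=1e-9, max_iter=50_000):
    """Goal-oriented BA iteration sketched from Theorem 4.

    p_z      : (m,) observation pmf
    ds_tilde : (m, K) amended semantic distortion  d_s(z, xhat)
    do       : (m, J) observation distortion       d_o(z, zhat)
    s1, s2   : non-positive Lagrange multipliers
    Returns (Do, Ds, R), a point on the rate distortion surface (R in nats).
    """
    m, K = ds_tilde.shape
    J = do.shape[1]
    # A(z, zhat, xhat) = exp(s1 * d_s(z, xhat) + s2 * d_o(z, zhat))
    A = np.exp(s1 * ds_tilde[:, None, :] + s2 * do[:, :, None])  # (m, J, K)
    nu = np.full((J, K), 1.0 / (J * K))          # output marginal nu(zhat, xhat)
    for _ in range(max_iter):
        q = A * nu[None, :, :]
        q /= q.sum(axis=(1, 2), keepdims=True)   # q(zhat, xhat | z)
        nu_next = np.einsum('z,zjk->jk', p_z, q) # marginalize over z
        if np.abs(nu_next - nu).max() < tol:
            nu = nu_next
            break
        nu = nu_next
    q = A * nu[None, :, :]
    q /= q.sum(axis=(1, 2), keepdims=True)
    w = p_z[:, None, None] * q                   # joint p(z, zhat, xhat)
    Do = float((w * do[:, :, None]).sum())
    Ds = float((w * ds_tilde[:, None, :]).sum())
    R = float((w * np.log(q / nu[None, :, :] + 1e-300)).sum())
    return Do, Ds, R
```

Setting s_1 = 0 (respectively s_2 = 0) deactivates the semantic (respectively observation) constraint, echoing the recovery of the classical and indirect special cases mentioned above.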
Before we conclude this section, we remark that the analysis of the goal-oriented BA algorithm, which requires two distortion fidelity metrics, should not be confused with the analysis given in [49, pp. 469-471, Theorem 14]. Therein, the author studies a direct rate distortion problem of the form

$$R(D_1, D_2) = \min_{q(\hat z|z):\ \mathbb{E}[d_1(z, \hat z)] \le D_1,\ \mathbb{E}[d_2(z, \hat z)] \le D_2} I(z; \hat z),$$

where d_1(z, ẑ), d_2(z, ẑ) are the individual distortion fidelity constraints.
The results in this section provide only a starting point to the class of problems that deal with multiple fidelity constraints. For instance, the alternating minimization procedure, which is the basis of our goal-oriented BA algorithm, can be used for any finite number of additional fidelity constraints and for more general classes of distortion fidelities, beyond the ones considered in this paper.

IV. RESULTS FOR CONTINUOUS ALPHABETS
In this section, we study the setup of Section II assuming i.i.d. continuous alphabet sources, with emphasis on Gaussian alphabets. Subsequently, we discuss the connections between our results and [21], showing that despite the seeming similarity, the analyses are in fact fundamentally different.
Suppose that (x, z) are zero-mean random vectors such that $x \in \mathbb{R}^{p_1}$, $z \in \mathbb{R}^{p_2}$, (p_1 ≤ p_2). Moreover, assume that x ∼ (0; Σ_x), Σ_x ≻ 0 (that is, x is not necessarily Gaussian), and p(z|x) is modeled by the linear realization

$$z_i = A x_i + s_i, \quad i = 1, \ldots, n, \tag{33}$$

where $A \in \mathbb{R}^{p_2 \times p_1}$ is full column rank, and s ∼ N(0; Σ_s), Σ_s ⪰ 0, with s independent of x. Moreover, assume that the distortion measures in (2), (3) are chosen as the quadratic Euclidean norms $d_o(z_i, \hat z_i) = \|z_i - \hat z_i\|^2_{\mathbb{R}^{p_2}}$ and $d_s(x_i, \hat x_i) = \|x_i - \hat x_i\|^2_{\mathbb{R}^{p_1}}$. For the given p(x) and p(z|x), using linear MSE estimation theory, we can approximate the conditional distribution p(x|z) ∼ (E[x|z]; Σ_{x|z}), with conditional mean and conditional covariance given respectively by

$$\mathbb{E}[x|z] = \Sigma_x A^{\mathsf T} \Sigma_z^{\dagger} z, \qquad \Sigma_{x|z} = \Sigma_x - \Sigma_x A^{\mathsf T} \Sigma_z^{\dagger} A \Sigma_x, \tag{34}$$

where $\Sigma_z \triangleq A \Sigma_x A^{\mathsf T} + \Sigma_s$ and (·)† denotes the pseudoinverse of a matrix. Note that (34) is exact when x is Gaussian and otherwise corresponds to the best linear MSE estimate. In view of the above, we now proceed to our first main result.
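For concreteness, the quantities in (33), (34) can be computed with a few lines of linear algebra; the matrices below are hypothetical placeholders (the instances behind the paper's Tables I and II are not reproduced here).

```python
import numpy as np

# Hypothetical model data: z = A x + s, with p1 = 2, p2 = 3.
A = np.array([[1.0, 0.2], [0.3, 0.8], [0.1, 0.5]])   # p2 x p1, full column rank
Sigma_x = np.array([[1.0, 0.3], [0.3, 0.5]])          # covariance of x (pos. def.)
Sigma_s = 0.1 * np.eye(3)                             # noise covariance

Sigma_z = A @ Sigma_x @ A.T + Sigma_s                 # covariance of z, cf. (33)
H = Sigma_x @ A.T @ np.linalg.pinv(Sigma_z)           # E[x|z] = H z, cf. (34)
Sigma_x_given_z = Sigma_x - H @ A @ Sigma_x           # conditional covariance
Sigma_xi = H @ Sigma_z @ H.T                          # covariance of xi = E[x|z]
```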
Theorem 6: Suppose that $\Sigma_z^{\dagger} = \Sigma_z^{-1}$. Then, the upper bound (a) in (35) holds, where det(·) is the determinant of a matrix, $H = \Sigma_x A^{\mathsf T} \Sigma_z^{-1}$, and the conditional covariance defined by T is a design variable. Moreover, (a) holds with equality if (x, z, x̂, ẑ) are jointly Gaussian random vectors and one can find the explicit information structure of the realization for the conditionally Gaussian distribution q(x̂, ẑ|z) that achieves the minimum in the RHS of (35).
Proof: The upper bound on R(D_o, D_s) is immediate from the fact that we chose a specific approximation for p(x|z) and the corresponding p(z) for a non-Gaussian p(x). Hence, the realization of the optimal minimizer q*(x̂, ẑ|z) that achieves R(D_o*, D_s*) is clearly suboptimal, and therefore we have the upper bound in (35). Now, (a) in (35) holds with equality if the joint variables (x, z, x̂, ẑ) induce a jointly Gaussian distribution and one can find the optimal realization (or forward test-channel) of the non-degenerate conditionally Gaussian distribution q*(x̂, ẑ|z) that achieves the specific objective function in (35). The two distortion constraints are obtained as follows. From MSE estimation theory, we have $\mathbb{E}[\|z - \hat z\|^2] \le D_o$ for the first constraint. Similarly, for the second constraint, we use a chain of steps in which (ii) follows from the orthogonal projection theorem; (iii) follows if x̂ = E[ξ|x̂, ẑ]; and (iv) follows from the approximation in (34) assuming $\Sigma_z^{\dagger} = \Sigma_z^{-1}$. This completes the proof.
It should be noted that even if in Theorem 6 one assumes jointly Gaussian random vectors, it is very challenging to find the optimal realization of the conditionally Gaussian distribution q*(x̂, ẑ|z) that allows the upper bound to hold with equality. This is because of the asymmetric nature of the observable data (input) and the reconstructed data (output). For this reason, in the sequel, we derive a novel lower bound for jointly Gaussian i.i.d. processes leveraging the result of Lemma 2.
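The reason the Gaussian case remains tractable despite this asymmetry is the standard remote source coding reduction: since the estimation error x − ξ, with ξ ≜ E[x|z], is orthogonal to every function of z (hence to any reconstruction x̂ based on the code), the semantic MSE splits as

$$\mathbb{E}\big[\|x - \hat{x}\|^2\big] = \operatorname{trace}(\Sigma_{x|z}) + \mathbb{E}\big[\|\xi - \hat{x}\|^2\big],$$

so the constraint E[||x − x̂||²] ≤ D_s is equivalent to E[||ξ − x̂||²] ≤ D_s − trace(Σ_{x|z}), i.e., a direct MSE constraint on the estimator process ξ. The remaining subtlety, addressed by Lemmas 5 and 6 below, is whether the mutual information can also be expressed through ξ without loss; this is the mechanism underlying the next theorem.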
Theorem 7: Suppose that the semantic source is x ∼ N(0; Σ_x), the observations are described by (33), and (x, z, ẑ, x̂) are jointly Gaussian random vectors. Then, for ξ ≜ E[x|z] given by (34) with $\Sigma_z^{\dagger} = \Sigma_z^{-1}$, the lower bound in (39) holds, where $\Sigma_\xi \triangleq \mathbb{E}[\xi \xi^{\mathsf T}]$ and the design variables are defined therein. Moreover, (39) holds with equality iff I(x̂; z|f(z)) = 0, which is the case under the conditions identified next.

The proof of Theorem 7 relies on two lemmas. The first, Lemma 5, is a data processing argument: for any measurable map f, I(x̂; f(z)) ≤ I(x̂; z), with equality if f(z) is an invertible function of z. The second lemma is new and provides necessary and sufficient conditions to ensure that (39) holds with equality for zero-mean jointly Gaussian random variables.
Lemma 6: Assume that x ∼ N(0; Σ_x) and (x, z, x̂, ẑ) are jointly Gaussian random vectors. Moreover, let E[x|z] be given by (34). Then, (39) holds with equality if $\Sigma_x A^{\mathsf T}(A \Sigma_x A^{\mathsf T} + \Sigma_s)^{\dagger}$ forms a square, full rank matrix. This is true iff $A \in \mathbb{R}^{p_2 \times p_1}$ in (33) is square, i.e., p_1 = p_2, and full rank.
Proof: Observe that the series of equalities in (40) holds, where (a) follows from Lemma 5 if we set f(z) ≡ E[x|z]; (b) follows from (34); (c) follows from Lemma 5 if $\Sigma_x A^{\mathsf T}(A \Sigma_x A^{\mathsf T} + \Sigma_s)^{\dagger} z$ is an invertible map, which is the case iff $\Sigma_x A^{\mathsf T}(A \Sigma_x A^{\mathsf T} + \Sigma_s)^{\dagger}$ is a square and full rank matrix. From properties of the rank of matrices (see, e.g., [51, Corollary 8.3.3]), this holds iff A is square and full rank. This completes the proof.

Using Lemmas 5 and 6, we obtain the bound leading to R_G(D_s) in (37), where (⋆) follows from Lemma 5 with ξ = E[x|z] and holds with equality if Lemma 6 holds; (⋆⋆) follows from the orthogonality principle (similar to the proof of Theorem 6), the maximum entropy principle applied to the definition of mutual information, and the fact that the optimal minimizer (or forward test-channel) q*(x̂|ξ) admits a linear realization whose gain matrix is an identity matrix. Similarly, one can show that R_G(D_o) coincides with (38), since from the direct Gaussian RDF (see, e.g., [26]) it is well known that the optimal minimizer q*(ẑ|z) is ẑ = Uz + v_2, such that $U = I_{p_2 \times p_2} - \Sigma_{z|\hat z} \Sigma_z^{-1}$ and $\Sigma_{v_2} = \Sigma_{z|\hat z} U^{\mathsf T}$. This completes the proof.
Next, we provide examples where we compute the bounds derived in Theorems 6, 7 for i.i.d. jointly Gaussian processes.
Example 2: Suppose that p(x) ∼ N(0; Σ_x) and p(z|x) ∼ N(Ax; Σ_s), with p_1 = p_2 = 2, are randomly chosen (the specific numerical values are omitted here). Some indicative numerical results are presented in Table I, which illustrate that there are cases where, for certain values of the pair (D_o, D_s), the lower bound in Theorem 7 is tight.

Example 3: Suppose now that p_1 < p_2, where A is full column rank. Moreover, using (33) and (34), we can compute the positive definite matrices (Σ_z, Σ_ξ, Σ_{x|z}) and the real matrix H. Similar to Example 2, we can compute the bounds in (35) and (36), respectively, for D_o > 0 and D_s > D_s^min = trace(Σ_{x|z}) = 0.1921 (for this example). Some indicative numerical results are presented in Table II. These once again illustrate that there are cases where, for certain values of the pair (D_o, D_s), the lower bound in Theorem 7 is tight and corresponds to the optimal value of R_G(D_o, D_s). Interestingly, as opposed to Example 2, the lower bound is only tight because R_G(D_o) is in some cases achieved, and never because of R_G(D_s), which, as expected from Theorem 7, is achieved only if p_1 = p_2. In other words, the RHS quantity of the indirect Gaussian RDF in (37) is strictly less than R_G(D_s).
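As an illustration of how such numbers can be produced, the following sketch evaluates the two branches of the lower bound in Lemma 2 for the Gaussian case via reverse water-filling (see, e.g., [26]). It reuses the hypothetical matrices from the earlier sketch, and the semantic branch follows the reduction to a direct RDF on ξ with budget D_s − trace(Σ_{x|z}) discussed before Theorem 7 (tight iff p_1 = p_2, per Lemma 6); the distortion levels are illustrative.

```python
import numpy as np

def gaussian_rdf(eigvals, D):
    """Reverse water-filling RDF (in nats) of a zero-mean Gaussian vector
    with covariance eigenvalues `eigvals` under an MSE budget D."""
    if D <= 0:
        return np.inf
    lo, hi = 0.0, float(eigvals.max())
    for _ in range(200):                       # bisect on the water level theta
        theta = 0.5 * (lo + hi)
        if np.minimum(theta, eigvals).sum() > D:
            hi = theta
        else:
            lo = theta
    theta = 0.5 * (lo + hi)
    return float(np.sum(0.5 * np.log(np.maximum(eigvals / theta, 1.0))))

# Reusing (Sigma_z, Sigma_xi, Sigma_x_given_z) from the earlier sketch:
Do, Ds = 0.4, 0.5                              # illustrative distortion levels
R_L = max(gaussian_rdf(np.linalg.eigvalsh(Sigma_z), Do),
          gaussian_rdf(np.linalg.eigvalsh(Sigma_xi),
                       Ds - np.trace(Sigma_x_given_z)))
```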
The previous simulation studies further emphasize the cardinal role of context-dependent distortion constraints beyond finite alphabets. For example, a key takeaway from Example 3 is that, depending on the choice of the pair (D_o, D_s), the system may prefer to reconstruct only the observation signal, treating the reconstruction of the semantic information as irrelevant.
Remark 3: (i) The assumptions made in [21, Section IV] and the resulting characterization (42) should be read with caution because they do not correspond to the problem statement of Section II or, equivalently, to the problem formulation of [21, Fig. 1]. Indeed, according to their (and our) problem statement, the system model presupposes that p(x) and p(z|x) are given, from which p(x|z) and p(z) are then derived. The assumptions and approach put forward in [21] would make sense if we were given a joint probability distribution p(x, z) = p(x|z)p(z), but that already corresponds to a different problem formulation compared to the one in Section II. (ii) Even if one accepts that the assumptions in [21, Section IV] are consistent with their original problem formulation, it is not clear whether the characterization in (42) corresponds to the exact solution of R(D_o, D_s), because the authors did not provide in their derivation a meaningful realization of the corresponding test-channel (optimal minimizer) that achieves the optimization problem for i.i.d. non-Gaussian processes. (iii) In [21, Section IV], it is assumed that p_2 ≠ p_1; however, because $x \in \mathbb{R}^{p_1}$ represents the semantic meaning of a source, it is natural for the observations $z \in \mathbb{R}^{p_2}$ to be defined on a larger space, hence one should consider a priori that p_1 ≤ p_2. (iv) For scalar i.i.d. Gaussian processes, the result in [20, Proposition 1] appears to be correct and indirectly implies the tightness of the lower bound derived in Lemma 2.

V. CONCLUSION AND ONGOING RESEARCH
A variant of a robust description source coding problem with two individual criteria, which is a relevant model for goal-oriented semantic communication, was studied here. We derived a lower bound on the SRDF, as well as necessary and sufficient conditions for its tightness. We then proved a general theorem that provides parametrically the optimal solution of the characterization of this problem. Capitalizing on these results, we examined the structure of the solution for the case of general binary alphabets under Hamming and erasure distortion criteria. Following classical arguments, we also constructed a goal-oriented BA algorithm for the computation of (6) for any finite alphabet sets under individual distortion criteria. Finally, we studied bounds for multidimensional jointly Gaussian random vectors with individual MSE distortion constraints. A key takeaway from our results is that the class of fidelity criteria may significantly affect the system behavior irrespective of its task; hence, it should be chosen appropriately.
Our ongoing research builds on two directions. First, we construct analytical examples where the presence of the additional distortion fidelity affects semantic communication. Second, we seek to generalize our goal-oriented communication setup beyond separable distortion criteria, since in real world applications one is often interested in relations between the input and output symbols that are highly nonlinear. The idea of using alternating minimization appears to be a promising methodology to study and compute such general classes of distortion fidelities.

APPENDIX A PROOF OF THEOREM 1
The fully unconstrained problem of (6) using (13) is as follows:

$$\mathcal{L}(q) = I(p, q) - s_1 \Big(\sum_{z, \hat z, \hat x} p(z) q(\hat z, \hat x|z) d_s(z, \hat x) - D_s\Big) - s_2 \Big(\sum_{z, \hat z, \hat x} p(z) q(\hat z, \hat x|z) d_o(z, \hat z) - D_o\Big) + \sum_{z} \lambda(z) \Big(\sum_{\hat z, \hat x} q(\hat z, \hat x|z) - 1\Big) - \sum_{z, \hat z, \hat x} \mu(z, \hat z, \hat x)\, q(\hat z, \hat x|z),$$

where s_1 ≤ 0, s_2 ≤ 0 are the Lagrange multipliers associated with the individual distortion constraints $\mathbb{E}[d_s(z, \hat x)] \le D_s$ and $\mathbb{E}[d_o(z, \hat z)] \le D_o$, respectively, whereas λ(z) ≥ 0 is associated with the equality constraint $\sum_{\hat z, \hat x} q(\hat z, \hat x|z) = 1$, and μ(z, ẑ, x̂) ≥ 0 enforces the inequality constraint q(ẑ, x̂|z) ≥ 0.

APPENDIX B PROOF OF THEOREM 2
Recall that the input data and the distortion functions are introduced in Problem 1. We first start with some preliminary calculations. In particular, using (20), we can obtain p(z) as in (47), and, using the fact that p(z, x) = p(z|x)p(x), we obtain p(x|z) as in (48). Moreover, from (47), (48), and the fact that $d_s(z, \hat x) = \sum_{x \in \mathcal{X}} p(x|z)\, d_s(x, \hat x)$ (from the characterization in Lemma 1), we can obtain the amended distortion d_s(z, x̂) as in (49). We can now proceed to prove (i).
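A quick numerical sketch of these preliminary computations follows, with a hypothetical parametrization p(z = 0|x = 0) = β, p(z = 0|x = 1) = γ, and P(x = 1) = p, since (19), (20) are not reproduced above; the resulting `ds_tilde` is exactly the amended distortion that feeds the goal-oriented BA sketch of Section III-B.

```python
import numpy as np

# Hypothetical Problem-1-style data (parametrization assumed, values illustrative).
p, beta, gamma = 0.4, 0.9, 0.2
p_x = np.array([1 - p, p])                       # P(x = 0), P(x = 1)
p_z_given_x = np.array([[beta, 1 - beta],        # rows: x, cols: z
                        [gamma, 1 - gamma]])
p_zx = p_z_given_x * p_x[:, None]                # joint p(x, z)
p_z = p_zx.sum(axis=0)                           # observation pmf, cf. (47)
p_x_given_z = p_zx / p_z[None, :]                # posterior, cf. (48); shape (x, z)

ds = 1.0 - np.eye(2)                             # Hamming distortion d_s(x, xhat)
# Amended semantic distortion d_s(z, xhat) = sum_x p(x|z) d_s(x, xhat), cf. (49)
ds_tilde = p_x_given_z.T @ ds                    # shape (z, xhat)
```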