Over-the-Air Computing With Imperfect CSI: Design and Performance Optimization

Over-the-air computing (AirComp) has recently attracted considerable attention as an efficient method of data fusion that integrates uncoded communication with computation by exploiting the signal superposition offered by multiple access channels. However, appropriate processing is required to neutralize the effect of the wireless channel. Since AirComp mainly targets internet-of-things (IoT) applications built on low-cost devices, perfect channel state information (CSI) is not always available in practice, and the effect of imperfect CSI on AirComp needs to be investigated. Specifically, we present novel closed-form expressions for tight approximations that can be used to design and evaluate AirComp systems. Furthermore, we design a general optimization framework that takes into account both magnitude and phase errors in the CSI. Finally, we design a pilot retransmission policy that offers a trade-off between resource cost and the gain in computation accuracy. To validate its application, we introduce a utility function of the retransmission cost, namely the Retransmission Policy Cost (RPC), which can weigh the power or throughput cost against the expected gain of the selected policy. Simulations show the deterioration caused by imperfect CSI and highlight the added value of the proposed policy under various system conditions.


I. INTRODUCTION
In 5G and beyond wireless networks, there is a paradigm shift from human-type to machine-type communications. The former is characterized by a large amount of data requested in bursts by individual users, while the latter involves a small amount of data per device, albeit transmitted continuously and from a huge number of devices. As a result, today's systems produce an enormous amount of data at extremely high rates [2]. To deal with the massive amount of distributed data, wireless data aggregation (WDA) through over-the-air computing (AirComp) has recently emerged as an attractive technology [3]-[6]. AirComp can achieve several goals in the network by exploiting the superposition property of transmitted waveforms over the multiple access channel (MAC), together with appropriate pre- and post-processing, to compute a family of functions called nomographic functions, such as the arithmetic mean and the geometric mean. The pioneering works [7], [8] and [9] provide a detailed presentation of the nomographic functions computable over MACs. This exploitation can lead to a significant reduction in latency and computation load at the central processing node, especially when the number of sensors becomes very large.

A. Literature Review
As shown in [10], uncoded AirComp is optimal in the presence of a Gaussian MAC with independent data sources. This optimality is expressed in terms of the mean squared error (MSE), which is a fundamental performance measure for the discussed system. In [11] and [12], the authors present a thorough analysis of the performance of analog computation over wireless MACs. The integration of AirComp with popular wireless technologies such as MIMO has received a lot of attention from the research community [13]-[17]. In particular, the authors in [13] investigated the joint hybrid beamforming of a MIMO AirComp system in order to minimize the MSE in a cost-effective way. In [14], the aim was to minimize the power consumption of the transmitting devices under a minimum required MSE constraint. In [15], an AirComp system combined with MIMO was studied for the simultaneous calculation of different output target functions. In [16], the authors design a zero-forcing beamforming transmit scheme, implement selection combining at the receiver side, and track an MSE outage metric. Finally, in [17], the authors study an integrated sensing, communication, and computation over-the-air system with MIMO and examine algorithms that allow the multiple functionalities of this system to coexist. In addition, reconfigurable intelligent surfaces (RIS) have also been examined as a way to facilitate AirComp with higher performance gains [18], [19].
Recently, AirComp has been discussed as an attractive technology to be combined with federated learning at the network edge [20]-[30]. The idea is that the gradients, which are produced on distributed machines using local models, can be aggregated via AirComp at the base station of the network, thus updating the global model efficiently. A similar idea has been considered for AirComp as a means of training deep neural networks [31]. In addition, AirComp has shown promise in a number of different scenarios, such as in combination with aerial networks with unmanned aerial vehicles (UAVs). More specifically, in [32], the authors optimize the trajectory of the UAV to minimize the time-average MSE, while in [33] they take into account CSI imperfections when aggregating data at a UAV fusion center. Another interesting technology that has been considered is wireless power transfer (WPT). Since WDA is a basic target for IoT sensor networks, WPT can improve the energy efficiency and functionality of these types of networks, as studied in [34]. An interesting application was studied in [35], where AirComp was proposed to compute the control signal of a controller.
AirComp systems face a special set of challenges that need to be resolved to ensure their uninterrupted and successful operation. Precise time synchronization of the transmitting devices is one of them, since each device must take into account the propagation delay of its message. Moreover, carrier frequency offset (CFO) issues, which are usually solved by the use of high-quality oscillators, need to be addressed by different means, since AirComp devices are usually required to be low-cost [20]. On top of that, the transmission over the wireless channel is susceptible to path loss, fading, and noise. To account for this, a power allocation strategy is required, such as those presented in [3], [36], i.e., the optimal power allocation for the transmitting devices in order to achieve a minimum MSE under the assumption of a common maximum transmit power. Finally, in [37] the same challenge was addressed when a total maximum transmit power constraint was imposed. The authors provided a closed-form solution and important insights into the operation of such an AirComp system.

B. Motivation and Contributions
Most of the aforementioned works focus on the perfect channel state information (CSI) scenario, which is not easily guaranteed in practice when IoT devices are utilized. Imperfect CSI can severely deteriorate the performance of an AirComp system, as channel information is crucial for obtaining the correct message at the receiver. In the literature, there have been some attempts to study the effect of imperfect CSI on performance [33], [38]. However, none of them looks into the general case of imperfect CSI in both magnitude and phase. Also, none of the previous works has provided a theoretical analysis of the performance of the discussed policies and utility functions that can be used in more practical applications of over-the-air computing systems.
More specifically, the main contributions of this paper can be summarized as follows:
• First, we provide a theoretical analysis based on order statistics to gain more insight into how CSI imperfections affect the channel magnitude and phase. The presented analysis is based on the policy proposed for the perfect CSI case, but the derived results can help identify the conditions necessary for an AirComp system to operate without suffering excessively from noisy CSI. We also extract useful approximations that can be used to partially estimate the performance of this policy under imperfect CSI.
• Next, we study the problem of minimizing the MSE through power allocation at the transmitter and gain factor selection at the receiver. We then propose an algorithm that uses alternating optimization for the minimization problem, based on an approximation of the MSE in a worst-case scenario. This approach differs from that in [38] mainly in that it considers the phase difference between the real and the estimated channel, which is the general case. Also, in contrast to [38], no assumption of accurate phase estimation is made, which leads to the study of inaccuracies caused by phase differences. For better comparison, a performance analysis of the two policies in terms of power consumption is also provided.
• Finally, based on the theoretical analysis, we propose a retransmission round for the weaker estimated channels, as an attempt to improve the performance of the system. We then address the problem of finding the optimal number of retransmissions with the aim of ensuring the efficient use of the system's resources. For this purpose, a utility function that considers the trade-off between MSE minimization and resource cost is introduced and studied. A closed-form expression is also derived for the mean overhead produced by the extra retransmission round, which is useful because it takes into account practical details of an over-the-air system.
To our knowledge, this is the first time that a utility function has been used to further improve the performance of an over-the-air computing system besides solving an optimization problem.

C. Structure
The rest of this paper is organized as follows. Section II describes the system model, the objective function to be minimized, and the CSI estimation procedure. In Section III, we analyze the extra terms that arise, due to the considered CSI imperfections, when the policy designed for perfect CSI knowledge is used. In Section IV, we investigate the optimization of the MSE, and in Section IV-B we look at the use of the retransmission policy and the performance enhancement it offers. In Section V, we present the simulation results along with additional comments and discussion. Finally, in Section VI, we draw conclusions from our work and note some future research opportunities that arise in this field.

II. SYSTEM MODEL
We consider an AirComp system consisting of one receiver that acts as a fusion center and multiple transmitting devices. Let K be the number of transmitting devices in the AirComp system, where each of them is independent. Assume that we wish to calculate a function $f : \mathbb{R}^K \to \mathbb{R}$ of all the transmitted data, denoted as As shown in Fig. 1, function f can be precisely computed over an ideal MAC, which is the basic idea behind AirComp.
In the context of this paper and without loss of generality, we assume that the receiver and all transmitters are equipped with a single antenna. Next, we assume that the target function f is the arithmetic mean of the input data. Hence, the pre- and post-processing functions are linear in the data. We aim to calculate the computation distortion between the ideal signal $r_{\text{ideal}} = \sum_{k=1}^{K} x_k$ and the received signal given by The distortion between the two signals y and y is given by the mean squared error of the two signals as Hence, (3) results in and substituting r and $r_{\text{ideal}}$ in (4) we obtain Furthermore, we assume that all signals and noise are independent of each other and all signals We also assume that the noise n follows a complex Gaussian distribution with $E[n^2] = \sigma^2$. Taking the expectation with respect to the signals $x_k$ and noise n gives and equivalently from the above assumptions In order to study possible imperfections in the CSI, we assume that the error in the channel estimation is modeled as an additive random variable $n_k$. Hence are the noise samples in the k-th CSI estimation which are uncorrelated and independent and because of Rayleigh fading condition.
For the estimation of $h_k$, pilot symbols of maximum transmitting power $\sqrt{P}$, which are known a priori to both transmitter and receiver, are utilized. Since AirComp systems are mainly designed with IoT devices in mind, where low complexity is desired, least squares (LS) channel estimation is one of the most practical CSI estimation methods. Hence, the baseband equivalent at the receiver is $y_k = \sqrt{P} h_k + n_k$, and so the receiver assumes that $y_k = \sqrt{P} h'_k$. Consequently, the channel estimation is $h'_k = y_k / \sqrt{P} = h_k + e_k$, where $e_k = n_k / \sqrt{P} \sim \mathcal{CN}(0, \sigma^2 / P)$ is the estimation error due to the presence of noise [39], [40]. As seen from the CSI estimation model, we assume that estimation is made on both the magnitude and the phase of the channel. Different CSI models that estimate only the magnitude or statistical properties of the channel have not been considered because they would only worsen the performance of the system, since less knowledge would be available at the receiver. Another reason for the chosen CSI estimation method is that, because of the vast number of transmitting devices in over-the-air computing systems, insufficient CSI estimation will compromise all of them and, as a result, there will be greater MSE distortion. Consequently, we can assume that the channel estimation $h'_k$ is distributed as

III. AIRCOMP WITH IMPERFECT CSI
Assuming the receiver is unaware of the CSI imperfections, the algorithm obtained for an AirComp system with perfect CSI in [3], [36] would lead to a combination of full power and channel inversion methods. Applying this algorithm in (7) would result in where $i^*$ is the critical number of transmitting devices that utilize their full power as given in [3], whereas the rest of the transmitting devices utilize the inverse channel method. Moreover, the ascending channel ordering has been assumed, without loss of generality, as Since the order has been taken with respect to the estimation of $h_k$, it is clear that the order of the actual channel gains could be different, e.g., The number of transmitting devices $i^*_{\text{est}}$ that use their full power can be calculated based on the imperfect CSI as where $\mathbf{h}'$ is the vector containing the estimated CSI and Due to the estimated magnitudes of the channel gain terms in (14), $i^*_{\text{est}} \neq i^*$ in general, resulting in suboptimal performance. On top of that, the main issue stemming from (14) is that the incorrect values of $g_i$ will also affect a, which, in general, is given as $a = g_{i^*}$.
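The pilot-based LS estimate and its effect on the channel ordering can be illustrated with a small Monte Carlo sketch. It is not part of the original analysis: it assumes unit-variance Rayleigh fading, $h_k \sim \mathcal{CN}(0, 1)$, and receiver noise $n_k \sim \mathcal{CN}(0, \sigma^2)$, and the function names `complex_gaussian`, `ls_estimate` and `simulate` are hypothetical helpers introduced only for illustration.

```python
import math
import random

def complex_gaussian(rng, variance):
    """Draw one CN(0, variance) sample (real and imaginary parts have variance/2 each)."""
    s = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def ls_estimate(h, P, sigma2, rng):
    """LS estimate from one pilot of amplitude sqrt(P): h' = y / sqrt(P)."""
    y = math.sqrt(P) * h + complex_gaussian(rng, sigma2)  # pilot observation
    return y / math.sqrt(P)

def simulate(K, P, sigma2, trials, rng):
    """Return (empirical E|h' - h|^2, fraction of trials whose estimated order is wrong)."""
    sq_err, mismatches = 0.0, 0
    for _ in range(trials):
        h = [complex_gaussian(rng, 1.0) for _ in range(K)]  # assumed h_k ~ CN(0, 1)
        h_est = [ls_estimate(x, P, sigma2, rng) for x in h]
        sq_err += sum(abs(e - x) ** 2 for e, x in zip(h_est, h))
        true_order = sorted(range(K), key=lambda k: abs(h[k]))
        est_order = sorted(range(K), key=lambda k: abs(h_est[k]))
        mismatches += true_order != est_order
    return sq_err / (trials * K), mismatches / trials

rng = random.Random(0)
error_variance, mismatch_rate = simulate(K=10, P=10.0, sigma2=1.0, trials=2000, rng=rng)
print(error_variance, mismatch_rate)
```

Under these assumptions the empirical error variance should sit close to $\sigma^2 / P$, while `mismatch_rate` indicates how often sorting by estimated magnitudes yields a different order than sorting by the true magnitudes.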
Notice that, as thoroughly studied in [3], when the minimum MSE is achieved by the corresponding $g_i(\mathbf{h}')$, it holds that This is important because it is a necessary condition that must hold from a feasibility point of view, since a should be such that at least the right-hand side of (15) is satisfied for $i^*_{\text{est}} = i^*$ to be the critical number.
In this context, it is of vital importance to study the channel ordering, since the channels with the lowest gain are more prone to estimation imperfections, leading both (13) and (14) to miscalculations. The following lemma, combined with (17), provides an answer to that.
Lemma 1: The expected value of the magnitude of the ordered channel gains for the r-th ordered channel is given by
Proof: The proof is provided in Appendix A.
Moreover, the expected value of the magnitude of the error is given by [41] where ρ is the transmit SNR. In order to better illustrate the above results, Fig. 3 presents the mean value of the ordered channel gains for different K.
As is evident from this figure and calculated from (16), (17), the channels with the lowest gain can be comparable to the magnitude of the estimation error. Therefore, the estimation of the corresponding channels is highly affected, since it is the result of the complex addition of the true channel value and the error. Using (16) and (17), the number of channels that are most affected by the CSI imperfection can be found, which is of great value when designing the AirComp system. However, when only imperfect CSI is available, there are more issues to take into account. The most important of them is that the order of the estimated channels is very likely to differ from the correct one, which leads to an additional error in the minimization of the MSE. Ultimately, the most affected channels will also be more susceptible to a greater phase difference during CSI estimation.
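The comparison between the weakest ordered channel gains and the estimation error can be reproduced numerically. The sketch below does not reproduce the closed forms (16) and (17); it estimates the mean ordered magnitudes by Monte Carlo under the assumption $h_k \sim \mathcal{CN}(0, 1)$ and uses the standard Rayleigh mean $\sqrt{\pi / (4\rho)}$ for an error $e_k \sim \mathcal{CN}(0, 1/\rho)$. The names `mean_ordered_magnitudes` and `mean_error_magnitude` are illustrative only.

```python
import math
import random

def mean_ordered_magnitudes(K, trials, rng):
    """Monte Carlo estimate of E[|h_(r)|], r = 1..K (ascending), for h_k ~ CN(0, 1)."""
    sums = [0.0] * K
    s = math.sqrt(0.5)
    for _ in range(trials):
        mags = sorted(abs(complex(rng.gauss(0.0, s), rng.gauss(0.0, s))) for _ in range(K))
        for r, m in enumerate(mags):
            sums[r] += m
    return [v / trials for v in sums]

def mean_error_magnitude(rho):
    """E[|e|] for e ~ CN(0, 1/rho): |e| is Rayleigh with mean sqrt(pi / (4 * rho))."""
    return math.sqrt(math.pi / (4.0 * rho))

rng = random.Random(1)
K, rho = 50, 10.0  # 50 devices, transmit SNR of 10 dB (assumed example values)
ordered_means = mean_ordered_magnitudes(K, 4000, rng)
err = mean_error_magnitude(rho)
# Number of ordered channels whose mean magnitude does not exceed the mean error magnitude
num_weak = sum(1 for m in ordered_means if m <= err)
print(num_weak, ordered_means[0], err)
```

Here `num_weak` gives a rough count of the channels whose estimates are dominated by the error for the chosen K and ρ; increasing ρ shrinks it, while increasing K pushes more ordered channels below the error level.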
In contrast to the perfect CSI case, (12) contains an extra term that arises from the use of the inverse channel technique. In order to understand the effect of this term, a theoretical analysis that extracts approximations of its lower and upper bounds is presented. From the system model described in Section II, it will be $|e_k|^2 \sim \text{Exponential}(P / \sigma^2)$ and the ordered statistics First, the following lemmas are presented to aid in finding the aforementioned bounds.
Lemma 2: The expected value of the inverse of the ordered estimated channel gains is given by where r > 1 and λ
Proof: The proof is presented in Appendix B.
Lemma 3: The expected value of the inverse of the ordered estimated channel gains is given by where r > 2 and the second index shows the number of total samples in the ordered statistics.
Proof: The proof is presented in Appendix C.
Let expression be the term under discussion. We are interested in calculating the expected value of S, because of the effect it will have on the overall performance of the perfect CSI policy under imperfect CSI. Whenever $i^* \geq 2$, from the law of total expectation we have where $P\{\cdot\}$ denotes the probability. Then, a direct upper bound for this sum can be computed as shown in the following lemma.
Lemma 4: An upper bound can be computed for (21) as follows where it is well known that where $\lambda_{|e_k|^2} = P / \sigma^2$ and ρ is the transmit SNR.
Proof: The bound can easily be computed by applying the Cauchy-Schwarz inequality to (21) multiple times.
From Lemma 2, Lemma 3 and (23) we can evaluate the upper bound described by (22). Since the Cauchy-Schwarz inequality was used multiple times to derive a closed-form expression, the bound will not be very tight. In order to get a better view of the behavior of S, we can examine a different set of bounds. Due to the channel ordering, it is trivial to show that S is upper bounded by $S'$ and lower bounded by $S''$ that are given as and respectively.
Theorem 1: In cases where the noise is sufficiently small, the expected value of $S'$ and $S''$ can be tightly approximated by the following expressions and
Proof: Due to the correlation between $e_k$ and $h'_k$, the presented analysis relies on the assumption that It is important to note that this is based on the practical assumption that the statistical mean of the noise is known to the receiver. Therefore, our assumption is a consequence of the p.d.f. that the error $e_k$ follows due to the presence of noise and it is straightforward to get that P because of (8). Also, as assumed, for high values of transmit SNR, where $P \gg \sigma^2$, the channel estimation becomes almost independent of the error. Indeed, if we carefully examine (16), (17), we can see that for increasing transmit SNR the mean value of the error decreases by a factor of $1/\sqrt{\rho}$.
We start by taking into account the inequalities (24) and (25). (24) holds for $1 \leq i^* \leq K - 1$ and (25) holds for $1 \leq i^* \leq K - 2$, while $i^* = K$ results in the use of the full power technique. This means that where $Z_j$ are random variables that follow the normalized exponential PDF [42] and . Observe that by the definition in (28) and trivial inequalities holds. Also, it is known that the random variable $X_k = 1 / \sum_{j=1}^{k} Z_j$ follows the inverse Gamma distribution and has mean value equal to $1/(k-1)$ [43]. Obviously, this holds for k > 1 and equivalently $i^* \geq 2$. Using the law of total expectation for the left-hand side (LHS) of (26) we get where $P\{\cdot\}$ denotes the probability. From (30) and assuming sufficiently small noise we can get Following the same approach for (25) we get and the proof is completed.
Calculating sums with the expression $E[1/U_r]$ can be computationally difficult in the general case. Therefore, we aim to further simplify the aforementioned approximations of E[S], obtaining closed-form expressions that approximate (26) and (27).
Corollary 1: Expressions (26) and (27) can be further relaxed with the help of harmonic numbers as and where ρ denotes the transmit SNR and the harmonic numbers are defined as with γ denoting the Euler-Mascheroni constant [44].
Proof: By the right-hand side (RHS) of (29) we can write which leads to (33). Also, by the LHS of (29) which equivalently leads to (34) and the proof is completed.
As can be observed from the derived approximations, an increase of the transmit SNR reduces the effect of the inverse channel terms. However, we can see that for values of $i^*$ similar to the ones derived in the perfect CSI case, where $i^*$ is considerably less than K, the harmonic numbers $H_{K-2}$, $H_{K-1}$ will dominate in (33) and (34), respectively. Considering the asymptotic behavior of the harmonic numbers, we can see that both approximations slowly diverge for increasing values of K. From the above, it can be seen that in order to neutralize the uncertainty caused by imperfect CSI, more transmitting devices should transmit with maximum power, i.e., $i^*$ is larger.
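The inverse Gamma mean used in the proof, $E[1 / \sum_{j=1}^{k} Z_j] = 1/(k-1)$ for unit-mean exponential $Z_j$ and k > 1, can be checked with a short Monte Carlo sketch. The function name `mean_inverse_exponential_sum` and the chosen values of k and the sample size are illustrative assumptions.

```python
import random

def mean_inverse_exponential_sum(k, trials, rng):
    """Monte Carlo estimate of E[1 / (Z_1 + ... + Z_k)] for i.i.d. unit-mean exponentials."""
    total = 0.0
    for _ in range(trials):
        s = sum(rng.expovariate(1.0) for _ in range(k))  # Gamma(k, 1) distributed sum
        total += 1.0 / s
    return total / trials

rng = random.Random(2)
k = 5
estimate = mean_inverse_exponential_sum(k, 100000, rng)
print(estimate, 1.0 / (k - 1))
```

For k = 1 the mean does not exist, which is consistent with the requirement k > 1 stated in the proof.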
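Since the relaxed expressions depend on K through harmonic numbers, it can help to see how closely the logarithmic asymptote tracks them. The following sketch compares the exact harmonic number $H_n = \sum_{j=1}^{n} 1/j$ with the standard approximation $\ln n + \gamma$; the helper names are hypothetical and the sketch does not reproduce (33) or (34).

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def harmonic(n):
    """Exact n-th harmonic number H_n = 1 + 1/2 + ... + 1/n (H_0 = 0)."""
    return sum(1.0 / j for j in range(1, n + 1))

def harmonic_asymptotic(n):
    """Leading-order approximation H_n ~ ln(n) + gamma, valid for large n."""
    return math.log(n) + EULER_GAMMA

for n in (10, 100, 1000):
    print(n, harmonic(n), harmonic_asymptotic(n))
```

The gap between the two is roughly $1/(2n)$, so the logarithmic growth describes the harmonic terms well for the large values of K typical of AirComp deployments.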
Remark 1: There are a few useful observations that we can use to evaluate the performance of the given approximations in terms of our original assumption and how it will affect them.
• With the increase of the transmit SNR, ρ, the effect of the error on the estimation of the channels is reduced, resulting in better approximations.
• With the increase of the number of transmitting devices in the system, K, more channels will be affected by errors in estimation, as showcased in Fig. 3, resulting in less accurate approximations.
IV. OPTIMIZATION FRAMEWORK
As shown in the previous section, imperfections in CSI can deteriorate the performance, even for small levels of noise. Solutions that ignore the possible imperfections that result in optimum controls for the perfect CSI case, by assuming While the mathematical analysis in Section III produced some interesting approximations that can be effectively utilized in the design of an AirComp system, in order to improve the overall performance, we propose a novel optimization framework that takes into account all the extra error terms. To this end, we make the practical assumption that the statistical mean of the noise is known to the receiver. First of all, since $a \in \mathbb{C}^*$, we define $\angle a = a_p$ to denote its complex phase. By setting $\Delta h_k = \angle(h_k, h'_k)$, using the fact that $h'_k = h_k + e_k$, and (4), the optimization problem of minimizing the MSE is expressed as
Theorem 2: The optimal power distribution is given by
Proof: In order to use the phase factor for minimization in (37), we would need to approximate the term $\cos(a_p + \Delta h_k)$ and find its extrema. However, from trigonometry, $\cos(a_p + \Delta h_k) = \cos a_p \cos \Delta h_k - \sin a_p \sin \Delta h_k$ and, as such, this quantity cannot be approximated without knowledge of the sign of the phase difference $\Delta h_k$, because that sign determines the sign of $\sin \Delta h_k$. Since the sign of $\Delta h_k$ is affected by the phase of the noise in the CSI estimation, we cannot make any assumptions about it and thus we cannot further use $a_p$ in the minimization. Due to the randomness of $\Delta h_k$, choosing a specific $a_p$ could worsen the MSE. Therefore, with no prior knowledge of $\Delta h_k$, on average, the best option is to set $a_p = 0$.
Hence, from now on we will consider a to be a real number and we rewrite the mean squared error as In order to find the extrema for every variable $|b_k|$, we set the first-order partial derivative equal to zero, thus: and since $|b_k|$ is the power magnitude at device k, it must also be $0 \leq |b_k| \leq \sqrt{P}$. Consequently, the best power distribution will be given by (38) and, thus, the proof is completed.
We can observe that the real MSE is at this point related to the estimations $h'_k$ only through the terms $\cos \Delta h_k$. Assuming the imperfections of CSI due to the noise have a constant magnitude of $|e_k|$, then from Fig. 4, the worst-case scenario for these terms arises when the phase of the imperfection is such that $h'_k$ becomes tangent to the circle of constant radius $|e_k|$. This approximation will be quite accurate when the conditions are such that the mean estimation error can be assumed to be less than the mean of the estimated channels, i.e.
and using it we obtain Hence, if we consider the worst-case scenario, where the maximum phase difference is achieved, we have the following relations between the real channel conditions, the estimation error and the channel estimation itself [41] In order to approximate the estimation error, since only its statistics are known, we will use the mean value as the best unbiased estimator, thus for all k. This approach provides a tight approximation of the sinusoidal term for the great majority of the involved channels, except for those that are greatly affected by the noise. On top of that, this method accounts for the worst case, so the actual performance can possibly be better in a more favorable setting.
Using the approximations (41a), (41b) and (42), combined with the attainable values of $|b_k|$ given by Theorem 2, (39) can now be expressed as where $i \in \{1, \dots, K\}$ symbolizes the possible values of the critical number and the receiver gain, a, is the variable of interest. Following this, the optimal scaling factor $a_i$ is given in Lemma 5.
Lemma 5: The optimal a that minimizes (43) is given by
Proof: Considering (43) as a quadratic polynomial in a and using the well-known property for the global extremum of such a function, we get the global minimum for every i by differentiating (43) with respect to $a_i$, which finally gives (44), and thus the proof is completed.
At this point, it is observed that (44) will be a tight approximation for high SNR or good channel conditions. This is because, due to our approximation, the power magnitudes, which will be (38), are points of the function $f(x) = \frac{x}{a(x^2 + c)}$, which first increases and then decreases in x and achieves its maximum at $x = \sqrt{c}$. Hence, for these to be in descending order we need $|h'_1| \geq \sqrt{c}$. Otherwise, we observe that the receiver coefficient a that will be calculated can lead some of the weaker estimated channels to use the inverse channel method. Finally, (44) is quite similar to the optimum value of a for the perfect CSI case, but will always be less than $g_i$. In other words, the uncertainty caused by the imperfect CSI will force the system to use more transmitting power in an attempt to counter these imperfections.
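The shape of $f(x) = \frac{x}{a(x^2 + c)}$ can be verified numerically: setting the derivative to zero gives $c - x^2 = 0$, so the peak sits at $x = \sqrt{c}$ with value $1/(2a\sqrt{c})$. The sketch below checks this on a grid; the values of `a` and `c` are arbitrary positive constants chosen for illustration, and `power_profile` is a hypothetical name.

```python
import math

def power_profile(x, a, c):
    """f(x) = x / (a * (x^2 + c)) for a > 0, c > 0."""
    return x / (a * (x * x + c))

a, c = 2.0, 0.25  # arbitrary example constants
grid = [i * 1e-3 for i in range(1, 5001)]  # x in (0, 5]
x_peak = max(grid, key=lambda x: power_profile(x, a, c))
print(x_peak, math.sqrt(c), power_profile(math.sqrt(c), a, c))
```

Estimated channel magnitudes to the left of $\sqrt{c}$ therefore map to smaller power magnitudes as the channel weakens, which is why the ordering argument requires $|h'_1| \geq \sqrt{c}$.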
Corollary 2: At least one device must use its full power during transmission.
Proof: The proof is given in Appendix D.
So, it suffices to solve K subproblems for $i \in \{1, \dots, K\}$ and, then, compare the minimum values Theorem 2 dictates that in order to get a feasible solution for $a_i$, it must hold that If (45) is not satisfied, the following lemma can be used to search for feasible solutions.
Lemma 6: there exists $a \in I_{i+1}$ such that a better MSE can be achieved and if √ P there exists $a \in I_{i-1}$ such that a better MSE can be achieved.
Proof: The proof is given in Appendix E.
The feasibility condition along with Lemma 6 effectively show that whenever $a_i$ is not feasible with regard to power allocation, there is another value of a that would achieve a better MSE. Since at least one user must use its full power, as proved in Corollary 2, and there are finitely many intervals of the form $I_i$, a global minimum for the MSE exists and its optimal receiver factor a will be such that it equals a feasible value of $a_i$. Thus, we only need to check the values $\mathrm{MSE}_i(a_i)$ when $a_i$ is feasible. This way we can calculate $i^*$ and then estimate $a_i$ and $b_k$ for all k.
Theorem 3: The critical number for the imperfect CSI policy is given by i Correspondingly, from Lemma 5, the optimum a will be given as $a^* = a_{i^*}$.
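The structure of the search (solve one quadratic subproblem per candidate critical number, keep only the feasible ones, and pick the smallest MSE) can be sketched as follows. This is a simplified illustration and not the expressions (43)-(45) of the framework: it assumes perfect CSI with real channel magnitudes and the surrogate objective $\sum_k (a |b_k| |h_k| - 1)^2 + \sigma^2 a^2$ with $|b_k| = \min\{\sqrt{P}, 1/(a |h_k|)\}$, and all function names are hypothetical.

```python
import math

def mse_full_power_set(a, h_sorted, i, P, sigma2):
    """Surrogate MSE when the i weakest devices use full power and the rest invert the channel."""
    clipped = sum((a * math.sqrt(P) * x - 1.0) ** 2 for x in h_sorted[:i])
    return clipped + sigma2 * a * a

def clipped_mse(a, h, P, sigma2):
    """Surrogate MSE for a given receiver gain a with power-limited channel inversion."""
    return sum((min(a * math.sqrt(P) * x, 1.0) - 1.0) ** 2 for x in h) + sigma2 * a * a

def best_critical_number(h, P, sigma2):
    """Enumerate the K subproblems and return (i*, a*, MSE*) over the feasible ones."""
    h_sorted = sorted(h)  # ascending channel magnitudes
    K = len(h_sorted)
    best = None
    for i in range(1, K + 1):
        s1 = sum(h_sorted[:i])
        s2 = sum(x * x for x in h_sorted[:i])
        a_i = math.sqrt(P) * s1 / (sigma2 + P * s2)  # minimizer of the i-th quadratic
        # Feasibility: device i+1 (hence all stronger ones) must be able to invert its channel
        if i < K and a_i * math.sqrt(P) * h_sorted[i] < 1.0:
            continue
        mse = mse_full_power_set(a_i, h_sorted, i, P, sigma2)
        if best is None or mse < best[2]:
            best = (i, a_i, mse)
    return best

h = [0.1, 0.4, 0.9, 1.5]  # assumed example channel magnitudes
P, sigma2 = 1.0, 0.1
i_star, a_star, mse_star = best_critical_number(h, P, sigma2)
print(i_star, a_star, mse_star)
```

A brute-force grid search over a with `clipped_mse` reaches the same minimum, which is a convenient sanity check when adapting the enumeration to a different per-subproblem objective such as the worst-case approximation used in this section.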
It is important to note that this approach differs from the one followed in [38] in its mathematical derivation, but also in the fact that our approximation covers the more general case of phase misalignment as opposed to the phase alignment considered in [38].

A. Power Consumption
In this subsection, we compare the power efficiency of the imperfect CSI policy obtained by the optimization framework and the perfect CSI policy that ignores channel imperfections. Let the critical number of the perfect CSI policy be $i^*_1$ and the critical number of the imperfect CSI policy be $i^*_2$. We define the total power consumption P to be given by for the perfect CSI policy and by for the imperfect CSI policy. To examine the power efficiency of each policy, we introduce the following lemma.
Lemma 7: The proof is given in Appendix F.
With the help of Lemma 7, the following theorem is presented.
Theorem 4: The imperfect CSI policy always consumes more power than the perfect CSI policy that ignores imperfections, i.e., $P_{\text{tot},2} > P_{\text{tot},1}$.
Proof: Let the number of devices transmitting with full power when the perfect CSI policy is used be $i_1$ and the number of devices transmitting with full power when the imperfect CSI policy is used be $i_2$. Similarly to Lemma 6 and in accordance with [3], whenever there exists $i_1 > i$ such that $\mathrm{MSE}_{i_1} < \mathrm{MSE}_i$ for the perfect CSI policy, there will also exist $i_2 > i$ such that $\mathrm{MSE}_{i_2} < \mathrm{MSE}_i$ for the imperfect CSI policy. The minimum i for which g i < is not satisfied will be the critical number $i_1 = i^*_1$ for which the minimum MSE is achieved for the perfect CSI policy, which means that necessarily $i^*_2 \geq i^*_1$ will hold. By the definition in (13), $i^*_1$ is such that $g_{i^*_1} > g_{i^*_2}$ and whenever $i^*_2 > i^*_1$, as proven in [3], it will also hold that g P . Also notice that for feasibility reasons it will be a * < From these inequalities, we derive that If the equality $i^*_2 = i^*_1$ holds, then by the definitions in (14), (44) it will be which after simplifications holds because Thus, from (50) we are back in (49). Due to full power transmission we get and by applying (49) for $i^*_2 + 1 \leq k \leq K$ and summing up we get Combining (51) and (52) completes the proof of Theorem 4.

B. Pilot Retransmission Policy
From the aforementioned theoretical analysis, it is clear that, statistically, there will be a number of channels whose estimations will be greatly affected by the noise-induced error during the CSI procedure.In order to limit the effect of this in the MSE of the system better estimations for the channels are desired.As a result, we can consider the possibility of making a second CSI estimation round, at least for some channels.One option is for the transmitting devices with the weaker channels to retransmit pilot symbols and then use the average of the two estimations as the new correct estimated channel.For this approach we propose the following heuristic algorithm in order to find the number of channels that will need to reestimate their channels.In order to track the trade-off between the extra resources required for the retransmissions and the resulting MSE, we propose and define the following utility function, called the Retransmission Policy Cost (RPC) as where C(k) symbolizes the cost in resources needed for k retransmissions, E[MSE k ] K denotes the MSE for k retransmissions and our primary concern over the available resources is taken to be the selected cost.Moreover, parameters {d, f } ∈ R 2 are considered to be weights for the cost and the MSE, respectively.Any combination of parameters can be used to give emphasis to either the cost in terms of resources for the retransmissions or its maximum error tolerance.It is important to highlight that the proposed RPC can be changed and its weight parameters can be adjusted whenever CSI estimation takes place to better capture the current state and needs of the system.Without loss of generality, the time penalty required for the retransmission can be included in the RPC metric, but given that the decrease of E[MSE] is vital for the correct interpretation of the superimposed signals, the added overhead due to the added retransmissions is negligent for practical systems.
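The benefit of averaging two pilot rounds can be illustrated with a short simulation. Under the assumption that the two LS estimates carry independent errors $e_k \sim \mathcal{CN}(0, \sigma^2 / P)$, their average has half the error variance. The sketch below assumes $h_k \sim \mathcal{CN}(0, 1)$ and uses hypothetical helper names; it is not the heuristic algorithm of this section, only a check of the averaging step.

```python
import math
import random

def complex_gaussian(rng, variance):
    """Draw one CN(0, variance) sample."""
    s = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def estimation_error_variance(P, sigma2, rounds, trials, rng):
    """Empirical E|h' - h|^2 when h' averages `rounds` independent LS estimates."""
    total = 0.0
    for _ in range(trials):
        h = complex_gaussian(rng, 1.0)  # assumed h ~ CN(0, 1)
        estimates = [h + complex_gaussian(rng, sigma2 / P) for _ in range(rounds)]
        h_avg = sum(estimates) / rounds
        total += abs(h_avg - h) ** 2
    return total / trials

rng = random.Random(3)
var_one = estimation_error_variance(P=10.0, sigma2=1.0, rounds=1, trials=20000, rng=rng)
var_two = estimation_error_variance(P=10.0, sigma2=1.0, rounds=2, trials=20000, rng=rng)
print(var_one, var_two, var_two / var_one)
```

The halved error variance corresponds to a 3 dB gain in effective pilot SNR for the re-estimated channels, obtained at the price of one extra pilot transmission per selected device.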
In order to evaluate in terms of RPC the proposed policy, we consider two special cases of interest.
Case 1) Power cost related RPC: In this case, the primary cost is considered to be the power resources available to the transmitting devices. To obtain the first, necessary round of channel estimates, exactly K power resources are needed. Assuming that each power resource is equal to 1, every retransmission requires 1 additional power resource from the corresponding device. Thus, for k retransmissions, $C_{\mathrm{power}}(k) = (K + k)P$ and, according to (53), the minimum of RPC is studied. RPC is then expressed as in (54).
Case 2) Throughput cost related RPC: Another interesting use of the RPC function is to consider the lost information per transmission, i.e., the added overhead to the transmission, as the cost of the retransmissions. Then, we can use the ergodic capacity to find the information lost for every retransmission. In this context, since the channel estimation is performed for each device individually, for comparison we assume that each device transmits its data over a typical one-to-one link. Considering this, for k retransmissions, we define the throughput cost as l ] symbolizes the total information lost for the initial CSI estimation round, before any retransmissions take place. In this scenario, RPC is expressed as in (55). For the calculation of the values of ζ, we use the following lemma to evaluate the throughput cost for every ordered channel.
Lemma 8: The ergodic capacity of the ordered channel gains is given by (56).
Proof: The proof is presented in Appendix G.
It is worth pointing out that, since (56) calculates the mean information loss, it essentially gives a measure of the extra overhead that a retransmission causes for every channel. Therefore, the throughput cost related RPC can be used to take into account the additional overhead of the system. By definition, RPC expresses a trade-off between a considered cost in resources and the improved MSE achieved by the use of these resources. Then, according to (54) and (55), the preferable number of retransmissions will be given by $k^* = \arg\min_k \mathrm{RPC}(k)$, (57) where k = 0 would result in no retransmission.
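The selection of the preferable number of retransmissions can be illustrated with a minimal sketch. The exact expression of the RPC is not reproduced in this text, so the sketch assumes a plain weighted sum d·C(k) + f·AMSE_k, with the power cost C_power(k) = (K + k)P of Case 1. The function names and the AMSE values are hypothetical and serve only as an illustration.

```python
import numpy as np

def power_cost(k, K, P=1.0):
    """C_power(k) = (K + k) * P: K first-round pilots plus k retransmitted pilots."""
    return (K + k) * P

def rpc(cost, amse, d=1.0, f=1.0):
    """ASSUMED RPC form (not taken from the paper): weighted sum d * cost + f * AMSE."""
    return d * np.asarray(cost, dtype=float) + f * np.asarray(amse, dtype=float)

def preferable_retransmissions(amse_per_k, K, P=1.0, d=1.0, f=1.0):
    """Return (k*, RPC values), where k* minimizes the RPC over k = 0, 1, ..."""
    ks = np.arange(len(amse_per_k))
    values = rpc(power_cost(ks, K, P), amse_per_k, d, f)
    return int(np.argmin(values)), values

# Made-up AMSE values for k = 0..5 retransmissions (illustrative only).
amse = [10.0, 6.0, 4.5, 4.0, 3.8, 3.7]
k_star, values = preferable_retransmissions(amse, K=4)
print(k_star, values)
```

With these illustrative numbers the cost grows linearly in k while the AMSE gain saturates, so the minimum is reached after a small number of retransmissions; changing d and f shifts the minimum toward fewer or more retransmissions.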

V. SIMULATION RESULTS
In this section, simulation results are presented to validate the aforementioned analysis. The fading channels have been modeled as circularly symmetric complex normal random variables, CSCN(0, 1), to simulate Rayleigh fading conditions. Unless otherwise stated, the transmit SNR is set to ρ = 10 dB. We apply Monte Carlo analysis, averaging over $10^4$ channel realizations (snapshots). Finally, we define the average MSE per user as AMSE = E[MSE]/K and the average power per user as $\bar{P} = E[P]/K$.
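The channel model of this setup can be sketched as follows. This is only an illustration of how the CSCN(0, 1) snapshots may be drawn; the variable names, the seed, and the choice K = 50 (the value used in Fig. 7) are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen only for reproducibility
K = 50                           # number of transmitting devices (as in Fig. 7)
n_snapshots = 10**4              # Monte Carlo channel realizations
rho_db = 10.0                    # transmit SNR in dB
rho = 10 ** (rho_db / 10)        # transmit SNR in linear scale

# CN(0, 1): real and imaginary parts are independent N(0, 1/2).
h = (rng.standard_normal((n_snapshots, K))
     + 1j * rng.standard_normal((n_snapshots, K))) / np.sqrt(2)

gains = np.abs(h)                 # Rayleigh-distributed magnitudes
ordered = np.sort(gains, axis=1)  # per-snapshot ordering (ascending here)

print(np.mean(gains**2), np.mean(gains))
```

The empirical mean of |h|² should be close to 1 and the mean magnitude close to √π/2, which is a quick sanity check that the unit-variance Rayleigh model is generated correctly.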
In Fig. 5, the performance of the perfect CSI algorithm and of the proposed optimization technique for imperfect CSI is presented under imperfect channel estimation. As can be observed, ignoring CSI imperfections not only yields worse AMSE values, but also fails to converge to a solution as the number of transmitting devices in the system increases. In contrast, the proposed technique both achieves better performance and exhibits a decreasing AMSE as the number of transmitting devices increases. It is also important to notice that the failure to converge is mainly a result of the inverse channel term produced by the algorithm that ignores the imperfect CSI.
It is worth noting that the robustness of the proposed policy to an increasing number of transmitting devices can be quite important in practical applications of over-the-air computing systems, depending on the use of the resulting target function. For example, in federated learning, where the target function is itself a parameter that updates a global training model, the diverging behavior of the perfect CSI algorithm under imperfect CSI can be critical to the functionality of the model, since it could lead to more iterations or to divergence of the model parameters.
In Fig. 6, the mean value of the inverse channel terms is compared with the proposed bound (22) and with the derived approximations given by (26), (27), (34), and (18) combined with Lemma 2. Apart from confirming the validity of the theoretical analysis, it is important to observe that the growth rate is indeed logarithmic. In addition, the tight lower approximation appears to provide an asymptotic approximation of the term under investigation. Hence, studying the statistical probability of $i^*$ and combining it with (32) and (18), especially when the noise is sufficiently low, can provide useful insights for the design of a system that uses the policy that does not account for the CSI imperfections.
In Fig. 7, we look at the AMSE performance of the different policies for varying transmit SNR. Firstly, it is important to notice that the full power policy not only results in a large AMSE, but also fails to improve its performance for increasing SNR, in contrast to every other discussed policy. This behavior is expected because the full power policy performs no channel cancellation, which is necessary to correctly estimate the target function of interest. It is also important to note that the inverse channel policy cannot reach the performance level of the perfect and imperfect CSI policies under imperfect CSI. This is a result of two factors: the non-vanishing inverse channel term described by (12), which would be omitted in the case of perfect CSI, and the presence of noise, whose effect is amplified by the receiver factor in this policy.
Concerning the proposed policy, we can see that it outperforms the perfect CSI policy under imperfect CSI, especially in the low SNR region. This behavior is to be expected, since lower transmit SNR results in greater estimation errors and, thus, it is imperative to take this effect into consideration. We would like to point out that the performance improvement, even for more practical scenarios such as 10 dB, is more than 20%, which highlights the usefulness of the proposed policy. As expected, the AMSE in the perfect CSI case is smaller than in the imperfect CSI case. To obtain a more favorable AMSE under the latter, one could use the proposed RPC functions to achieve further performance improvements.
As can be seen in Fig. 8, the perfect CSI policy does indeed require less power than the imperfect CSI policy regardless of the transmit SNR, validating Theorem 4. However, it is worth noting that for increasing K the latter shows the same converging behavior as the former, which means that the power per device is ultimately reduced in both cases. These results, however, do not take into account the MSE performance.
Fig. 8: $\bar{P}$ performance vs the number of transmitting devices in the AirComp system accounting or not for imperfect CSI.
In Fig. 9, the retransmission policy discussed in Section IV-B is presented in terms of average MSE. For any retransmitting device, we consider the new estimate of the channel to be given by $h'_k = (h'_{k,1} + h'_{k,2})/2$, where $h'_{k,1}$ is the initial estimate of the k-th channel and $h'_{k,2}$ is the re-estimate obtained through the proposed retransmission. For comparison, the retransmissions have also been simulated for random selections of channels instead of the weaker ones as proposed. As expected, both policies achieve better MSE values than the no-retransmission policy due to the better channel estimates for their re-estimated channels. We can see that the proposed policy achieves a much better improvement rate than the random channel selection policy. It is important to point out that this improvement is greater mainly for the first few weaker channels, which confirms the idea that a re-estimation of these channels can be used to decrease the MSE of the system. We can also observe that, although a larger number of retransmissions can achieve further performance improvement, both policies tend to converge to the same MSE, because many of the re-estimated channels will then be common to both.
From Fig. 10, we also notice that, for the resources available at the transmitting devices, it is preferable to use them to re-estimate more channels rather than to better estimate a smaller number of channels. To observe this, we simulate two scenarios: a first one where only 1 retransmission is made per channel and a second one where 2 retransmissions are made per channel. For the latter, the new estimate of the k-th channel is $h'_k = (h'_{k,1} + h'_{k,2} + h'_{k,3})/3$. Comparing the two resulting curves for an even amount of resources (so that the 2 retransmissions can be made per channel), we observe that the single round retransmission achieves better performance in terms of AMSE.
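The re-estimation step can be illustrated with a short sketch. It assumes an additive complex Gaussian estimation error with variance `noise_var` and selects the channels to re-estimate by their estimated magnitude. Both choices, along with all function and variable names, are assumptions of this sketch rather than details taken from the paper.

```python
import numpy as np

def estimate(h, noise_var, rng):
    """One pilot round: true channel plus complex Gaussian error (assumed model)."""
    n = (rng.standard_normal(h.shape) + 1j * rng.standard_normal(h.shape))
    return h + n * np.sqrt(noise_var / 2)

def reestimate_weakest(h, h_est, k, noise_var, rng):
    """Average a second estimate into the k channels with the smallest |h_est|."""
    weakest = np.argsort(np.abs(h_est))[:k]
    h_new = h_est.copy()
    second = estimate(h[weakest], noise_var, rng)
    h_new[weakest] = 0.5 * (h_est[weakest] + second)
    return h_new, weakest

rng = np.random.default_rng(1)
K, k, noise_var = 20000, 5000, 0.1   # large K only to obtain stable averages
h = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
h_est = estimate(h, noise_var, rng)
h_new, weakest = reestimate_weakest(h, h_est, k, noise_var, rng)

# Mean squared estimation error on the re-estimated channels, before and after.
err_before = np.mean(np.abs(h_est[weakest] - h[weakest]) ** 2)
err_after = np.mean(np.abs(h_new[weakest] - h[weakest]) ** 2)
print(err_before, err_after)
```

Averaging two estimates with independent errors roughly halves the error variance on the selected channels, while the estimates of the remaining channels are left unchanged.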
In Fig. 11a, the power related RPC metric is presented, i.e., $C_{\mathrm{power}}(k)$ is examined by means of (54), with respect to the number of retransmissions. By (57), we can see that a minimum exists and that it is global. For the global minimum observed at k = 2, we can derive from Fig. 9 that an improvement of almost 7% is already achievable with only two channel re-estimations. It is important to point out that, for the statistically weaker channels, all retransmission choices appear to achieve a better trade-off between power and MSE than the originally proposed scheme without any retransmissions. Notice that the number of channels that achieve a better trade-off coincides with the number of channels that were statistically found to be more error-prone, as shown in Fig. 3. This also confirms the idea on which the retransmission policy was based.
For comparison, in Fig. 11b we consider the proposed RPC with respect to power resources cost against a system with no retransmissions, which has greater maximum transmission power because it conserves the power resources that would otherwise be spent on additional pilot symbols. In this context, every pilot symbol is transmitted with maximum power, $\sqrt{P}$, as explained in Section II. For the no retransmissions scheme, the power that would be consumed by k additional pilot symbols is uniformly distributed among all K transmitting devices; thus, the new maximum transmission power $P'_{\max}$ is given by (58).
In Fig. 11b, the black curves show the AMSE of the two policies studied in Fig. 11a for an increasing number of retransmissions, while the red curve shows the AMSE of the no retransmission system for increasing maximum transmission power, as described by (58). Studying Fig. 11b, we can observe that for a few retransmissions the MSE improvement is greater than the gain of a similar system with no retransmissions and greater available transmission power at each user. Therefore, we can conclude that the resources spent on some additional pilot symbols have greater impact than an increase in the maximum transmit SNR of each user. This behavior is expected because, as explained in Section III, the main difficulty in an imperfect CSI scenario is that the estimates of the worst channels are greatly affected by the imperfections and, thus, a more accurate estimation of them is preferable to some additional transmission power. Although this may seem counter-intuitive, we would like to point out that in over-the-air systems, unlike other communication models, greater transmission power does not guarantee better results, since the objective is the calculation of a target function and not the detection of the received symbols.
In fact, as we can see from Fig. 7, the maximum power policy is the most energy-greedy, but clearly fails to achieve the MSE levels of our proposed optimal policy.
In Fig. 12, we present the throughput RPC for an increasing number of retransmissions, as well as the corresponding mean throughput loss. Similarly to the power RPC in Fig. 11a, for the throughput RPC expressed in (55) we can see in Fig. 12 that, by (57), a minimum exists and is global, and that, for the statistically weaker channels, all retransmission choices appear to achieve a better trade-off between throughput and MSE than the originally proposed scheme without any retransmissions. Thus, it is possible to choose the number of retransmissions in such a way that both previously discussed RPC functions achieve better trade-offs. It is important to notice that, in the throughput related RPC, due to the ordered channels, every calculated term already has a different weight and, hence, a different contribution to $E[C_{\mathrm{tr,total}}]$. This is evident by examining the gap in mean throughput loss between consecutive points. As expected, the lower the channel gain, the smaller the corresponding mean throughput loss, which justifies the proposed policy and also explains the behavior of the RPC curve.
Regarding the RPC utility function cases, it is worth emphasizing that, depending on the application of the over-the-air computing system, an AMSE threshold could be necessary to ensure functionality. In this case, a further AMSE decrease can be obligatory. Moreover, the overhead caused by every retransmission, expressed as the loss of transmitted information, can be explicitly calculated by Lemma 8, which allows various RPC functions to be used while considering fundamental operating parameters of the system. Thus, RPC could offer a systematic way to manage the system's performance while making considerate use of its resources to comply with the operational constraints of the application of interest.

VI. CONCLUSIONS AND FUTURE EXTENSIONS
In this paper, an AirComp system under imperfect CSI assumptions was considered. The detrimental effect of imperfect CSI on the MSE was presented, and insightful approximations were derived to aid in the design of such systems. Moreover, a comprehensive analysis of how channel estimation errors affect the AirComp system was presented, and a novel optimization framework to minimize the MSE under those conditions was proposed. To counter the effect of the imperfections, an adaptive policy based on pilot retransmission was presented, which shows the potential to greatly improve the performance. Moreover, a new utility function was presented alongside the retransmission policy to showcase the efficiency of the approach. Extensive simulation results were presented to validate the effectiveness of the proposed analysis, showcasing that it can offer great insights into the design of AirComp systems. We note that the findings of this paper delve deeper into the design of more practical AirComp systems and can form the basis for possible extensions to multiple-input multiple-output (MIMO) cases and federated learning.

APPENDIX A PROOF OF LEMMA 1
We start the proof by presenting an analysis based on order statistics, in order to relate the mean estimation error to the mean of the magnitude of the ordered channels. We denote the correct channel gain ordering in the following way From the order statistics we can find that every ordered sample $U_r = |h^{\mathrm{cor}}_r|$ from a total of K samples has the PDF given in (59). Hence, the expected value $E[U_r]$ can be calculated by (60). Setting $t = u_r/\sigma_h$ we obtain (61). Using binomial expansion and the fact that $\int_0^\infty \exp(-mt^2)\,dt = \sqrt{\pi/(4m)}$, after the integration and some algebraic manipulations, we get (62). Then, we can easily get (16) from (61) and (62), and, thus, the proof is concluded.
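The order-statistics argument can be checked numerically with a short sketch. It assumes $h \sim \mathcal{CN}(0, \sigma_h^2)$, an ascending ordering (r = 1 is the weakest channel), and a closed form obtained by binomial expansion of the order-statistic PDF. This closed form was derived for the sketch and is not claimed to coincide term by term with (62); the function name is hypothetical.

```python
import numpy as np
from math import comb, pi, sqrt

def mean_ordered_rayleigh(r, K, sigma_h=1.0):
    """E[U_(r)]: mean of the r-th smallest of K i.i.d. |h|, h ~ CN(0, sigma_h^2).

    Uses F(u) = 1 - exp(-u^2), a binomial expansion of F^(r-1), and
    int_0^inf 2 u^2 exp(-m u^2) du = sqrt(pi) / (2 m^1.5).
    """
    total = 0.0
    for j in range(r):
        m = K - r + 1 + j
        total += comb(r - 1, j) * (-1) ** j * sqrt(pi) / (2 * m ** 1.5)
    return sigma_h * r * comb(K, r) * total

# Monte Carlo comparison for a small K (the alternating sum is ill-conditioned for large r).
rng = np.random.default_rng(0)
K, n = 5, 200_000
h = (rng.standard_normal((n, K)) + 1j * rng.standard_normal((n, K))) / np.sqrt(2)
empirical = np.sort(np.abs(h), axis=1).mean(axis=0)
analytic = np.array([mean_ordered_rayleigh(r, K) for r in range(1, K + 1)])
print(empirical)
print(analytic)
```

For K = 2 and r = 1 the expression reduces to $\sqrt{\pi}/(2\sqrt{2})$, the mean of the weaker of two unit-variance Rayleigh magnitudes, which gives a simple hand check.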

APPENDIX B PROOF OF LEMMA 2
Using ordered statistics for $U_r$ we know that the PDF is given by (63). For r = 1 it is straightforward that since the function $g(t) = \frac{e^{-tK}}{t}$ is continuous in $I = (0, +\infty)$, $g(t) > 0$, $\forall t \in I$ and $\lim_{t\to\infty} g(t) = 0$, $\lim_{t\to 0^+} g(t) = +\infty$. Setting $t = \lambda u_r$ the expected value will be equal to (64). For convenience let $I_r = \int_0^\infty \frac{1}{t} e^{-tK}(e^t - 1)^{r-1}\,dt$ and also $m = r - 1$. Applying integration by parts and binomial expansion where by $\mathcal{P}$ we symbolized the polynomial described as $\mathcal{P}(x) = \sum_{k=0}^{m} \binom{m}{k} (-1)^k x^{K-m+k}$ with $\deg(\mathcal{P}(x)) = K$ and minimum degree $K - m$. Breaking the logarithmic term in (66) would result in two integrals that can both be easily calculated by considering the well-known fact that $\gamma = -\int_0^\infty \ln(z) e^{-z}\,dz$. Hence, (66) will result in (67), which holds by the binomial expansion theorem, since (68). Substituting (67) back in (64) we can get (18) and the proof is completed.
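The steps described above can be checked numerically with a small sketch. Following integration by parts, the identity $\gamma = -\int_0^\infty \ln(z) e^{-z}\,dz$, and the fact that $\sum_{k=0}^{m}\binom{m}{k}(-1)^k = 0$ for $m \ge 1$, the sketch assumes that $I_r$ reduces to $-\sum_{k=0}^{m}\binom{m}{k}(-1)^k \ln(K-m+k)$. This expression was derived for the illustration and is not claimed to be the exact form of (67); the function names are hypothetical.

```python
import numpy as np
from math import comb, log

def I_closed(r, K):
    """Assumed closed form of I_r for r >= 2 (m = r - 1 >= 1)."""
    m = r - 1
    return -sum(comb(m, k) * (-1) ** k * log(K - m + k) for k in range(m + 1))

def I_numeric(r, K, t_max=60.0, n=600_001):
    """Trapezoidal evaluation of int_0^inf (1/t) e^{-tK} (e^t - 1)^(r-1) dt, r >= 2."""
    m = r - 1
    t = np.linspace(1e-9, t_max, n)
    # e^{-tK} (e^t - 1)^m = e^{-t(K-m)} (1 - e^{-t})^m, written to avoid overflow.
    y = np.exp(-t * (K - m)) * (-np.expm1(-t)) ** m / t
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

print(I_numeric(2, 2), I_closed(2, 2))   # both close to ln 2
print(I_numeric(3, 5), I_closed(3, 5))
```

For r = 1 the integrand behaves like 1/t near the origin and the integral diverges, in line with the limit $\lim_{t\to 0^+} g(t) = +\infty$ noted in the proof, so the sketch is restricted to r ≥ 2.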

APPENDIX C PROOF OF LEMMA 3
From (63) and setting $t = \lambda u_r$ we get For convenience let $J_r = \int_0^\infty \frac{1}{t^2} e^{-tK}(e^t - 1)^{r-1}\,dt$ and also $m = r - 1$. Applying integration by parts and binomial expansion where the second index in the mean shows the number of samples in the ordered statistics. Notice that the first term on the integration by parts is equal to 0 because for $t \to +\infty$ we can prove that
We begin with a proof by contradiction, assuming that a full inverse channel method is utilized. Then, by (14) the overall MSE of the system will be where $a_0 >$ and since $a_0 > a_1$ we see that $\mathrm{MSE}_1(a_1) < \mathrm{MSE}_0(a_0)$, which means that the performance of the system is better than that of the full inverse channel method, and, thus, the proof is completed.

APPENDIX E PROOF OF LEMMA 6
For the first case, notice that when $a = a' =$ it is trivial to show that $\sqrt{P}$ and $\mathrm{MSE}_i(a)$ is an increasing function in the interval $[a_i, \infty)$, we conclude that for $a \in I_i$, due to (74), it will be $\mathrm{MSE}_i(a) > \mathrm{MSE}_i(a') = \mathrm{MSE}_{i+1}(a')$.
For the second case, observe that since $a_i \ge$ $+\, a'^2(\sigma^2 + (i-1)Pc)$ Thus, by (75), it will be $\mathrm{MSE}_i(a) > \mathrm{MSE}_{i-1}(a')$ and the proof is complete.

APPENDIX F PROOF OF LEMMA 7
By the definitions of $a_i$, $g_i$ in (14), (44) we can work out that Then, if $g_i <$

APPENDIX G PROOF OF LEMMA 8
The capacity per symbol transmission of every channel is known to be given by (80). Using order statistics we can find that every ordered sample $U_r = |h^{\mathrm{cor}}_r|$ from a total of K samples has the PDF described by (59). Hence, the expected value $E[U_r]$ can be calculated by (81). Combining relations (80), (81), (83) and (86) we can conclude (56).

Fig. 1: Nomographic representation of target functions and the distortion created by the channel effect and noise.

Fig. 2: Model of an AirComp system made up of K transmitting devices and 1 fusion centre.
This article has been accepted for publication in IEEE Transactions on Wireless Communications.This is the author's version which has not been fully edited and content may change prior to final publication.Citation information: DOI 10.1109/TWC.2023.3330092© 2023 IEEE.Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.Authorized licensed use limited to: Aristotle University of Thessaloniki.Downloaded on March 10,2024 at 17:07:21 UTC from IEEE Xplore.Restrictions apply.

Fig. 5: MSE performance and distinct terms vs the number of transmitting devices in the AirComp system accounting or not for imperfect CSI (Transmit SNR = 10dB).

Fig. 6: Inverse channel terms vs the number of transmitting devices in the AirComp system for the various approximations (Transmit SNR = 10dB).

Fig. 7: AMSE vs transmit SNR for different policies and CSI conditions for K = 50 transmitting devices.
(a) Power cost-related RPC vs the number of retransmissions. (b) MSE vs the number of retransmissions or transmit SNR.

Fig. 11: The proposed policy, the random policy, and the original scheme without retransmission for d = f = 1 and $P_k = 1$, ∀k when power consumption is considered as cost.

Fig. 12: Retransmission Policy Cost vs the Number of Retransmissions for the proposed policy and the original scheme without retransmission for d = f = 1 and P k = 1, ∀k when the throughput loss is considered as cost.

… $e^{-Kt}(e^t - 1)^{r-1}\log_2\left(1 + t\rho\sigma_h^2\right)dt$. … $I_{r,m}$, (82) where by $I_{r,m}$ we denote the following integral
$$I_{r,m} = \int_0^\infty e^{-Kt} e^{t(r-1-m)} \log_2\left(1 + t\rho\sigma_h^2\right)dt. \quad (83)$$
The last one can be rigorously calculated in a few steps. Substituting $y = 1 + t\rho\sigma_h^2$ and setting for convenience $b = \frac{K-r+1+m}{\rho\sigma_h^2}$, (83) is equivalent to
$$I_{r,m} = \frac{1}{\rho\sigma_h^2}\int_1^\infty e^{-b(y-1)}\log_2(y)\,dy. \quad (84)$$
Then, changing to the natural logarithm and setting $z = by$, from (84) we have $I_{r,m} = \frac{1}{K-r+1+m}$ … (85). By part integration of the first integral in (85) we obtain the following $I_{r,m} = \frac{1}{K-r+1+m}$ … (86).
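The evaluation of the integral in (84) can be sketched as follows. This is one standard route via integration by parts, written under the assumption that the lost steps follow the usual argument; it is not claimed to reproduce (85) and (86) exactly. The exponential integral $E_1(b) = \int_b^\infty \frac{e^{-z}}{z}\,dz$ is a symbol introduced here for illustration and does not appear in the recovered text.

```latex
\begin{align}
I_{r,m} &= \frac{1}{\rho\sigma_h^2 \ln 2}\int_1^\infty e^{-b(y-1)}\ln(y)\,dy \\
        &= \frac{1}{\rho\sigma_h^2 \ln 2}\left(\left[-\frac{e^{-b(y-1)}}{b}\ln(y)\right]_1^\infty
           + \frac{1}{b}\int_1^\infty \frac{e^{-b(y-1)}}{y}\,dy\right) \\
        &= \frac{e^{b}}{(K-r+1+m)\ln 2}\int_1^\infty \frac{e^{-by}}{y}\,dy
         = \frac{e^{b}\,E_1(b)}{(K-r+1+m)\ln 2}
\end{align}
```

The boundary term vanishes at both limits, and the last equality uses the substitution $z = by$ together with $b\,\rho\sigma_h^2 = K - r + 1 + m$, which is consistent with the prefactor $\frac{1}{K-r+1+m}$ that survives in the recovered expressions.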
and similarly for $t \to 0^+$ we can prove that $\lim_{t \to 0^+} \ln(t)\, e^{-tK}(e^t - 1)^m = \lim_{t \to 0^+} \ln(t)\,(e^t - 1)^m$
(38) is a decreasing function in the interval $[0, a_i]$ there exists $a' \in$
$\mathrm{MSE}_i(a')$ because for feasibility reasons when the critical number is $i$ by (38) it must be $a \le$ i i | 2 +c) √ P , $a_i \subset I_{i-1}$ such that $\mathrm{MSE}_i(a) >$
2 + a ′2 (σ 2 + iP c)