Sparse Channel Estimation From Discrete-Time Fourier Transform Beam Measurements

In this paper, we study channel estimation at a uniform linear array (ULA) with <inline-formula> <tex-math notation="LaTeX">$N$ </tex-math></inline-formula> antennas, where the channel at the ULA is composed of <inline-formula> <tex-math notation="LaTeX">$L$ </tex-math></inline-formula> paths with different angles of arrival (AoAs). It is assumed that Discrete-Time Fourier Transform (DTFT) beams (also known as analog beams with DFT beams as special cases) are applied at the ULA to project the incoming signal onto a single (or multiple) RF chain(s), after which the signal is sampled and measured in baseband domain; the underlying signal is assumed to be constant during the projections. This measurement procedure arises in various communication systems, such as the receive beam sweeping phase in 5G NR, where DTFT beams are used due to their simple implementation as linear phase shifts on analog antennas. A fundamental question about this procedure is the number of DTFT measurements <inline-formula> <tex-math notation="LaTeX">$K$ </tex-math></inline-formula> needed to recover the <inline-formula> <tex-math notation="LaTeX">$L$ </tex-math></inline-formula> AoAs. Previous work on this problem showed (by applying compressed sensing theory) that <inline-formula> <tex-math notation="LaTeX">$K \approx L\mathcal {O}(\log (N/L))$ </tex-math></inline-formula> measurements are sufficient for recovering the AoAs, which grows with <inline-formula> <tex-math notation="LaTeX">$N$ </tex-math></inline-formula>. First, we show that necessary conditions for recovery are <inline-formula> <tex-math notation="LaTeX">$N \geq 2L$ </tex-math></inline-formula> and <inline-formula> <tex-math notation="LaTeX">$K \geq 2L$ </tex-math></inline-formula>. Second, by using properties of DTFT beam projections, we are able to show that if <inline-formula> <tex-math notation="LaTeX">$N \geq 2L$ </tex-math></inline-formula> then <inline-formula> <tex-math notation="LaTeX">$K \geq 3L$ </tex-math></inline-formula> arbitrary DTFT measurements suffice; hence, dependency on <inline-formula> <tex-math notation="LaTeX">$N$ </tex-math></inline-formula> is completely removed. Furthermore, if the DTFT beams are chosen to equal DFT beams with period <inline-formula> <tex-math notation="LaTeX">$N$ </tex-math></inline-formula>, then <inline-formula> <tex-math notation="LaTeX">$K \geq 2L$ </tex-math></inline-formula> beam measurements are enough, achieving sufficiency of the necessary conditions. With these results, an AoA estimation algorithm is formulated which has enormous complexity savings compared to <inline-formula> <tex-math notation="LaTeX">$L$ </tex-math></inline-formula>-dimensional AoA search such as maximum likelihood (ML) estimation. Numerical simulations demonstrate the algorithm’s improved performance over conventional algorithms such as beamspace ESPRIT and compressed sensing.

M ODERN communication systems are utilizing high frequency bands (above 30 GHz) for enhanced data rates. Examples of such systems are 5G NR [1] and IEEE 802.11ay [2]. The main reason for enhanced data rates at high frequencies is the availability of more bandwidth. However, the propagation channel is not favorable at these frequencies due to high pathloss. In order to obtain power boosting, the typical approach is to employ more antennas at the devices to transmit/receive more of the signal. The high frequencies, which result in small signal wavelengths, allow packing more antennas into a small area on the device due to the MIMO rule of thumb of an antenna separation that equals half the signal wavelength [3]. Furthermore, due to the high pathloss, the channel is composed of a relatively small number of dominant spatial directions incoming from different AoAs. This knowledge about the sparsity of the channel allows the devices to customize their hardware and training protocols in order to efficiently estimate the channel. Therefore, at high frequencies, standards such as IEEE 802.11ay and 5G NR, employ a beam sweeping phase with beams that have the same structure as the spatial directions to be estimated. These beams go typically under the name of analog beams in the communications literature [5], and can be easily implemented as linear phase shifts across a ULA. The beam sweeping phase is composed of transmissions and receptions with analog beams, where each transmit beam results in a channel composed of L paths at the receiver. Moreover, each transmit beam is repeated a number of times so that the receiver can learn the best receive beam for each transmit beam [4]. This is done by projecting each repeated signal at its N antennas with a set of analog beams, where the size of the set depends on the number of RF chains the receiver has access to (one RF chains means it can only use a single analog beam at a time for projection). After the beam sweeping phase, the receiver reports the index of the transmit beam that is best matched to one of its receiving beams. Simultaneously, it can use the projections of the channel that resulted from the reported transmit beam to obtain an estimate of the L AoAs that constitute that channel. The estimate can be used for further refinement of its receiving analog beam so that it matches the channel that arises from the reported transmit beam even better; the estimation of the L AoAs is what the work here is about. Several previous works have utilized the sparse channel assumption to estimate L unknown AoAs. Examples include applications of compressed sensing (CS) [6], [7], MUSIC [8], ESPRIT [9], beamspace ESPRIT [10], [11], Matrix Pencil Method (MPM) [12], and many others. Some of these methods rely on estimating the covariance matrix of the received signals, which would require many repeated beam transmissions (and thus cause a large delay) as well as the assumption that the underlying MIMO channel is constant during the sweeps (which can be easily violated due high sensitivity to mobility at mmWave frequencies). For these reasons, methods requiring covariance matrix estimation are not always practical in mmWave communications. MPM, beamspace ESPRIT and the methods in [6] and [7], on the other hand, assume no such statistical information and allow estimation of the AoAs from a few sweeps. As will be shown in Section III-A below, for a given transmit beam, applying MPM to the beam sweeping problem would require N sweeps at the receiver to estimate the L AoAs resulting from that transmit beam. Instead, the methods in [6] and [7] require at least an order of O(L log(N/L)) sweeps at the receiver for estimating the same L AoAs. This is a significant improvement compared to MPM, especially when N ≫ L. The works [6] and [7] apply CS to deduce this lower bound on number of sweeps, which is well known to result in logarithmic dependency on the dimension of the measured signal (which is N in this case). However, a drawback of both MPM and CS methods is that the number of sweeps required depends on N (moreover, to obtain recovery guarantees, CS methods require N to be large). In mmWave channels, the number of AoAs tends to decrease when N increases due to the high directivity of large antenna arrays. This gives us the opposite effect of requiring more and more measurements for fewer and fewer AoAs, clearly an undesirable effect. From this perspective, beamspace ESPRIT is attractive since it requires only 2L 2 measurements. However, a requirement of beamspace ESPRIT for perfect recovery in absence of noise is that measurements should be collected across L different (time) slots, during which the path gains are different in each slot while the PoAs stay constant. Hence, the performance of beamspace ESPRIT is expected to be severely degraded when the L PoAs vary or when the different channels are highly correlated, as will be confirmed by simulations in Section VI. In this work, we provide an algorithm that requires only an order of O(L) sweeps in one time slot, during which the channel is assumed to be constant, for perfect estimation in the noiseless case.
The paper is structured as follows. In Section II we introduce the system model and the problem formulation. Section III discusses related works and highlights their difference with this work as well as reformulates our problem as recovering a certain function from its samples. Necessary conditions on N and number of sweeps are provided in Section IV. Section IV also takes the sampling idea from Section III further and reformulates the sampling process in terms of samples of rational functions. This approach exposes the channel sparsity that is inherent in our measurements in the domain of rational functions, from which our main theoretical results are obtained. These results directly imply that the necessary conditions on N and number of sweeps are also sufficient conditions. Furthermore, a direct corollary of the theoretical results in Section IV is an AoA estimation algorithm for noisy scenarios, which is introduced in Section V. The performance of the algorithm is investigated in Section VI by comparing it to MPM, the CS methods in [6] and beamspace ESPRIT. Finally, concluding remarks and outline of future work is given in Section VII.

II. SYSTEM MODEL AND PROBLEM FORMULATION A. System Model
We assume a ULA with N antennas which receives a superposition of L signals with complex path gains c 0 , . . . , c L−1 and π-normalized AoAs θ 0 , . . . , θ L−1 (i.e., each within interval [−1, 1]) which we henceforth refer to as phases of arrival (PoA). Mathematically, this can be expressed as where y is the received signal across the ULA, a(θ j ) = [1 e −iπθj . . . e −iπ(N −1)θj ] T is a DTFT vector with i 2 = −1 being the imaginary number, n = [n 0 . . . n N −1 ] T is an AWGN vector where each element has power N 0 , A(θ) = [a(θ 0 ) . . . a(θ L−1 )] is a matrix with the DTFT vectors as columns and c = [c 0 . . . c L−1 ] T is the vector of path gains.
In (1), one can interpret the signal y k = L−1 j=0 e −iπkθj c j +n k at antenna k as a sample of the received signal. The model in (1) is commonly used in mmWave communications [13] and is assumed by many classical PoA estimation techniques, such as MUSIC, ESPRIT and MPM. These methods use the samples where it is noted that the c l can be estimated by solving a linear system once the PoAs are known. In contrast to those works, herein it is assumed that the receiver further projects y with distinct unit norm DTFT vectors (beams) As with θ l , we assume that ϕ k ∈ [−1, 1). Defining g j △ = cj √ N as new normalized path gains and expanding a H (ϕ k )a(θ j ) in (2) further we get where we define the noiseless samples Creating the vectors r = [r 0 . . .
can be expressed in matrix form as The noise vector w in (5) is zero-mean Gaussian noise with Σ w = E{ww H } = N0 N A(ϕ) H A(ϕ) as the covariance matrix. If the beam matrix A(ϕ) is orthogonal, for example when the DTFT beams equal DFT beams (which happens when ϕ k = 2l k /N for some integer l k ), then w is AWGN. Comparing the samples r k in (3) with the measurements y k in (1), the main difference is that for measurement k, e −iπkθj is changed to Thus, the measurement model in (3) is fundamentally different from the measurement model in (1).

B. Problem Formulation
The first task in this work is to recover the PoAs θ l , 0 ≤ l ≤ L − 1, given the noiseless samples s k in (4). This corresponds to solving a non-linear system of equations in the unknowns θ l . As mentioned before, once the PoAs are known, by inserting their values into (4) the path gains g l are easily obtained by solving a linear system of equations. Thus, from now on, we only focus on recovering the PoAs. The reason for interest in the noiseless case is to answer four fundamental questions: 1) How many antennas N are needed to uniquely recover θ 0 , . . . , θ L−1 from (4)? 2) How many measurements K are needed to uniquely recover θ 0 , . . . , θ L−1 from (4)? 3) At which phases should these measurements be taken? 4) From these measurements, can we reconstruct θ 0 , . . . , θ L−1 without extensive searching in the phase domain [−1, 1) for each PoA? Questions 1 and 2 are of fundamental importance since they address uniqueness when solving the non-linear system of equations (4) for the PoAs θ l . If there is no unique solution for the PoAs given the samples s k in (4), then there is inherent uncertainty in the estimation problem and even the optimal ML algorithm fails in this case (unless there are countably many solutions and we have some prior knowledge about the solution we want). One can suspect that N and K play a fundamental role in guaranteeing the existence of a unique solution. Namely, if the array has a single antenna (N = 1), it is well known that even for a single path (L = 1) there is no way to distinguish between θ 0 and the phase in g 0 ; hence, in this case, we must have N ≥ 2. Thus, there seems to be a lower bound to N , that depends on L, which must be satisfied for there to exist a unique set of PoAs that produces the noiseless samples s k in (4). Similarly, it is obvious that when L is much larger than K (i.e., few measurements relative to the number of paths), there could be many realizations of θ l , g l that produce the same samples s k . The only way forward then is to take more measurements, where the number of measurements should somehow be related to L. However, due to the non-linearity of (4), it could happen that there is never a unique solution to the PoAs from (4). Fortunately, we will show that there is always a unique solution to (4) once K is above a certain value that only depends on L.
Question 3 is concerned with the DTFT beams used for measurement. Are any (distinct) phases allowed, or do we need to use particular phases to solve (4)? Is one choice better than other in some aspect? As we will see, it is in general possible to use any distinct set of phases, which is practically relevant since it gives the device more playroom in choosing appropriate phases for measurement based on e.g. prior PoA knowledge. Peculiarly, as will also be shown, a particular set of phases can help in further reducing the number of measurements.
Question 4 is related to complexity. Optimal reconstruction of PoAs in a noisy scenario is achieved with the ML method, which entails a search across g l , θ l ; this becomes of prohibitive complexity even for relatively small values of L. Moreover, the likelihood function can have many local optima (even if conditions for the uniqueness of PoAs are satisfied), thus resulting in erroneous ML estimation unless starting values close to the global optimum are obtained. Hence, there is a need for a method that can quickly and reliably provide estimates close to the global optimum, after which a refined ML search can be made around the provided estimates.
III. RELATED WORK Before answering the questions in the previous section, we take a look at previous work on this problem.

A. MPM
MPM deals with a sum of complex exponentials with different amplitudes and phases as in (1). It shows that if N ≥ 2L, it is possible to recover the phases θ l and gains c l , 0 ≤ l ≤ L − 1, perfectly in absence of noise. When N ≥ 2L and we take K = N measurements, A(ϕ) in (5) becomes an N × N Vandermonde matrix [14] which is invertible as long as the DTFT phases are different. In this case, we obtain s = A(ϕ) H −1 s = A(θ)g, which is (in absence of noise) equivalent to (1) onto which the MPM is applicable. Hence, this shows that N ≥ 2L and K ≥ N are sufficient for recovering the PoAs in absence of noise. For K < N , the inverse A(ϕ) H −1 can be replaced by the Moore-Penrose pseudo-inverse A(ϕ) H + . Of course, this replacement does not result in perfect recovery when K < N , but can anyway be used in noisy scenarios. We will evaluate the performance of MPM in Section VI.

B. Beamspace ESPRIT
Beamspace ESPRIT [10], [11] takes the ideas of MPM and applies them to L different instances of (5); by doing so, it is able to achieve perfect recovery in absence of noise with K ≥ 2L measurements for each instance. The assumed model of beamspace ESPRIT is , the PoAs θ are assumed to be constant during T different transmissions while their gains g k change. Let G = [g 0 . . . g T −1 ]. A constraint required by beamspace ESPRIT is that the rank of G should be at least L. Hence, a total of at least K ≥ 2L 2 measurements are required for beamspace ESPRIT, which is thus able to remove the dependency of K on N that MPM did not manage (with the added complexity of T repeated transmissions). If these T transmissions result in similar received signals, so that G becomes close to rank deficient, the estimation quality of beamspace ESPRIT will deteriorate. Similarly, as soon as θ starts to vary during the T transmissions (which is a practically reasonable scenario), the estimation degrades significantly and even results in an error floor since it only outputs a single estimated θ for the T transmissions. These drawbacks of beamspace ESPRIT will be illustrated with simulations in Section VI. Therefore, an algorithm that recovers Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
the PoAs during each transmission, with O(L) measurements, is needed.

C. Compressed Sensing
As mentioned in Section I, the work in [6] deals with (5) by applying CS and thereby requiring O(L log(N/L)) measurements. However, the CS methods provide only a finite PoA resolution, i.e., in absence of noise they are unable to give the exact value of θ. Therefore, in order to obtain a very accurate estimate of the PoAs θ in (5), the matrix Ψ in [6] needs to be a large DTFT matrix is a vector of distinct phases within [−1, 1) and S is a large integer. Hence, Ψ(α) represents a resolution grid of the PoAs. To recover the PoAs θ, the following optimization problem is formulated where ∥.∥ 1 denotes 1-norm and ∥.∥ 2 2-norm. The estimated L PoAs are the grid values α l0 , . . . , α l L−1 that correspond to the L largest magnitudes |g l0 |, . . . , |g l L−1 | from the g that solves (6). The larger the S and N , the better the resolution of the PoA estimates but also the higher the complexity of the estimation algorithm. Instead of using A(ϕ) as measurement beams, [6] proposes to use random Rademacher matrices (RMs) as measurement beams. A Rademacher matrix is characterized by each of its entries being ±1, which can be easily implemented with analog RF by setting the angles to ±π. If R is a uniformly distributed RM, then as shown in [6] this matrix results in better CS properties of the matrix product RΨ(α) than A(ϕ)Ψ(α). The performance of (6) with A(ϕ) and R will be compared with other algorithms in Section VI.

D. Sampling Interpretation
If we see the DTFT phases ϕ k , 0 ≤ k ≤ K − 1, as samples of a continuous variable ϕ, then the s k in (4) are samples of the function s(ϕ) at the sampling points p(ϕ) is the well known beamforming function [3], which is the beamforming gain at a ULA for a signal incoming at phase ϕ and a receiving beam centered at phase 0. In the mathematical literature, p(ϕ) 2π is known as a Dirichlet kernel. Obviously, p(ϕ) is periodic with a period of 2, thus so is s(ϕ) too. Hence, recovering θ j , g j , 0 ≤ j ≤ L − 1, is equivalent to recovering s(ϕ) from its samples s k , 0 ≤ k ≤ K − 1. This connection makes our problem equivalent to recovering a function from its samples, where in our case the function is a linear combination of shifted Dirichlet kernels. Figure 1 shows an example of how this function can look for a specific case. Answering questions 2 and 3 in Section II-B is equivalent to answering how many samples of s(ϕ) are required, as well as where the corresponding sampling points should be located in [−1, 1], for perfect recovery of s(ϕ). The figure shows s(ϕ) for a specific case where N = 8, where δ(x) is the Dirac delta function. Clearly, P (f ) is a Dirac train with Dirac delta functions located at frequencies k/2, 0 ≤ k ≤ N − 1, and thus it has a bandwidth of (N − 1)/2. From (8), it follows that S(f ) also has a bandwidth of (N − 1)/2. According to the Shannon-Nyquist sampling theorem, a sampling rate T s above 2/(N − 1) is necessary in order to avoid aliasing of S(f ). Because s(ϕ) is periodic with period of 2, this amounts to taking more than 2/T s = N − 1 samples s k within the period, i.e., at least N samples. The DFT of these samples produces the values S(k/2), 0 ≤ k ≤ N − 1, which equal a linear combination of exponentials as in (1) onto which MPM can be applied for recovering the PoAs. However, as seen, we need O(N ) samples to recover S(f ) perfectly with the Shannon-Nyquist sampling theorem. This is similar to MPM, where N samples are needed to invert A(ϕ), except for a subtle, but important, difference. Namely, the application of the Shannon-Nyquist theorem to s(ϕ) requires uniform samples, meaning that in this case, the sampling points are ϕ 0 + 2k/N , 0 ≤ k ≤ (N − 1), for some starting value ϕ 0 . On the other hand, MPM allows using arbitrary sampling points in [−1, 1) because the DTFT beams that constitute A(ϕ), whose phases ϕ k are the sampling points, can be arbitrary. Thus, this is a stronger result than direct application of the Shannon-Nyquist sampling theorem to s(ϕ), since the different sampling points do not need to be uniform. However, both methods still require N samples for recovering the PoAs. Lowering the number of samples below N will inevitably result in aliasing of S(f ) according to the Shannon-Nyquist theorem, corresponding to sub-Nyquist sampling. In Section IV, we will make a connection between (7) and rational functions, which will allow us to recover the PoAs with sub-Nyquist sampling.

IV. MAIN RESULTS
We start by providing lower bounds to N and K that are necessary conditions for questions 1 and 2 in Section II. To the author's knowledge, N ≥ 2L is a sufficient condition for perfect recovery with MPM in absence of noise but not necessary. Next, we will strengthen this fact and show that N ≥ 2L is a necessary condition for MPM, i.e., a necessary condition for recovering θ from (1) when n = 0.
Theorem 1: If there is a unique solution θ, g to y = A(θ)g then N ≥ 2L.
Proof: Assume N < 2L. We will show that y = A(θ)g = A(θ)g, i.e., When the elements in θ,θ are all distinct, A(θ) A(θ) has rank N and thus a null space of dimension 2L − N . Any non-zero g T −g T T in this null space satisfies (9), which proves the theorem.
□ Since uniqueness of θ, g in (1) is necessary (but not sufficient) for uniqueness of (4), we get Corollary 1: If there is a unique solution θ, g to (4) then N ≥ 2L.
Corollary 1 shows that N ≥ 2L antennas is necessary to have a unique solution θ, g in (4), i.e., N ≥ 2L is a lower bound that must be satisfied. Next, we show Theorem 2: If there is a unique solution θ, g to (4) then K ≥ 2L and N ≥ 2L.
Proof: We know from Corollary 1 that N ≥ 2L is necessary, so we assume this holds. For the sake of contradiction, assume then that K < 2L. We will show that has a non-zero solution g T −g T T for some θ ̸ =θ. Choosing the elements in θ,θ to be all distinct, the □ Hence, K ≥ 2L, N ≥ 2L are necessary (but not sufficient) lower bounds for there to exist a unique solution θ, g to (4). We will prove sufficiency of N ≥ 2L, K ≥ 2L by explicitly constructing an algorithm that recovers θ, g from the measurements in (4) when N ≥ 2L, K ≥ 2L.
If we make the substitutions z = e iπϕ , ϕ = Im{log(z)/π}, where Im{.} denotes the imaginary part of a complex quantity, we can see the samples s k in (4) as samples of the N −1 degree polynomial s z (z) It is seen that the N polynomial coefficients (also referred to as parameters) that define s z (z) uniquely, equal the exponential sum observed at the different antennas in (1) for which MPM can be applied. Obtaining 2L coefficients from (10) is enough for the application of MPM. However, from the theory of polynomial interpolation, it follows that the N polynomial coefficients of s z (z) are obtained uniquely by evaluating s z (z) at N different values of z; this is analogous to the Shannon-Nyquist sampling theorem, expressed here instead as polynomial interpolation. Hence, as before, N DTFT measurements are again required for unique estimation of the polynomial coefficients. To obtain O(L) samples, we will next provide an equivalent representation of s z (z) that has O(L) parameters defining it.

A. Representation as a Rational Function
If we apply the geometric series identity to (10), it readily follows that s z (z) can be expressed as a rational function in z where The is, up to a constant, uniquely defined by its roots which equal e iπθ l , 0 ≤ l ≤ L − 1; hence, there is a one-to-one correspondence between v rat (z) and the PoAs. Although u rat (z) is of degree N + L − 1, it can be seen that its coefficients that correspond to the N − L exponents z L , z L+1 , . . . , z N −1 are zero. Thus, effectively, u rat (z) has only 2L non-zero coefficients. Since u rat (z) = 0 and v rat (z) = 0 when z equals e iπθ l , 0 ≤ l ≤ L − 1, they have these and only these L roots in common because v rat (z) is of degree L. Clearly, we now have two different expressions of s z (z): the polynomial expression in (10) and the rational function expression in (11). The PoAs can be obtained from both of them; in the former case, by applying MPM to 2L polynomial coefficients and in the latter case by finding roots of v rat (z). An important remark here is that the polynomial expression can also be expressed as a rational function, where the numerator is the polynomial in (10) and the denominator is 1; still, in order to avoid confusion, we will refer to (10) as the polynomial expression of s z (z). The polynomial expression is determined by N parameters, while the rational function expression is determined by 3L + 1 parameters (2L from u rat (z) and L+1 from v rat (z)). Hence, when N > 3L + 1, s z (z) has a sparser representation as a rational function than the polynomial expression. If we are able to recover these 3L + 1 parameters with O(L) measurements, our goal is achieved.

B. Rational Function Recovery
Inserting the sampling points z k into (11), we get a system of equations in the coefficients of u rat (z), v rat (z) From (15), it follows that these coefficients necessarily satisfy the linear equations where u(z) and v(z) are polynomials of degrees N +L−1 and L, respectively (they do not need to be of the form as in (12),  (15); hence, in our case, this means that s z (z) in (11) is the only solution to (15). However, as we saw above, s z (z) can be expressed in different ways as a rational function: the polynomial expression in (10) and the rational function expression in (11). All different expressions of s z (z) as a rational function necessarily satisfy the linear system in (17), shown at the bottom of the next page. As an example, the polynomial expression of s z (z) in (10) is obtained by letting The resulting x vector in (17), which we denote by x poly in this case, solves (17). These choices correspond to the polynomials u poly (z) = (−1) L N −1 j=0 z j L−1 k=0 g k e −iπjθ k and v poly (z) = (−1) L+1 . On the other hand, if we let u and v equal the coefficients of u rat (z) and v rat (z) in (11) (where again v rat 0 = (−1) L+1 ), and denote the resulting u, v, x as u rat , v rat , x rat , respectively, then x rat also solves (17). Since x poly ̸ = cx rat for any complex number c, and they are both non-zero, it implies that A(z) in (17) always has a null space of at least two dimensions; thus, there is an infinite number of solutions to (17). What [15,Th. 2.2.1.4] states is that all these solutions to (17) represent the same unique function s z (z) when K ≥ N + 2L. Obtaining these solutions enables finding the solution that corresponds to x rat , from which we get v rat and its associated polynomial v rat (z) that encodes the PoAs into its roots.
The question is whether we can obtain the 3L + 1 elements of x rat after O(L) measurements. Direct application of [15, Th. 2.2.1.4] results in K ≥ N + 2L measurements for obtaining s z (z) (and thus x rat ), which is even more measurements than the N measurements required for finding the polynomial expression in (10). The reason for this increase is that [15, Th. 2.2.1.4] deals with recovering a rational function where the only prior knowledge about it are the degrees of the two polynomials defining it. In our case, we have additional knowledge of these polynomials, such as the fact that some exponents in u rat (z) are absent as well as some knowledge about the roots. Utilizing this extra knowledge, we will show that it is possible to obtain unique recovery of s z (z) with only O(L) measurements.

C. Recovering Rational Expression of s z (z)
Start by removing the columns from A that correspond to elements in x rat that equal to 0; this results in the K ×(3L+1) matrix B(z) in (18), shown at the bottom of the next page. Hence where we let u j:k denote the vector from u obtained from the elements in u on positions j to k; vector x rat now corresponds to t rat where the 0s in x rat are removed. Clearly, B(z)t rat = 0 for any z, whence it follows that B(z) has a null space of at least dimension one. Note that if t is a solution to B(z)t = 0, then all scalar multiples ct of t are obviously also a solution to B(z)t = 0.
Solving B(z)t = 0 gives all polynomials u(z), v(z) such that where u(z) has degree N + L − 1 and lacks the exponents z L , . . . , z N while v(z) has degree L. These polynomials are unique, up to a constant c, if and only if B(z) has a one-dimensional null space which therefore implies that u rat (z) = cu(z) and v rat (z) = cv(z) for all z. Hence, in this case, there is a unique function s z (z) that produces the observed samples s k , 0 ≤ k ≤ K − 1, and it corresponds to a unique set of PoAs, namely the roots of v(z). Corollary 1 and Theorem 2 give the necessary conditions N ≥ 2L and K ≥ 2L for B(z) to have a one-dimensional null space and thus a unique representation of s z (z) as a rational function; however, it is unclear whether these conditions are sufficient. In other words, given N ≥ 2L and K ≥ 2L, can we always find sampling points z k , 0 ≤ k ≤ K −1, so that B(z) has a one-dimensional null space? This is partially answered by the next theorem.
Theorem 3: Assume N ≥ 2L and K ≥ 3L. For 0 ≤ k ≤ 2L−1, choose z k such that z N k = e iγ for some real number γ, while for 2L ≤ k ≤ K−1 choose z k as any arbitrary sampling points. Then, with probability 1, all solutions to B(z)t = 0 satisfy t = at rat , where a is any complex number.
Proof: The k:th equation in B(z)t = 0 equals Since s z (z k ) = u rat (z k ) v rat (z k ) , multiplying both sides of (19) by v rat (z k ) gives Multiplication with v rat (z k ) does not result in 0 = 0 with probability 1 since z k , 0 ≤ k ≤ K − 1, do not equal any of e iπθ0 , . . . , e iπθ L−1 with probability 1; this is obvious because e iπθ0 , . . . , e iπθ L−1 are random complex numbers on the unit circle. Equation (20) can be interpreted as the samples p(z k ) of the polynomial being 0, where To prove the theorem, it suffices to show that v(z) = cv rat (z) and u(z) = cu rat (z) for some non-zero complex number c, which implies that t = ct rat . If this holds for one such number c, then it holds for all possible complex numbers a since B(z)at rat = 0.
Recall that u rat (z) has the same exponents as u(z). Define The definition of u rat (z) in (12) and (14) implies that The definitions in (23) allow us to express p(z) as Because the sampling points z k , 0 ≤ k ≤ 2L − 1, satisfy z N k = e iγ (for which N ≥ 2L is necessary), p(z k ) equals a polynomial q(z k ) in z k of degree 2L − 1 at these sampling points. It follows from (26) that p(z k ) = 0 is equivalent to Since the highest exponent of z k in q(z k ) is 2L − 1, it follows that the equality in (27) holds for all z Next, observe that the roots e iπθ l , 0 ≤ l ≤ L − 1, of v rat (z) are not roots of u rat L−1 (z) + e iγũrat L−1 (z) in (28). Namely, Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
direct insertion of e iπθ l gives from (24) that u rat L−1 (e iπθ l ) + e iγũrat L−1 (e iπθ l ) = g l q l (e iπθ l )(e iγ e −iπN θ l − 1) ̸ = 0 with probability 1 since g l ̸ = 0, q l (e iπθ l ) ̸ = 0 and e iγ e −iπN θ l ̸ = 1 with probability 1. Thus, it must hold that e iπθ l , 0 ≤ l ≤ L − 1 are roots of v(z), and because v(z) and v rat (z) are both of degree L, we have v(z) = cv rat (z) for some complex number c. To finish the proof, it now remains to show that u(z) = cu rat (z) for the same c.
Inserting v(z) = cv rat (z) into (28) gives Using the identities v(z) = cv rat (z) and (29) in (26) we get To summarize, after the 2L sampling points z k , 0 ≤ k ≤ 2L−1, satisfying z N k = e iγ , we can make the conclusion that v(z) = cv rat (z) and that the equations B(z)t = 0 result in the factorization of p(z) as in (30). Note that cũ rat L−1 (z)−ũ L−1 (z) has degree L − 1. Thus, by sampling p(z) further with points z k , 2L ≤ k ≤ K, where K ≥ 3L−1, z N k ̸ = e iγ , and the z k not being equal to any of e iπθ0 , . . . , e iπθ L−1 , it follows from (30) that p(z k ) = 0, 2L ≤ k ≤ K, implies cũ rat L−1 (z) =ũ L−1 (z) for all z. Inserting this equality in (29) also gives cu rat L−1 (z) = u L−1 (z); hence, u(z) = cu rat (z). This finishes the proof. □ Theorem 3 requires K ≥ 3L, thus L more measurements than the lower bound K ≥ 2L. The sampling points z k specified in the theorem produce a B(z) that always has a one-dimensional null space. Out of the K sampling points, 2L satisfy z N k = e iγ for some real value γ (any γ works), which means that they equal z k = e i(2πl k +γ)/N , where l k are distinct integers; we refer to these points as γ-shifted DFT points, where a shift of γ = 0 would be the most common choice. The other K − 2L sampling points can be arbitrary sampling points.
Clearly, the proof of Theorem 3 shows that there is something peculiar about γ-shifted DFT points. Denote the set of 2L γ-shifted DFT points as z DFT . From the proof of Theorem 3, it is readily seen that after sampling with z DFT , we can already conclude that v(z) = cv rat (z) for all z, i.e., v = cv rat . Although v is unique (up to a scalar) after z DFT , there are infinitely many t solving B(z DFT )t = 0. This follows because B(z DFT ) is of dimension 2L × (3L + 1) and thus has a null space of dimension at least L. Since v is unique up to a scalar, it is the vector u that changes due to this null space, where for each such u we have t = [c(v rat ) T u T ] T and B(z DFT )t = 0. In other words, v is contained in a one-dimensional space (spanned by v rat ) while u is inside a space of at least dimension L. Since all the information of the PoAs is contained in v rat , we can avoid solving for u in favor of a simpler equation system. Note that for z k such that z N k = e iγ , the last L columns in B(z) equal the L columns to the left of them multiplied with e iγ . Hence, Solving B sub (z DFT )t sub = 0 thus amounts to solving a 2L× (2L + 1) linear system, where the solution t sub is such that the first L + 1 elements of t sub equal cv rat for some complex number c. As a result, the complexity of finding v rat is reduced from solving a 3L × (3L + 1) linear system to solving a 2L × (2L + 1) linear system. Define t rat sub by replacing u, v in (31) with u rat , v rat . Hence, a corollary to Theorem 3 is choose z k such that z N k = e iγ for some real value γ. Then, with probability 1, all solutions to B sub (z DFT )t sub = 0 satisfy t sub = at rat sub for any complex number a. Proof: As shown in the proof of Theorem 3, after 2L measurements with γ-shifted DFT points, we obtain v = cv rat for some complex number c and the equation system B sub (z DFT )t sub = 0 becomes equivalent to the polynomial identity e iγ (cũ rat , so that c(e iγ u rat N :N +L−1 + u rat 0:L−1 ) = e iγ u N :N +L−1 + u 0:L−1 . Hence, t sub = ct rat sub and since any scalar multiple of t rat sub is a solution to B sub (z DFT )t sub = 0, the proof is finished. □ Hence, Corollary 2 achieves the lower bounds N ≥ 2L, K ≥ 2L, thereby proving that they are sufficient. Moreover, a direct implication of Corollary 2 is that the 2L × (2L + 1) matrix B sub (z DFT ) has rank 2L.
So far, 2L sampling points z k were constrained to be γshifted DFT points. This can be relaxed with probability 1, so that all z k are arbitary. Theorem 4: Assume N ≥ 2L, K ≥ 3L and z is any sampling vector. With probability 1, all solutions to B(z)t = 0 satisfy t = ct rat , where c is any complex number.
Proof: Theorem 3 shows that for N ≥ 2L and K ≥ 3L, there exists a vector z 2L of K sampling points such that B(z 2L ) has a one-dimensional null space. 2L of these sampling points are γ-shifted DFT points (thus the index 2L in z 2L ), and the rest K − 2L sampling points can be arbitrary. Without loss of generality, assume that the last 2L elements in z 2L are γ-shifted DFT points and start by letting the first K − 2L (arbitrary) sampling points in z 2L equal z 0 , . . . , z K−2L−1 from z. We will show that the γ-shifted DFT points in z 2L can be replaced with z K−2L , . . . , z K−1 while still guaranteeing that B(z) has a one-dimensional null space with probability 1.
Since B(z 2L ) is of dimension K × (3L + 1) and has a one-dimensional null space, this implies that B(z 2L ) has rank 3L. Thus, there exists a 3L × 3L full rank submatrix M (s) in B(z 2L ). Apparently, M (s) is obtained from B(z 2L ) by removing a single column from B(z 2L ) and by choosing a subset of 3L rows from B(z 2L ) with indices {i 0 , . . . , i 3L−1 } and corresponding samples s We will now split the proof into two cases.
Case 1: s only contains sampling points from z 0 , . . . , z K−2L−1 , i.e., it contains no γ-shifted DFT points. In this case, the 2L γ-shifted DFT points in z 2L can be replaced with z K−2L , . . . , z K−1 without changing M (s); hence, in this case, B(z) still has a one-dimensional null space and we are done.
Case 2: s contains a γ-shifted DFT point z 2L r , K − 2L ≤ r ≤ K − 1. Take the row from M (s) that depends on z 2L r and replace z 2L r in that row with the variable z. This transforms the sample s r = s z (z 2L r ) from (18) into s z (z). Because s z (z) is a polynomial of degree N − 1 in z, the determinant det(M (s)) becomes a polynomial d(z) in z of degree N + L − 1. It holds that d(z 2L r ) ̸ = 0 since M (s) is full rank. Note that the other sampling pointsz k in s are roots of d(z) since replacing z with any one of those sampling points makes M (s) a matrix with two identical rows; thus, its determinant becomes 0. Hence, d(z) can be factorized as d(z) =d(z) zj ,zj ̸ =z 2L r (z −z j ) whered(z) is a polynomial of degree N − 2L; thus, d(z) has at most N − 2L additional roots on the complex unit circle (in practice, however, these roots will most probably not be on the unit circle due to the randomness of g l , θ l , 0 ≤ l ≤ L−1). If z is not equal to any of these N −2L roots then det(M (s)) ̸ = 0. Since the sampling points z are not equal to these N − 2L roots with probability 1, it follows that replacing z with any of the other 2L sampling points z K−2L , z K−2L+1 , . . . , z K−1 left in z, say z K−2L without loss of generality, results in d(z K−2L ) ̸ = 0 with probability 1. Hence, replacing z 2L r in s with z K−2L results in that det(M (s)) ̸ = 0 still holds with probability 1. Repeating this replacement process for all γ-shifted DFT points in s brings us to case 1 above and we are done.
□ The results in Corollary 2 and Theorem 4 can be used as a basis to obtain a PoA estimation algorithm for noisy scenarios.

V. POA ESTIMATION ALGORITHM
In presence of noise, the samples s k in (4) are replaced with r k in (3). This transforms B(z) in (18) to C(z) = B(z) + W (z), where the first L + 1 columns in W (z) equal those of B(z) with s k replaced by w k , while the last 2L columns in W (z) are 0. Let z j = [z j 0 . . . z j K−1 ] and (as before) let D(z j ) be the diagonal matrix with z j on its main diagonal. Then Due to the presence of noise, the one-dimensional null space of B(z) is replaced with the one-dimensional space of C(z) that corresponds to its smallest singular value (which is in general non-zero due to the noise). Same transformation happens to B sub (z DFT ) in (31), where it now equals C sub (z DFT ) = B sub (z DFT ) + W sub (z DFT ), with the first L + 1 columns in W sub (z) equal to corresponding columns in W (z) and the last L columns in W sub (z) are zero. For the one-dimensional space in C(z) (or C sub (z DFT )) to approach the correct one-dimensional null space of B(z) (or B sub (z DFT )), we need to reduce the impact of the noise matrix W (z) (or W sub (z DFT )). This is an interesting research direction that is left for future work.
So far, we have focused on the null space of B(z) or B sub (z DFT ). If we choose v 0 = (−1) L and c = 1 (i.e., we are specifically looking for the solution where v(z) has (−1) L as its constant coefficient), then B(z)t = 0 equalsB(z)t = s, whereB(z) equals B(z) but without its (L + 1):th column s andt equals t without its (L + 1):th element v 0 . Hence,B(z) has dimensions K × 3L. The results from the previous section imply that with probability 1,B(z) is full rank and thus invertible; hence, we obtain the solutiont ast =B −1 (z)s. In case z = z DFT , we use the pseudo-inverse ofB(z DFT ), denoted as B + (z DFT ). Thus, in both cases, we can writet =B + (z)s as our solution. Adding noise to the observations results iñ C(z)t = s + w, whereC(z) = (−1) L+1 (B(z) +W (z)) andW (z) equals W (z) but without its (L + 1):th column w. Hence, under noise, we havet =C + (z)(s + w). From these observations we can formulate the following estimation algorithm.

PoA Estimation Algorithm (PoA-DTFT) leftmargin=0cm
Inputs: L, N , {r k , z k = e iπϕ k } 0≤k≤K−1 . Constraints: Outputs: PoA and gain estimatesθ l ,ĝ l , 0 ≤ l ≤ L − 1. Note that the complexity of PoA-DTFT is mainly determined by inverting the K × 3L matrixC(z) followed by finding the roots of the degree L polynomial v(z). For L ≤ 4, it is well known that there are closed form formulas for the roots of v(z) in terms of its coefficients; for larger L, the roots can be found by any popular root searching algorithm. In contrast, ML estimation involves a search across the L-dimensional cube [−1, 1] L as well as the complex gains g l , 0 ≤ l ≤ L − 1, which is of prohibitive complexity even for relatively small values of L.
In the next section, we will demonstrate the performance of PoA-DTFT. Before doing so, however, we use the results derived so far to answer all four questions posed in Section II-B: 1) It is necessary and sufficient with N ≥ 2L antennas.
2) It is necessary and sufficient with K ≥ 2L measurements.

VI. NUMERICAL RESULTS
The simulations in this section assume the model in (2), with the further assumptions that the PoAs and their corresponding gains remain constant during the K DTFT measurements; however, they are allowed to change for the next set of K measurements. We assume that n k ∈ CN (0, N 0 ), 0 ≤ k ≤ N − 1, where received signal SNR is defined as 1/N 0 . Two different performance measures are evaluated: beamforming gain and matching distance.

A. Beamforming Gain
After the DTFT measurements, the receiver constructs a beam u which ideally should be well aligned with the channel h = A(θ)c in (1). This alignment is here measured with the beamforming gain g  (2) which is denoted as L true . phase(x) = x/|x| is a function that returns the complex phase of a complex number x, and it is applied elementwise when x is a vector of complex numbers. Following algorithms are compared Fig. 3. The "PoA-DTFT, L = 2, u = phase(hest)" beam construction provides best performance but no longer converges to optimal performance. The reason for this loss is that it uses only two paths from the channel for estimation while the channel has six paths, so the other four paths become noise that is independent of SNR and thus produces a constant gap to optimum performance. Fig. 4. When N = 32, the ''PoA-DTFT, L = Ltrue, u = phase(hest)" provides best performance after a certain SNR and converges slower to optimal performance than when N = 8. The reason is that a larger N increases the spatial directivity, which requires more accurate PoA estimation and thus a high enough SNR.
i.e., u is the DTFT beam that produces the largest SNR during the measurements. This is a conventional algorithm commonly used in practical beam sweeping implementations and is our benchmark. When L true is known in advance, it is clear that "PoA-DTFT, L = L true , u = phase(h est )" converges to "Optimal beam" since in the noiseless case it estimates h perfectly. In Figure 2, when there are two paths in channel and this is assumed to be known in advance, "PoA-DTFT, L = L true , u = phase(h est )" is able to converge to optimal performance rather quickly; none of the other methods are able to converge to optimal performance. Moreover, it is able to do this even with 4 sweeps, although at significantly higher SNRs as expected. In contrast, "PoA-DTFT, L = 1, u = phase(h est )" performs worse than the benchmark, which shows the importance of having some knowledge about the number of paths in the channel in advance. Figure 3 shows that if L is not perfectly known, because L < L true , then "PoA-DTFT, L = 2, u = phase(h est )" is unable to converge to the optimal beam. However, it still provides the best performance. This shows that the method is still able to estimate the two strongest paths in the channel, while the rest of the paths (four in this case) will act as noise that is independent of SNR which thus produces a constant gap to the optimal beam. Finally, Figure 4 shows the performance when we have a larger ULA of 32 antennas. Clearly, even with as little as 8 sweeps, ''PoA-DTFT, L = L true , u = phase(h est )" converges to the performance of "Optimal beam". However, in this case, the "Max SNR beam" provides best performance at low SNRs. The reason for this is that a larger N increases the spatial directivity of the channel, which requires more accurate PoA estimation and thus a higher SNR to achieve a lower estimation error.

B. Matching Distance
Next, we will show matching distance versus received signal SNR simulations of PoA-DTFT, MPM, beamspace ESPRIT and the two CS algorithms from [6]. As described in [10], matching distance is a measure that quantifies the error of the PoA estimates. To define it, first define the wrap-around distance of the PoA as The matching distance between the true θ and estimatedθ equals where p is taken over all permutations of {1, 2, . . . , L}.
Hence, matching distance aligns estimated and true PoAs and computes the maximum error across the L estimates. It will be assumed that there are T = L transmissions from the transmitter, where the PoA vector θ l and its associated gains g l , 0 ≤ l ≤ T − 1, change between transmissions. The reason for this assumption is because beamspace ESPRIT requires it while the other methods can estimate from a single transmission. Moreover, the true L is assumed to be known here (i.e. L = L true ) since we want to evaluate the estimation of all paths; we showed in Section VI-A that PoA-DTFT also works well when L < L true . In the figures, solution of (6) with A(ϕ) as measurement beams and δ = 1/(10 √ log N N 0 ) is denoted as "CS DTFT", while replacing A(ϕ) with random RM and using δ = 0.01 is denoted as "CS RM". Using other δ values for CS DTFT did not change its performance substantially, while using very low values of δ for CS RM increases its computational complexity drastically (without much performance improvement). Moreover, the grid resolution α in III-C satisfies α j − α j−1 = 1/N , i.e., the step size of the grid is 1/N . Smaller step size results in higher computational complexity as well as worse performance since the stronger paths tend to spread across many values of the grid, while a larger step size obviously results in loss in accuracy. For MPM, as recommended in [12], we use the parameter value L MPM = ⌊N/2.5⌋ as the parameter L used in MPM (to avoid confusion with the number of paths variable L in this work, parameter L for MPM in [12] is here denoted as L MPM ). Figure 5 considers i.i.d. uniformly distributed PoAs θ l = [θ l,0 θ l,1 ] for different values of N and K, i.e, θ l,0 , θ l,1 are i.i.d. and distributed as U ([−1, 1)) for 0 ≤ l ≤ L − 1. The gains g l,0 , g l,1 ∈ CN (0, 1), 0 ≤ l ≤ L − 1, are i.i.d. Rayleigh distributed and the measurement beams are uniformly distributed DFT beams. As seen, the matching distance of PoA-DTFT decreases with SNR, while all other methods have an error floor except for MPM when K = N (as explained in Section III-A). However, when K < N , PoA-DTFT is the only algorithm with an error that decreases with SNR. The error floor of beamspace ESPRIT is severe because it assumes that the PoAs are constant during L transmissions, while the error floor of CS algorithms is due to a finite grid resolution. The CS DTFT algorithm has similar performance to beamspace ESPRIT, while CS RM performs better (in accordance with results in [6]). Moreover, it is seen that larger values of N improve the performance of CS RM because CS algorithms require large N for guaranteeing accurate estimation while the grid size N also becomes large which improves resolution. Increasing N to 128 clearly boosts the performance of CS RM, even improving upon PoA-DTFT up to SNR of 25 dB. In Figure 6, estimation of L = 4 paths Fig. 6. PoA-DTFT also works well for higher L values. The matching distance is worse than when L = 2 since there are more paths to estimate. is shown where similar conclusions are drawn as for Figure 5. Noteworthy is the significant degradation in matching distance due to a larger L since more paths need to be estimated. Figure 7 shows a scenario where the PoAs θ l , 0 ≤ l ≤ L − 1, do not vary much between the transmissions. The PoAs are generated by first drawing elements in θ 0 as i.i.d. from U ([−1, 1)) and then letting θ l,k = mod (1 + θ 0,k + U ([0, ϵ]), 2) − 1 for 0 ≤ l ≤ L − 1, 0 ≤ k ≤ L − 1 where ϵ determines the variation of the PoAs around θ 0 ; in the figure, ϵ = 0.05 which results in small variation across time. The path gains g l are Rayleigh vectors generated by drawing g 0 as a Rayleigh distributed vector and then letting g l,k = √ P g 0,k + √ 1 − P CN (0, 1) for 0 ≤ l ≤ L − 1, 0 ≤ k ≤ L − 1. Clearly, P determines the variation of the gains around g 0 across time, where P close to 0 results in small variation while P close to 1 results in large variation. Beamspace ESPRIT is expected to perform better in this simulation scenario since the PoAs do not vary much and this is indeed confirmed by Figure 7; its performance is even better when the gains exhibit large variation, as expected. However, PoA-DTFT still offers substantial gains at moderate to high SNRs, which shows its applicability to any scenario. As before, MPM performs roughly the same as PoA-DTFT since K = N in this simulation.
Finally, Figure 8 considers the case where the variation of the PoAs is limited to the interval [−0.25, 0.25], i.e., θ l,0 and θ l,1 , 0 ≤ l ≤ L−1, are i.i.d. and distributed as U ([−1/4, 1/4]). As before, reduction in PoA variation improves the performance of beamspace ESPRIT. When the measurement phases are uniformly located within [−1/4, 1/4] as ϕ k = −1/4 + k/(2N ), 0 ≤ k ≤ K − 1, which results in some measurement beams being non-DFT beams, the performance of PoA-DTFT and CS-DTFT improves significantly compared to measuring solely with DFT beams. Hence, although usage of non-DFT beams require more measurements than when only using DFT beams (c.f. Corollary 2 and Theorem 4) and by being closer to each other also reduce the condition number of B(z) in (18), they are however closer to the true PoAs and therefore increase the measurement SNR (after the projection); this improves the  When the variation of the PoAs is limited, performance of beamspace ESPRIT improves. PoA-DTFT with measurement phases ϕ inside [−1/4, 1/4] improves significantly upon PoA-DTFT with measurements being uniform DFT phases across [−1, 1). The reason for improvement is enhanced measurement SNR for phases closer to the true PoAs. Interestingly, MPM improves upon PoA-DTFT for a wide range of SNRs, but saturates at SNRs above 60 dB since K < N . performance at lower received signal SNR 1/N 0 . Interestingly, MPM provides better performance than PoA-DTFT with non-DFT beams in a wide region of SNR, although it saturates for SNRs larger than 60 dB since K < N .

VII. CONCLUSION AND FUTURE WORK
This work has studied estimation of L PoAs from DTFT beam (a.k.a. analog beam) measurements at a ULA with N antennas. First, basic questions about uniqueness of the estimation in the noiseless case were formulated in Section II-B. It was shown that N ≥ 2L and K ≥ 2L are necessary conditions for unique recovery of the L PoAs. The estimation problem is then expressed in terms of sampling a rational function, for which there exists a rich theory. However, direct application of this theory to our rational function requires at least K ≥ N + 2L measurements. Instead, it was noted that our rational function possesses sparsity which can be utilized to reduce the number of measurements from N +2L to 3L for any DTFT measurement beams. Moreover, if DFT beams are chosen as measurement beams, with N as the DFT period, 3L measurements are further reduced to 2L measurements. Hence, with this result, we concluded that N ≥ 2L and K ≥ 2L are necessary and sufficient conditions for recovering L PoAs. In cases when N ≫ L, such as mmWave, these results provide a significant improvement to CS work on the problem which requires at least O(L log(N/L)) samples. The derived noiseless results are easily applied to the noisy case, resulting in a relatively simple PoA estimation algorithm referred to as PoA-DTFT. Numerical simulation of beamforming gain demonstrate the superiority of PoA-DTFT compared to conventional beam construction methods as well as its ability to recover a few of the strongest paths in the channel even when L is not known exactly. Simulations of matching distance illustrate the advantages of PoA-DTFT compared to MPM, beamspace ESPRIT and CS methods from [6]. MPM performance saturates with SNR when K < N , while it gives similar performance to PoA-DTFT when K = N . Beamspace ESPRIT requires at least 2L 2 measurements and suffers from the assumption of constant PoAs during L transmissions, while a problem for CS algorithms is a finite grid resolution as well as requiring a large N for improved performance. In contrast, PoA-DTFT offers infinite PoA resolution, estimation from a single transmission and good performance for any N .
For future work, there are several different directions. An analysis of the performance of PoA-DTFT with respect to parameters such as N , K, SNR, etc. is good to have. As discussed in Section V, a possibly better management of the noise present in PoA-DTFT leaves room for further improvements of the algorithm. Accurate complexity analysis is another aspect to include in future work. Furthermore, extending the developed method to uniform planar arrays is yet another direction.