Variable Data Rate in Optical LEO Direct-to-Earth Links: Design Aspects and System Analysis

Abstract—In the frame of ongoing efforts between space agencies to define an on-off-keying-based optical low-Earth-orbit (LEO) direct-to-Earth (DTE) waveform, this paper offers an in-depth analysis of the Variable Data Rate (VDR) technique. VDR, in contrast to the currently adopted Constant Data Rate (CDR) approach, enables the optimization of the average throughput during a LEO pass over the optical ground station (OGS). The analysis addresses critical link-level aspects, such as receiver (time, frame, and amplitude) synchronization, and demonstrates the benefits stemming from employing VDR at system level: the average throughput gain was found to be around 100% compared to a CDR transmission approach.


I. INTRODUCTION
In recent years, optical communications have become increasingly appealing to the space industry. Beyond scientific experiments and demonstrations, new multi-year commercial missions resorting to optical links have emerged. For example, the European data relay system (EDRS) involves two geostationary satellites and has already materialized tens of thousands of optical links [1]. From low Earth orbits (LEO), direct-to-Earth (DTE) optical communications appeal not only to institutional missions (e.g., to download payload telemetry data of Earth observation missions), but can potentially have a significant commercial application [2].
Currently flying and planned optical LEO-DTE systems transmit at a constant data rate (CDR) while passing over an optical ground station (OGS) [3]. Such a transmit mode does not take into account the fact that the characteristics of the propagation channel may vary significantly during the satellite pass. Indeed, channel impairments like the free space loss, the atmospheric attenuation, and the turbulence of the refractive index of the atmosphere show a strong dependence on the satellite elevation angle [4]. In particular, the atmospheric turbulence, also known as scintillation, is the main source of fading that affects the optical signal [5]. Given such a high channel variability during the satellite pass, the CDR transmission mode requires a trade-off between two desirable but contrasting features: a high transmission data rate and a long satellite visibility window. Indeed, adopting a high data rate would require favourable channel conditions, typically available only at high elevation angles, which implies a reduced duration of the transmission window. On the other hand, a long transmission window requires the link closure at low elevation angles, which only low data rates would allow. Even when this trade-off is optimized, the CDR approach inevitably causes a significant throughput and data return loss compared to the theoretically available channel capacity.
An intuitive way to tackle this issue is a variable data rate (VDR) approach, aiming at reducing the performance gap with respect to the channel capacity and, at the same time, allowing a longer visibility window. This can be achieved by splitting the pass of the LEO satellite into predefined sectors, and optimizing the data rate in each of them. The same concept is already being adopted, for example, by the next generation of Copernicus missions to download Earth observation data from LEO satellites on high data rate radio frequency (RF) links. In that case, the VDR concept is based on varying the modulation order and the rate of the forward error correcting (FEC) code [6]. However, in typical optical LEO-DTE waveforms, such as the one currently defined by the Consultative Committee for Space Data Systems (CCSDS) Optical Working Group and referred to as Optical On-Off Keying (O3K), the modulation is fixed, hence it cannot be exploited as a degree of freedom to implement the VDR. In addition, the dynamic range offered by the variation of the FEC code rate alone turns out to be quite limited when compared to the link budget variability affecting the optical channel.
The expected benefit in terms of throughput has already been forecast in previous works, mostly under ideal conditions and irrespective of the technological implementation. In [7], an improvement in throughput by a factor of 3 was obtained by assuming an ideally continuous adaptive data rate, without addressing issues related to the receiver design. In [8], similar throughput gains were achieved by ideally varying the symbol period (in a large but discrete set of values), in the presence of signal-dependent noise. A larger gain was shown to be achievable when the receiver is thermal-noise-limited and / or the transmitted power is employed as an extra degree of freedom.
In this paper, a novel way of implementing VDR in optical LEO-DTE on-off keying (OOK) waveforms is proposed, relying on the spreading of data in order to achieve the highest symbol rate allowed by the link budget in each sector of the satellite pass. In essence, the proposed VDR concept relies on always transmitting at a fixed chip rate (hence, with a fixed receiver bandwidth) and adapting the data rate by repeating / spreading each data symbol by a desired factor. Different flavours of VDR are possible [9] but would require modifications of the hardware at both transmitter and receiver sides. For example, when the symbol rate is changed, synchronization has to be reacquired and, for this reason, only a few values of the symbol rate can be adopted, which results in a significant loss of data return. The proposed approach, instead, requires only digital baseband operations, and the front-end hardware is not sensitive to the adopted spreading factor. In order to evaluate the average throughput gain brought about by VDR, we calculate the link budget by considering a specific OGS in Southern Italy, for which we assume a receiver employing an avalanche photodiode (APD) for the opto-electric conversion. Another objective of the present study is to provide a system level analysis that helps in understanding whether the advantages of VDR are attractive enough to justify the additional digital signal processing and complexity required, especially at the receiver side.
The paper is organized as follows. The general design aspects, which apply to both CDR and VDR systems, are discussed in Section II. In particular, a possible transmitter and receiver architecture for a CDR system is described, and a novel frame and timing synchronization procedure is discussed. Section III presents an overview of the proposed VDR concept, along with a critical discussion on how to achieve it with appropriately selected spreading sequences. The necessary modifications to the transmitter and receiver architectures for a VDR system are also discussed in Section III. Section IV reports the results of physical layer (PHY) simulations with FEC coding and channel interleaving, used to cope with channel fading, thermal noise, and the shot noise generated by the APD. These results are then used for the link budget computation in Section V, which presents a system level analysis of the gains offered by VDR in terms of average throughput, compared to the performance of CDR transmission (Section V extends an investigation that was partially performed in [10]). Finally, conclusions are drawn in Section VI.

II. GENERAL DESIGN ASPECTS

A. System model
We consider a free space optical (FSO) communication system employing the OOK modulation and assume that the receiver employs an APD [11], so that both thermal and shot noise have to be considered. In particular, the power spectral density (PSD) of the shot noise depends on the transmitted symbol, hence the sum of thermal and shot noise is modeled as a non-stationary additive Gaussian noise process. The received signal after the APD can thus be expressed as

r(t) = s(t − t0; h, a) + w(t − t0),  (1)

where s(t − t0; h, a) and w(t − t0) are the useful signal and the noise process at the receiver, both affected by an unknown delay t0 introduced by the channel. The useful signal can be expressed as

s(t; h, a) = h Σ_k a_k p(t − kT),  (2)

being a = {a_k} the sequence of transmitted symbols belonging to the alphabet {0, 1} (since the chosen modulation format is OOK, in the following we will use "bit" and "symbol" as synonyms), T the symbol time, and h an amplitude which is typically unknown as a consequence of the random nature of the scintillation and absorption phenomena characterizing an FSO link. We consider a non-return-to-zero (NRZ) transmission, where the duration of the rectangular shaping pulse p(t), here assumed with unit energy, is one symbol period:

p(t) = 1/√T for 0 ≤ t < T, and p(t) = 0 otherwise.  (3)

The noise process w(t) in (1) can be expressed as

w(t) = w_th(t) + w_sh(t),  (4)

i.e., as the sum of the thermal noise w_th(t), assumed to be white and Gaussian with two-sided PSD N0/2, and of the independent shot noise w_sh(t), still white and Gaussian, with two-sided PSD Nsh/2 (possibly depending on h). From (4), when a symbol "0" is transmitted, we have thermal noise only with PSD N0/2, whereas when a symbol "1" is transmitted, the noise is the sum of w_th(t) and w_sh(t) with PSD N1/2 = (N0 + Nsh)/2.
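As a minimal numerical sketch of this signal model (all parameter values below are illustrative assumptions, not taken from the paper), the matched-filter output samples with signal-dependent Gaussian noise can be generated as follows:

```python
import numpy as np

def ook_mf_samples(bits, h, N0, Nsh, rng):
    """One matched-filter output sample per OOK symbol: h*a_k plus
    zero-mean Gaussian noise whose variance is N0/2 for a "0"
    (thermal noise only) and (N0 + Nsh)/2 for a "1" (thermal plus
    APD shot noise), i.e., the non-stationary noise model above."""
    bits = np.asarray(bits, dtype=float)
    var = N0 / 2.0 + bits * (Nsh / 2.0)   # signal-dependent noise variance
    return h * bits + rng.normal(0.0, np.sqrt(var))

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=10000)
x = ook_mf_samples(bits, h=1.0, N0=0.02, Nsh=0.01, rng=rng)
```

The two conditional sample distributions differ both in mean and in variance, which is what the synchronization and LLR computations discussed in the following sections must account for.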
The unknown delay t0 in (1) can be expressed as the sum of a component multiple of T plus a residual fractional delay τ, i.e.,

t0 = k0 T + τ,  (5)

with k0 an integer. The above expression of t0 is functional to the procedure adopted at the receiver to estimate the unknown delay. Indeed, the estimation of t0 is performed in two separate steps, called frame and timing synchronization, whose order depends on the chosen receiver architecture. In one step, the estimation of k0 is performed by exploiting proper fields of known symbols, which yields a coarse alignment. In the other step, a timing synchronization algorithm has instead the task of estimating the fractional delay τ.

B. Transmitter and receiver architecture for Constant Data Rate
In order to perform frame and timing synchronization, as well as to estimate the value of h, we resort to a receiver architecture that performs data-aided (DA) estimation; hence, we assume that blocks of P pilot symbols are periodically inserted in the transmitted data stream. When the channel coherence time is large, the distance between blocks of pilots can be chosen according to a maximum allowed overhead. Otherwise, the number of pilot fields and the distance between them have to be properly designed by addressing a trade-off between estimation accuracy and overhead minimization.
The transmitter architecture is shown in Fig. 1. The codewords at the output of the FEC channel encoder are interleaved by using a proper convolutional interleaver [12]. Interleaving is required since atmospheric turbulence is a slowly varying phenomenon with a very long coherence time. Pilot fields in blocks of P bits are then inserted and the resulting bit stream is then modulated using an OOK modulation with NRZ pulses.
The receiver architecture is reported in Fig. 2. After the APD (and the transimpedance amplifier, TIA, not reported in the figure), we assume that a matched filter (MF) is present. In general, we assume that N samples per symbol are extracted and processed at the output of the MF. Frame synchronization is performed first, by searching the correct alignment with the pilot fields. DA timing synchronization is performed next, together with the estimation of the unknown amplitude h. The number of samples is then reduced from N to only one sample per symbol, by interpolating the samples processed by the receiver. After pilot removal, the log-likelihood ratios are computed, deinterleaved, and passed to the decoder.
We seek algorithms able to perform DA frame and timing synchronization, with a trade-off between performance and complexity that privileges the simplicity of the receiver. As is intuitive, at very low levels of received power P_avg, thermal noise dominates over shot noise. This conclusion can be confirmed not only by numerical simulation but also by the computation of theoretical bounds (such as the modified Cramér-Rao bound, MCRB), showing that the values computed with or without shot noise practically coincide when P_avg is very low (below −35 dBm, for the system parameters used in this work); such a theoretical investigation is, however, beyond the scope of the present work and will not be further addressed. Since this is indeed the low-power regime at which we expect the receiver to operate, we shall derive a synchronization algorithm under the assumption that only thermal noise is present. In any case, performance will be assessed, in the numerical results that follow, in a realistic scenario with both thermal and shot noise.
The receiver architecture described above has a major disadvantage: when the clock frequency is not stable and significantly changes over time, or drifts due to the motion of the LEO satellite, a DA timing estimation algorithm is not able to track these variations if, between two consecutive CSMs, they produce a slip of a symbol period (or of a chip period in the case of VDR, see Sec. III). In the CCSDS terminology, the pilot fields are called Codeword Synchronization Markers (CSMs), and two consecutive CSMs are separated by a number of bits corresponding to a codeword. We assume that the channel coherence time is such that the channel can be considered constant at least over a codeword length; although, in general, more CSMs could be employed to perform DA synchronization, we shall instead assume that the channel estimate obtained from a single CSM remains valid only for a codeword length. The instability of the clock frequency is in practice a minor problem, since sufficiently stable oscillators are nowadays available. As far as motion effects are concerned, if the satellite knows its relative position with respect to the ground station, timing drifts can be precompensated at the transmitter. Otherwise, the only alternative is the use of a closed-loop non-data-aided (NDA) timing synchronization algorithm (possibly of the second order) able to track these variations.
We thus also consider a different receiver architecture, shown in Fig. 3, in which timing synchronization is performed in closed-loop NDA mode prior to any other receiver function. In particular, we will assume, for complexity reasons, that timing synchronization is performed by using at most N = 2 samples per symbol. After timing synchronization and interpolation, the remaining functions are performed by using only one sample per symbol. In particular, the alignment with the pilot fields (i.e., frame synchronization) is performed jointly with amplitude estimation in DA mode, by using, for example, the algorithms described in [13]. After pilot removal, the log-likelihood ratios are then computed, deinterleaved, and passed to the decoder, as in the previous case.
Even in the case of NDA timing synchronization, we can adopt algorithms that were derived under the assumption that shot noise is absent. The same considerations indeed apply as in the case of DA timing synchronization, since receivers operate at very low values of P avg where thermal noise is dominant over shot noise.

C. Frame and timing synchronization
Suppose we observe a chunk of the continuous-time received signal (1), with support [t̂0, t̂0 + LT], where L = P is chosen equal to the length of the pilot sequence. The value t̂0 is a tentative value of the actual channel delay t0, assumed by the receiver, and can be expressed as t̂0 = k̂0 T + τ̂. We assume that the symbols in the observation window are known, as is the thermal noise PSD. On the contrary, the amplitude h is unknown and will be jointly estimated with timing; its tentative value at the receiver is denoted by ĥ. The likelihood function for the joint estimation of t0 and h, under the assumption that the shot noise is negligible, is [14]

Λ(t̂0, ĥ) ∝ exp{ (2/N0) [ ĥ Σ_{k=0}^{L−1} a_k x(k̂0 T + kT + τ̂) − (ĥ²/2) Σ_{k,l} a_k a_l g((k − l)T) ] },  (6)

where

x(t) = h Σ_k a_k g(t − kT − t0) + n(t − t0)  (7)

is the signal at the output of the MF. More precisely, n(t − t0) is the filtered noise, while g(t) is the triangular autocorrelation function of the rectangular transmission pulse p(t). The maximum value of g(t) is equal to 1, since we assumed p(t) with normalized energy; g(t) then linearly decays on both sides of the time origin. By defining the set of indices K1 = {k ∈ {0, 1, 2, ..., L − 1} : a_k = 1}, corresponding to bits "1" in the sequence of L known symbols, and by K1 = |K1| their total number, and observing that g((k − l)T) = 0 for k ≠ l, (6) can be compactly expressed as

Λ(k̂0, τ̂, ĥ) ∝ exp{ (2/N0) [ ĥ Γ(k̂0, τ̂) − ĥ² K1/2 ] }.  (8)

Samples {x(k̂0 T + kT + τ̂)} can be obtained by sampling the matched filter output, with symbol spacing and with a tentative delay t̂0 = k̂0 T + τ̂. Defining their sum over K1 as

Γ(k̂0, τ̂) = Σ_{k∈K1} x(k̂0 T + kT + τ̂),  (9)

the objective is to maximize it over the possible values of the delay, hence to estimate k0 and τ as

(k̂0, τ̂) = argmax_{(k̃0, τ̃)} Γ(k̃0, τ̃).  (10)

Hence, the following estimate for the attenuation h,

ĥ = Γ(k̂0, τ̂) / K1,  (11)

ensures that the likelihood function in (8) is maximized too. Note that there is an implicit ambiguity in the maximization in (10), since the choice for the pair (k̂0, τ̂) is equivalent to the choice, e.g., for (k̂0 − 1, τ̂ + T). This ambiguity is solved by constraining τ̂ in an interval with duration T, such as [0, T) or [−T/2, T/2). The procedure for frame and timing synchronization is performed in two separate steps.
Recall that N samples per symbol are extracted at the output of the MF, which correspond to N different hypotheses for the value of τ̂: namely, the N values τ̃ = ñT/N (with ñ ∈ {0, 1, ..., N − 1}) are considered at the receiver. The frame synchronization step of the algorithm consists in a coarse search, in which we look for the maximum of Γ in a sliding-window fashion. The search is performed with the constraint that τ̂ takes one of the N considered values, hence the metric Γ is computed according to (9) by using L samples with spacing T. The grid of L samples is then shifted in time by T/N at every step, by increasing the value considered for ñ (or otherwise increasing k̂0 and resetting ñ to zero, when ñ = N). The frame synchronization step can be terminated by declaring that the alignment is found when the likelihood function exceeds a properly optimized threshold. When the alignment is declared, a verification step can be implemented by looking for other maxima in correspondence of the next pilot fields. This helps reducing the false alarm probability, while the missed-detection probability can be reduced as much as required by increasing L, or otherwise by waiting for a sufficiently long time, since pilot fields are periodically inserted in the continuous transmitted stream. The symbols belonging to a pilot field need to belong to a sequence whose autocorrelation shows a clearly isolated maximum at the time origin. A very good choice, in this respect, is represented by the maximum length sequences or M-sequences [15], [16].
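A minimal sketch of this coarse search, with one sample per symbol (N = 1) and an m-sequence pilot; the LFSR polynomial, signal amplitude, noise level, and frame position are illustrative assumptions:

```python
import numpy as np

def msequence(n_out):
    """Maximum-length sequence from a 6-stage Fibonacci LFSR
    (feedback polynomial x^6 + x^5 + 1, period 63)."""
    state = [1] * 6
    out = []
    for _ in range(n_out):
        out.append(state[-1])
        fb = state[-1] ^ state[0]       # feedback from stages 6 and 1
        state = [fb] + state[:-1]
    return np.array(out)

csm = msequence(63)                      # the pilot field: one full period
K1 = np.flatnonzero(csm == 1)            # positions of the "1" pilot symbols

rng = np.random.default_rng(1)
h, sigma, true_k0 = 0.8, 0.1, 57
payload = rng.integers(0, 2, size=300)
stream = np.concatenate([payload[:true_k0], csm, payload[true_k0:]])
x = h * stream + rng.normal(0.0, sigma, size=stream.size)

# Coarse (frame) search: Gamma(k0) = sum of the MF samples falling on
# the "1" positions of the pilot field, for every trial offset.
metric = np.array([x[k0 + K1].sum() for k0 in range(stream.size - csm.size + 1)])
k0_hat = int(np.argmax(metric))
```

In a real receiver the search would stop once the metric exceeds an optimized threshold, rather than scanning all offsets; the exhaustive argmax above is only for illustration.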
The frame synchronization step yields not only the estimate k̂0 for the frame index but also a coarse timing estimate for τ, corresponding to the selected value n̂ for ñ: we denote it by τ1 = n̂T/N. If the frame synchronization is effective, then the residual error τ1 − τ lies within a sample interval, hence we can define a residual relative error

ε = (τ1 − τ)/T,

and the metric in (9) can be equivalently expressed with a single argument, as

Γ(ε) = h Σ_{k∈K1} Σ_l a_l g((k − l + ε)T) + Σ_{k∈K1} n((k + ε)T),  (12)

where (7) is used to express the matched filter output x(t).
Considering that g(t) extends over the interval [−T, T], only the terms with l ∈ {k − 1, k, k + 1} can contribute to the inner summation in (12), namely the symbols a_{k±1} adjacent to a_k. Hence,

Γ(ε) = h [ (K1 − K11) g(εT) + K11 g11(εT) ] + n1(εT),  (13)

where n1(t) = Σ_{k∈K1} n(t + kT) is the sum of the filtered noise samples affecting the "1" symbols, and we defined g11(t) = g(t + T) + g(t) + g(t − T) as the sum of three adjacent autocorrelations (which is trapezoidal, in our case). In (13), K11 is the number of "11" subsequences in the known pilot field. The noiseless part of the metric Γ(ε) in (13) can be computed, based on the autocorrelation g(t) and its threefold version g11(t), as a function of the normalized error ε. Since the autocorrelation g(t) is more peaked than its smoother summed version g11(t), it is easy to see that, for the purpose of timing estimation, a good pilot sequence is such that K1 is much larger than K11, so that the resulting metric Γ(ε) reaches a sharp maximum around the zero of the timing error ε.
Let us shortly denote by Γ0 = Γ(k̂0, τ1) the maximum of the metric in (9) that stems from frame synchronization, and by Γ±1/N the two neighbouring samples, corresponding to the metric computed at (k̂0, τ1 ± T/N). As stated, if frame synchronization is correct, the time-continuous residual error εT lies in an interval [−T/(2N), T/(2N)] with amplitude T/N. Let us suppose to have operated the frame synchronization algorithm with N ≥ 2 samples per symbol interval. Γ0 thus corresponds to the estimated start of the frame, while Γ−1/N and Γ+1/N are its neighbouring values, spaced by a fraction of the symbol period on both sides of Γ0 (clearly, Γ0 > Γ±1/N), and we shall use the three values for the estimation of the residual timing error. From (13), the noiseless metric has the triangular profile

Γ(ε) = h [ (K1 − K11)(1 − |ε|) + K11 ],  −1 ≤ ε ≤ 1.  (14)

Given the triangular profile (14) of Γ(ε), N ≥ 2 samples per symbol ensure that, no matter where the maximum sample Γ0 is located within the interval ε ∈ [−1/(2N), 1/(2N)], the following linear interpolation yields the maximum of (14):

ε̂ = (Γ−1/N − Γ+1/N) / { 2N [ Γ0 − min(Γ−1/N, Γ+1/N) ] }.  (15)

This hence defines the estimate for the residual relative timing error. Note that, in the case of a VDR system, timing estimation via (15) can be accomplished even by using N = 1 sample per symbol, provided that proper spreading sequences are employed, as further discussed in Sec. III-B. The estimate of τ is finally obtained as

τ̂ = τ1 − ε̂ T.  (16)

The maximum of the metric in (14) can be equally expressed in an easy way, resorting to the triangular shape of Γ(ε), in terms of the three samples above and, from this, the amplitude estimate can be found from the general relationship (11):

ĥ = (1/K1) [ Γ0 + |Γ+1/N − Γ−1/N| / 2 ].  (17)

Besides the DA timing and amplitude estimation described above, the receiver can adopt a NDA timing estimation that is performed first, according to the architecture reported in Fig. 3.
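A noiseless numerical check of the three-sample estimator, assuming the triangular metric-on-a-pedestal profile discussed above (all values illustrative; the convention here is that the returned offset is the location of the metric peak relative to the central sample, in fractions of T):

```python
def residual_timing_and_amplitude(g_m, g_0, g_p, N, K1):
    """Fine estimates from the frame-sync peak sample g_0 and its two
    neighbours g_m, g_p (spaced T/N apart), assuming a triangular
    metric profile. Returns the peak offset relative to the central
    sample (fractions of T) and the amplitude estimate."""
    eps_hat = (g_p - g_m) / (2.0 * N * (g_0 - min(g_m, g_p)))
    gamma_max = g_0 + abs(g_p - g_m) / 2.0   # reconstructed metric peak
    return eps_hat, gamma_max / K1

# Triangle-on-a-pedestal profile with illustrative values
# h = 0.8, K1 = 16, K11 = 3, N = 2, and a true peak offset of 0.11 T.
h, K1, K11, N, eps = 0.8, 16, 3, 2, 0.11

def gamma(e):                                # noiseless metric, |e| <= 1
    return h * ((K1 - K11) * (1.0 - abs(e)) + K11)

g_m, g_0, g_p = gamma(-1.0 / N - eps), gamma(-eps), gamma(1.0 / N - eps)
eps_hat, h_hat = residual_timing_and_amplitude(g_m, g_0, g_p, N, K1)
```

In the noiseless case both the residual offset and the amplitude are recovered exactly; with noise, the accuracy is set by the sharpness of the triangle, i.e., by how much K1 exceeds K11.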
Frame synchronization, i.e., the alignment with the pilot sequence (CSM), as well as amplitude estimation, can be performed at a subsequent stage, after interpolation and downsampling, by using only one sample per symbol [13]. For the timing synchronization, one of the traditional algorithms proposed in the literature can be used. In the numerical results described in Sec. IV, we compare the NDA early-late detector (ELD) technique with that proposed by Gardner [14], both achieving good performance in the considered system scenario. Note that we only consider schemes based on digital signal processing, neglecting highly suboptimal clock recovery schemes based on analog circuitry, such as those described in [17], [18].
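As an illustration of this family of closed-loop NDA detectors, a sketch of the Gardner timing-error detector applied to a matched-filtered NRZ stream (the OOK DC component is removed first; oversampling factor, stream length, and strobe offsets are illustrative assumptions):

```python
import numpy as np

def gardner_error(z, Q, off):
    """Mean Gardner timing-error-detector output on a matched-filtered
    NRZ stream z (Q samples/symbol), when the symbol-rate strobes are
    displaced by `off` samples from the optimum instant."""
    e = []
    for k in range(2, z.size // Q - 2):
        n = k * Q + Q - 1 + off          # symbol-rate strobe
        prev, mid = n - Q, n - Q // 2    # previous strobe and midpoint
        e.append((z[n] - z[prev]) * z[mid])
    return float(np.mean(e))

rng = np.random.default_rng(2)
Q = 8
bits = rng.integers(0, 2, size=4000)
b = bits - 0.5                           # remove the OOK DC component
y = np.repeat(b, Q)                      # NRZ waveform, Q samples/symbol
z = np.convolve(y, np.ones(Q) / Q)       # matched filter (moving average)

te_early, te_zero, te_late = (gardner_error(z, Q, o) for o in (-2, 0, 2))
```

The detector output averages to zero at the correct sampling instant and takes opposite signs for early and late sampling, which is the error signal a second-order closed loop would integrate to track clock drift.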

III. VARIABLE DATA RATE SYSTEM DESIGN
A. The variable data rate concept

The proposed VDR approach is based on the assumption that the OOK modulation scheme is used. OOK is one of the most common modulation formats adopted in optical systems due to its simplicity, and it is particularly suitable for low-complexity optical LEO-DTE communications [19]. In addition, OOK is the modulation format of choice in the ongoing CCSDS standardization of the optical LEO-DTE physical layer, referred to as O3K [20]. In particular, to cater for a variety of system configurations, the O3K standard allows for a wide set of symbol rates, ranging from around 1.2 Msym/s to 10 Gsym/s.
Despite such a wide range of symbol rates, the current concept of operations (ConOps) for optical LEO-DTE links is to select only one of them and transmit at a CDR during the pass over the OGS. This approach, as discussed in the introduction, is clearly suboptimal. Instead, the VDR-based ConOps relies on splitting the pass in a predefined set of sectors, and selecting the optimal symbol rate for each of them in a pre-programmed manner. The symbol rate is thus selected so as to close the link under the different channel conditions experienced during a pass. The available channel capacity is thus incrementally exploited as the channel conditions improve.
The proposed method to seamlessly change the symbol rate during the pass is to spread the data symbols to the highest possible chip rate the transmit or receive hardware (HW) can support. Since both transmitter and receiver always operate at a constant chip rate, the VDR is implemented by changing the spreading factor in each sector of the pass, so that the underlying symbol rate matches the selected symbol rate for each sector.
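The per-sector mechanics can be sketched as follows (the chip rate and the sector-to-spreading-factor mapping are illustrative assumptions, not values from the paper):

```python
import numpy as np

CHIP_RATE = 100e6                        # fixed chip rate the HW runs at (assumed)

def spread(bits, M):
    """Scheme-1 spreading: each coded bit is repeated M times, so the
    chip rate stays constant and the data rate becomes CHIP_RATE / M."""
    return np.repeat(np.asarray(bits), M)

# Three pass sectors with spreading factors chosen from the link budget
# (illustrative): low, mid, and high elevation.
sectors = {"low": 8, "mid": 2, "high": 1}
bits = np.array([1, 0, 1, 1])
tx = {name: spread(bits, M) for name, M in sectors.items()}
rates = {name: CHIP_RATE / M for name, M in sectors.items()}
```

Because only the spreading factor changes between sectors, the analog front end and the chip-rate clock never need to be reconfigured during the pass.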

B. Selection of the spreading sequences
The VDR technique foresees the use of spreading sequences to represent the bits "0" and "1". The chip rate is kept constant, whereas the symbol rate is decreased by increasing the length of the spreading sequences. In other words, we may express the transmitted signal as

s(t) = Σ_i Σ_{ℓ=0}^{M−1} s_ℓ(a_i) p(t − (iM + ℓ)Tc),

where s(a) = [s_0(a), s_1(a), ..., s_{M−1}(a)]^T is the spreading sequence associated with bit a ∈ {0, 1}, composed of binary symbols belonging to the alphabet {0, 1}, M is the length of the spreading sequence, Tc is the chip time, and p(t) is a rectangular pulse with unit energy and duration Tc, i.e.,

p(t) = 1/√Tc for 0 ≤ t < Tc, and p(t) = 0 otherwise.

Regarding the length M of the spreading sequence, it is assumed that the possible values are M = 2^0, 2^1, 2^2, ... We assume perfect synchronization at the receiver. The signal at the output of a filter matched to the pulse p(t) is sampled at time instants (iM + ℓ)Tc, obtaining the samples

x_{iM+ℓ} = h s_ℓ(a_i) + n_{iM+ℓ},  (18)

where h is the channel attenuation, taking into account the atmospheric turbulence, and {n_{iM+ℓ}} are zero-mean independent Gaussian random variables with variance σ²(s_ℓ) = σ0² (1 − s_ℓ) + σ1² s_ℓ, where σ0² = N0/2 and σ1² = N1/2. These samples will be used to compute the log-likelihood ratios (LLRs) to be sent to the decoder. The LLR for symbol a_i can be computed as

λ(a_i) = ln [ p(x_i | a_i = 1) / p(x_i | a_i = 0) ] = Σ_{ℓ=0}^{M−1} ln [ p(x_{iM+ℓ} | s_ℓ(1)) / p(x_{iM+ℓ} | s_ℓ(0)) ],  (19)

i.e., as the sum of the LLRs associated with each chip.
The first problem to be solved is the selection of the spreading sequences. From (19), we can easily observe that, if for some ℓ we select s_ℓ(0) = s_ℓ(1), the corresponding contribution of x_{iM+ℓ} to the LLR will be zero, and this will produce a performance loss since that sample is not exploited for detection / decoding. Hence, the two spreading sequences corresponding to "1" and "0" must be complementary, and it is sufficient to specify one of them (for example, the one corresponding to bit "1"). We will select s_ℓ(1) = s_ℓ and s_ℓ(0) = s̄_ℓ = 1 − s_ℓ, i.e., s̄_ℓ is the bit complementary to s_ℓ. With this choice, the LLR can be expressed as

λ(a_i) = Σ_{ℓ=0}^{M−1} (2 s_ℓ − 1) [ ln(σ0/σ1) + x_{iM+ℓ}² / (2σ0²) − (x_{iM+ℓ} − h)² / (2σ1²) ].  (20)
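A minimal sketch of this chip-level LLR computation with complementary spreading sequences (variances, amplitude, and spreading factor are illustrative assumptions):

```python
import numpy as np

def llr(x, s1, h, var0, var1):
    """LLR of one spread bit from its M chip samples x, given the
    spreading sequence s1 used for bit "1" (bit "0" uses the
    complement). Per-chip Gaussian log-ratios are summed, since each
    chip is an independent observation; chips where s1 is 0 enter
    with opposite sign."""
    s1 = np.asarray(s1, dtype=float)
    sign = 2.0 * s1 - 1.0
    term = (np.log(np.sqrt(var0 / var1))
            + x ** 2 / (2.0 * var0) - (x - h) ** 2 / (2.0 * var1))
    return float(np.sum(sign * term))

rng = np.random.default_rng(3)
h, var0, var1, M = 1.0, 0.02, 0.03, 8
s1 = np.ones(M)                           # "scheme 1" repetition sequence

def chips(bit):
    """Received chip samples for one spread bit, with the
    signal-dependent noise variance of the model."""
    s = s1 if bit else 1.0 - s1
    var = var0 * (1.0 - s) + var1 * s
    return h * s + rng.normal(0.0, np.sqrt(var), size=M)

llrs1 = [llr(chips(1), s1, h, var0, var1) for _ in range(500)]
llrs0 = [llr(chips(0), s1, h, var0, var1) for _ in range(500)]
```

With complementary sequences every chip contributes to the decision, so the LLR magnitude grows with M: this is exactly the mechanism that lets lower data rates close the link at lower received powers.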
Considering the fact that the LLR is given by the sum of independent contributions, we can also state that the performance of a pair of spreading sequences (that for bit "1" and the complementary spreading sequence for bit "0") will depend on the number M1 (0 ≤ M1 ≤ M) of chips "1" in the spreading sequence corresponding to bit "1", and will be independent of their position. Our aim is thus to select the value of M1 which maximizes the performance. Without loss of generality, we will assume that the spreading sequence corresponding to bit "1" has its M1 chips "1" in the leading positions, i.e.,

s(1) = [1, 1, ..., 1, 0, 0, ..., 0]^T,

with M1 ones followed by M − M1 zeros. As mentioned, when transmitting the bit a_i (where a_i can be either "0" or "1"), the samples {x_{iM+ℓ}} can be expressed as in (18), where the variance of n_{iM+ℓ} is σ0² [1 − s_ℓ(a_i)] + σ1² s_ℓ(a_i). We thus have a memoryless channel with input a_i and vector output x_i = [x_{iM}, ..., x_{iM+M−1}]^T. Considering that we foresee the use of a capacity-achieving error correcting code, we look for the value of M1 which maximizes the mutual information

I(A; X) = h(X) − h(X|A),

where the differential entropies h(X) and h(X|A) are defined as

h(X) = −∫ p(x_i) log2 p(x_i) dx_i,   h(X|A) = −Σ_{a_i} P(a_i) ∫ p(x_i | a_i) log2 p(x_i | a_i) dx_i,

P(a_i) = 1/2 is the a-priori probability of the input symbols, p(x_i | a_i) is the conditional probability density function (PDF) of the output given the input, and p(x_i) is the output PDF. The entropy h(X|A) can be computed in closed form, and it can be easily verified that it is independent of M1. Thus the problem can be restated as the search for the value of M1 which maximizes the entropy h(X). The PDF p(x_i) can be expressed as

p(x_i) = (1/2) p(x_i | a_i = 0) + (1/2) p(x_i | a_i = 1),

where, considering the structure of s(0) and s(1),

p(x_i | a_i) = Π_{ℓ=0}^{M−1} (2π σ²(s_ℓ(a_i)))^{−1/2} exp{ −[x_{iM+ℓ} − h s_ℓ(a_i)]² / (2σ²(s_ℓ(a_i))) }.

Unfortunately, the entropy h(X) cannot be expressed in closed form. However, it can be numerically computed for any value of σ0², σ1², and h, resorting to the following procedure.
Considering that λ(a_i) is a sufficient statistic, the data processing inequality [21] ensures that I(A; X) = I(A; Λ), since no information is lost in passing from x_i to λ(a_i). The mutual information I(A; Λ) can be estimated as

I(A; Λ) ≃ 1 − (1/K) Σ_{i=1}^{K} log2 [ 1 + e^{−(2a_i − 1) λ(a_i)} ],

i.e., through a time average over a transmitted sequence of proper length K. The conclusion is that, for the values of σ0², σ1², and h at hand, and in the range of values of P_avg of interest, we always found that the optimal value of M1 is M1 = M (or, equivalently, M1 = 0). The best possible spreading sequences to be associated with bits "1" and "0" are thus

s(1) = [1, 1, ..., 1]^T,   s(0) = [0, 0, ..., 0]^T.  (21)

This choice is conceptually equivalent to reducing the symbol rate by enlarging the transmission pulse duration by a factor M. As a consequence, the duration of each coded bit is M times longer even at the output of the matched filter, so that the profile of the metric in (13) widens by the same factor, to the detriment of timing estimation. An alternative pair of complementary spreading sequences, with alternating chips,

s(1) = [1, 0, 1, 0, ...]^T,   s(0) = [0, 1, 0, 1, ...]^T,  (22)

preserves chip transitions within each symbol and is thus better suited to synchronization. In the following, we will refer to a scheme employing the spreading sequences (21) as "scheme 1", whereas a scheme using the spreading sequences (22) will be referred to as "scheme 2".
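The Monte Carlo procedure can be sketched as follows, using the standard LLR-based time-average estimator of the mutual information; the operating point (amplitude, variances, spreading factors) is an illustrative assumption:

```python
import numpy as np

def mi_from_llrs(a, lam):
    """Monte Carlo estimate of I(A; Lambda) from the transmitted bits
    and their LLRs, via the usual time-average estimator."""
    s = (2.0 * np.asarray(a) - 1.0) * np.asarray(lam)
    return 1.0 - float(np.mean(np.log2(1.0 + np.exp(-s))))

rng = np.random.default_rng(4)

def simulate_mi(M, h, var0, var1, n_bits=2000):
    """LLRs for "scheme 1" spreading (all-ones sequence for bit "1")
    with signal-dependent Gaussian noise, then the MI estimate."""
    a = rng.integers(0, 2, size=n_bits)
    lam = np.empty(n_bits)
    for i, bit in enumerate(a):
        s = np.full(M, float(bit))            # chips actually transmitted
        var = var0 * (1.0 - s) + var1 * s     # per-chip noise variance
        x = h * s + rng.normal(0.0, np.sqrt(var), size=M)
        term = (np.log(np.sqrt(var0 / var1))
                + x ** 2 / (2.0 * var0) - (x - h) ** 2 / (2.0 * var1))
        lam[i] = float(np.sum(term))          # all-ones sequence: sign +1
    return mi_from_llrs(a, lam)

# Same chip-level conditions, observed over M = 1 vs M = 8 chips per bit.
mi_lo = simulate_mi(1, h=0.5, var0=0.05, var1=0.08)
mi_hi = simulate_mi(8, h=0.5, var0=0.05, var1=0.08)
```

Increasing the spreading factor raises the mutual information per coded bit (at the price of an M-fold lower data rate), which is the quantity the M1 optimization above operates on.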

C. Transmitter and receiver architecture for Variable Data Rate
In the case of VDR, the transmitter architecture is depicted in Fig. 4. Compared to the one considered in Sec. II-B for CDR, a spreading of the coded bits is present, as discussed in Sec. III-B. Since larger values of M are used at lower values of P_avg, it is intuitive that, in the case of spreading, more pilots are needed to properly estimate the unknown parameters. On the other hand, if we applied the same spreading to the pilot fields, we could destroy their autocorrelation properties, as will be discussed in the following.
At the receiver side, we report two possible architectures, depending on the way in which timing synchronization is performed. In particular, the VDR receiver architecture based on DA timing synchronization is shown in Fig. 5. Compared to the DA receiver architecture for CDR in Fig. 2, there is an estimation of the spreading factor, performed jointly with pilot alignment, and the LLR computation is performed by using (20). Regarding the number N of samples per chip, given the increased sample frequency due to the spreading, we shall assume at most N = 2 in the following. A VDR receiver architecture envisaging NDA timing synchronization is shown in Fig. 6. As discussed in Sec. II-B, although timing synchronization is performed in closed-loop NDA mode, the other parameters are estimated in DA mode using pilots.
Let us address again the choice of the spreading sequences, this time not from the point of view of the bit error rate (BER), but from the point of view of the estimation performance. If the NDA architecture is adopted, as already stated, the best choice for the spreading sequence is represented by "scheme 2". On the other hand, this spreading sequence, if applied to a CSM with the aim of obtaining a longer CSM, will destroy the autocorrelation properties of the M-sequence. As a consequence, in this case it is better to avoid spreading the CSM; rather, when the payload is spread by a factor M, a different CSM with length LM has to be adopted, where L is the length of the CSM in the absence of spreading. At the receiver side, after NDA timing estimation, interpolation, and downsampling, we have to perform the alignment with the pilots (i.e., frame synchronization), amplitude estimation, and the estimation of the spreading factor. In a LEO-DTE link, the spreading factor employed is a deterministic function of the elevation angle. Hence, we can imagine that the OGS adopts a geometry-based method [22] to predict the employed spreading factor with, at most, an uncertainty between two adjacent values. Assuming that the value of the employed spreading factor is M, the uncertainty at the receiver is thus between M and 2M. The receiver has to evaluate the correlation with both CSMs of length LM and 2LM. This allows the receiver to perform both frame synchronization and, implicitly, the estimation of the spreading factor. Once frame synchronization has been performed, DA amplitude estimation can be carried out. A possible way to avoid multiple correlations is the adoption of properly designed CSMs, such that the CSM of length LM coincides with the first half of the CSM of length 2LM.
In this way, it would be sufficient to correlate the received signal with the longer sequence and, from the obtained maximum value of the correlation, one could understand if the alignment is obtained with the shorter or the longer sequence. The details of this procedure are omitted for brevity.
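As a minimal sketch of this joint frame synchronization and spreading-factor estimation, the receiver can correlate the received chips with each candidate CSM and keep the hypothesis with the largest normalized correlation peak. The toy bipolar sequences, the repetition-based spreading, and the numeric values below are illustrative assumptions, not the paper's actual CSM design:

```python
import numpy as np

def detect_csm(rx, csm_candidates):
    """Joint frame synchronization and spreading-factor estimation:
    correlate the received chips with each candidate CSM and keep the
    hypothesis with the largest energy-normalized correlation peak."""
    best = None
    for factor, csm in csm_candidates.items():
        corr = np.correlate(rx, csm, mode="valid") / len(csm)
        pos = int(np.argmax(corr))
        if best is None or corr[pos] > best[2]:
            best = (factor, pos, corr[pos])
    return best[0], best[1]   # (estimated spreading factor, frame offset)

# Toy example: bipolar chips, a length-63 base CSM spread by chip
# repetition with factor M = 2, embedded at chip offset 100; the
# receiver is uncertain between the adjacent factors M and 2M.
rng = np.random.default_rng(0)
base = rng.choice([-1.0, 1.0], size=63)
csm_M, csm_2M = np.repeat(base, 2), np.repeat(base, 4)
rx = rng.choice([-1.0, 1.0], size=1000)
rx[100:100 + len(csm_M)] = csm_M

factor, offset = detect_csm(rx, {2: csm_M, 4: csm_2M})
```

The energy normalization (division by the CSM length) is what makes the peaks of the two different-length correlations comparable.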
Fig. 6: VDR receiver architecture for NDA timing synchronization.
Let us now consider the DA architecture in Fig. 5. In this case, considering that the different spreading sequences have a limited impact on the BER performance, and since timing synchronization is performed in DA mode using the CSM, we can generate a longer CSM by spreading the original CSM used in the absence of spreading. To simplify the transmitter, the same spreading sequences can then be used for the payload too. We thus wish to investigate the impact of different spreading sequences, as applied to obtain a longer CSM, on DA frame, timing, and amplitude estimation. We shall consider the following scenarios:
• Scenario 1: in the presence of spreading with a spreading factor M, we obtain a longer CSM by spreading with "scheme 1" the CSM sequence, which in the absence of spreading has length L.
• Scenario 2: in the presence of spreading with a spreading factor M, we obtain a longer CSM by spreading with "scheme 2" the CSM sequence, which in the absence of spreading has length L.
• Scenario 3: in the presence of spreading with a spreading factor M, we obtain a longer CSM by spreading the length-L CSM sequence with an M-sequence of length M and its complementary sequence.
• Scenario 4: in the presence of spreading with a spreading factor M, we obtain a longer CSM by using a new M-sequence of length L′ = (L + 1)M − 1 in place of the length-L sequence used in the absence of spreading.
The advantage of the first three scenarios is that different CSMs need not be stored at both transmitter and receiver. In the numerical results in Sec. IV we shall compare the four scenarios above.
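The repetition-based "scheme 1", the alternation-based "scheme 2", and the M-sequence-based spreading of Scenario 3 can be sketched on 0/1 bit arrays as follows (a minimal illustration; the sequence lengths and the example M-sequence are placeholders):

```python
import numpy as np

def spread_scheme1(bits, M):
    """Scheme 1: plain repetition -- each bit is repeated M times."""
    return np.repeat(bits, M)

def spread_scheme2(bits, M):
    """Scheme 2: alternation -- each bit is mapped to M chips
    alternating between the bit and its complement, maximizing the
    number of 0->1 / 1->0 transitions."""
    pattern = np.arange(M) % 2               # 0,1,0,1,...
    return np.concatenate([b ^ pattern for b in bits])

def spread_with_mseq(bits, mseq):
    """Scenario-3 style spreading: bit 1 -> M-sequence, bit 0 -> its
    complementary sequence (mseq is a 0/1 array of length M)."""
    return np.concatenate([mseq if b else 1 - mseq for b in bits])

# e.g. spread_scheme1([1, 0], 3) -> 1 1 1 0 0 0
#      spread_scheme2([1, 0], 4) -> 1 0 1 0 0 1 0 1
```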

IV. LINK LEVEL NUMERICAL RESULTS
The received signal amplitude h and the noise PSD values introduced in the system model of Sec. II-A are related to physical system parameters [23]. In particular, since the OOK transmitter uses intensity modulation, we assume that the average received optical power P_avg implicitly accounts (among other effects) for the attenuating effect of turbulence. When a bit "1" is received, the receiver APD, with responsivity R and multiplication factor M, yields an average current equal to 2 P_avg R M. We set h = 2 P_avg R M T_c so as to account for the gain of the normalized transmission pulse (3). The one-sided PSD of thermal noise is N_th = i_th^2 = 4 k T_0 / R_L, where i_th is the thermal current density, in turn related to the APD's load resistance R_L and to the receiver temperature T_0 through Boltzmann's constant (k = 1.3806 · 10^-23 J/K). The one-sided PSD of shot noise can be expressed as [23] N_sh = 4 e M^2 F R P_avg, where F is the APD excess noise factor and e is the electron charge (1.60217662 · 10^-19 C).
In the numerical results that follow, we use a chip rate of 10 Gchip/s and the APD parameters in Table I. We evaluated the performance of the receiver architecture in Fig. 5 for VDR transmission in the case of DA timing synchronization. The performance of frame synchronization was first evaluated in terms of miss-detection probability, defined as the probability that the timing estimation error exceeds T/2, i.e., P(|τ̂ − τ| > T/2). Fig. 7 reports the results for the proposed algorithm, first considering a system without any spreading sequence, where a CSM of length L = 511 is employed. Performance is compared to that of a VDR system adopting a spreading factor M = 16, for the four spreading scenarios described in Sec. III-C. In scenarios 3 and 4, a significant gain of more than 6 dB is observed with respect to the case where no spreading is present.
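The mapping from physical APD/receiver parameters to the abstract quantities h, N_th, and N_sh can be sketched in a few lines. The example parameter values are placeholders, not the Table I values:

```python
import math

K_B = 1.3806e-23           # Boltzmann constant [J/K]
E_CHARGE = 1.60217662e-19  # electron charge [C]

def link_level_params(P_avg, R, M, F, R_L, T0, T_c):
    """Map physical parameters to the signal amplitude h and one-sided
    noise PSDs of the system model: h = 2*P_avg*R*M*T_c,
    N_th = 4*k*T0/R_L, N_sh = 4*e*M^2*F*R*P_avg."""
    h = 2 * P_avg * R * M * T_c                   # received amplitude
    i_th = math.sqrt(4 * K_B * T0 / R_L)          # thermal current density
    N_th = i_th ** 2                              # thermal-noise PSD
    N_sh = 4 * E_CHARGE * M**2 * F * R * P_avg    # shot-noise PSD
    return h, N_th, N_sh

# Placeholder values: P_avg = 1 nW, R = 0.9 A/W, M = 10, F = 5,
# R_L = 50 ohm, T0 = 290 K, and T_c = 100 ps (10 Gchip/s).
h, N_th, N_sh = link_level_params(1e-9, 0.9, 10, 5.0, 50.0, 290.0, 1e-10)
```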
On the other hand, scenario 2 provides the worst performance within the VDR framework. Indeed, as expected, in this case the adopted spreading sequences destroy the autocorrelation properties of the M-sequence employed as CSM.
Moving to timing synchronization, Fig. 8 reports the estimation results in terms of normalized timing mean squared error (MSE) versus P_avg. The four scenarios are again considered for a spreading factor M = 16. In this case, scenario 1 is the worst, as expected: the number of 0 → 1 and 1 → 0 transitions is the same as in the absence of spreading, so no spreading gain can be expected. On the contrary, scenario 2 is the best one since, in this case, the number of transitions is maximized. Finally, all four scenarios are equivalent in terms of amplitude estimation, as shown in Fig. 9. In conclusion, taking into account frame and timing synchronization as well as amplitude estimation, scenarios 3 and 4 are to be preferred.
If synchronization is instead performed in an NDA fashion, we can consider the simple receiver architecture in Fig. 3 for CDR transmission, where timing estimation can be accomplished by traditional NDA algorithms that require neither the knowledge nor the estimation of the amplitude h. Fig. 10 shows the performance of two such algorithms in terms of normalized timing estimation MSE versus P_avg. More precisely, the NDA ELD and the NDA technique proposed by Gardner [14] are compared, showing almost identical performance when the normalized equivalent bandwidth is set to B_eq T = 10^-3. Along with the simulation results, Fig. 10 shows the MCRB computed for the considered system, so as to highlight the margin of the achieved performance with respect to the theoretical limit.
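The Gardner detector at two samples per symbol can be sketched as follows. The bipolar symbols and the ideal, linearly interpolated mid-symbol samples are illustrative assumptions; the paper's OOK signal model and the closed-loop filter are omitted:

```python
import numpy as np

def gardner_ted(x):
    """Gardner non-data-aided timing error detector at two samples per
    symbol: e[k] = (x[2k] - x[2k-2]) * x[2k-1]. It needs neither data
    decisions nor knowledge of the amplitude h."""
    return np.array([(x[k] - x[k - 2]) * x[k - 1]
                     for k in range(2, len(x) - 1, 2)])

# Illustration: bipolar symbols at 2 samples/symbol with ideal
# (linearly interpolated) mid-symbol samples. Under perfect timing the
# mid-sample sits exactly on the transition midpoint, so the detector
# output is identically zero.
rng = np.random.default_rng(1)
sym = rng.choice([-1.0, 1.0], size=200)
x = np.empty(2 * len(sym) - 1)
x[0::2] = sym                        # on-time (symbol) samples
x[1::2] = (sym[:-1] + sym[1:]) / 2   # mid-symbol samples
errors = gardner_ted(x)              # all zeros under perfect timing
```

In a closed loop, e[k] would drive a loop filter and an interpolator; the normalized equivalent bandwidth B_eq T quoted in the text is a property of that loop.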
As a further step, we analyzed and compared the error rates of CDR and VDR systems, focusing in particular on the impact of the spreading factor M, where M = 1 can be seen as a degenerate case of VDR coinciding with a CDR system. Fig. 11 reports the BER performance versus P_avg, along with the theoretical reference of the average mutual information, obtained with a serial turbo code [25] of rate 0.46, when employing the two spreading schemes introduced in Sec. III-B and different values of the spreading factor M. A convolutional interleaver and the time series of the atmospheric turbulence discussed in Sec. V-A were employed. It can be observed that, while the two spreading schemes deeply affect the DA frame and timing synchronization procedures (as seen above for scenarios 1 and 2), they have no real impact on the BER performance, which is little affected by the repetition (scheme 1) or alternation (scheme 2) of the information bits. Hence, although in principle the spreading sequences could be optimized, in practice the BER gain such an optimization can provide is negligible. For a given average received optical power, it is of course the spreading factor that strongly influences the BER.
V. SYSTEM LEVEL ANALYSIS
The purpose of the following system analysis is to highlight the significant benefits in terms of average throughput that stem from employing VDR instead of CDR. To this purpose, a discussion on the channel characteristics is necessary, so that a detailed link budget computation can be provided for a realistic scenario. A similar system analysis was partially presented in [10].

A. Channel modeling and link variability
The first fundamental step for a system-level assessment is modelling the LEO-DTE optical channel. In particular, its dependency on the satellite elevation angle has to be accurately captured, since this is the key feature exploited by VDR. The optical LEO-DTE link is assumed to be in cloud-free line-of-sight (CFLOS) conditions. This assumption is not overly restrictive, since clouds easily block optical signals, and OGS site diversity is in any case needed to avoid link disruption [26]. For simplicity, and since they are not expected to be elevation dependent, pointing errors were not considered. Under these assumptions, the main atmospheric effects that degrade the LEO-DTE link are:
• atmospheric attenuation / transmittance;
• atmospheric turbulence due to changes in the refractive index;
• single-mode fiber coupling loss, due to the atmospheric turbulence degrading the spatial coherence of the laser beam.
The atmospheric turbulence induces fading on the optical signal. Concerning the fading statistics, in this paper the weak turbulence regime is assumed even for the low elevation range, where the irradiance distribution may be approximated by a lognormal model [27]. To generate representative fading time series to be used in Sec. V-B, the synthesizer described in [10] was used. It is based on a low-pass Butterworth filter, properly scaled and tuned to match the desired fading statistics.
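This synthesizer approach can be sketched as follows. The filter order, the cutoff frequency, and the unit-mean normalization are illustrative assumptions; the scaling and tuning in [10] are more elaborate:

```python
import numpy as np
from scipy import signal

def lognormal_fading(n, fs, f_cutoff, sigma_ln, seed=0):
    """Lognormal fading time-series sketch (weak-turbulence regime):
    white Gaussian noise is low-pass filtered by a Butterworth filter
    to set the temporal correlation, rescaled to the target standard
    deviation of the log-irradiance, and exponentiated."""
    rng = np.random.default_rng(seed)
    b, a = signal.butter(4, f_cutoff / (fs / 2))   # 4th-order low-pass
    g = signal.lfilter(b, a, rng.standard_normal(n))
    g = g[n // 4:]                                  # drop filter transient
    g = (g - g.mean()) / g.std() * sigma_ln         # match target std
    return np.exp(g - sigma_ln**2 / 2)              # mean chosen so E[I] ~ 1

# e.g. 30 s of fading at 1 kHz sampling, ~20 Hz fading bandwidth,
# log-irradiance standard deviation 0.3 (all placeholder values)
I = lognormal_fading(40000, fs=1000.0, f_cutoff=20.0, sigma_ln=0.3)
```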
To understand the motivations for adopting a VDR transmission scheme, it is important to grasp how the channel effects vary with the elevation angle; these effects are summarized in Table II. From Table II, it is clear that the elevation dependency of the channel characteristics yields a total dynamic range of 29.3 dB (for this particular link). This very wide dynamic range is what VDR aims at exploiting in order to bring the system throughput closer to its capacity bound, by optimizing the symbol rate in each sector.

B. Link budget
To evaluate the average VDR and CDR system throughput during the pass, a link budget for each of the six sectors was calculated and reported in Table III. To derive the required sensitivity thresholds, the selected waveform was based on OOK with a serially concatenated convolutional code (SCCC) of rate 0.46 as FEC code, and a convolutional channel interleaver with duration 450 ms. Further details on the waveform and on the interleaver sizing can be found in [25]. The sensitivity thresholds required to decode the received signal were set equal to the corresponding theoretical thresholds for the additive white Gaussian noise (AWGN) channel, plus a 3 dB margin. This margin was proven in [10] to be sufficient to accommodate the residual power dynamics resulting from the combination of fading and channel interleaving. On top of this 3 dB margin, an additional 2 dB margin was included to account for implementation losses in real hardware. Finally, considering the optical nature of the link and the a priori selection of the symbol rate for each sector, a further margin of at least 3 dB is included in the link budget to account for various link uncertainties. The link budgets per sector resulting from these assumptions are presented in Table III, where the effects of FEC coding and channel interleaving are included. The link budgets also drive the selection of the symbol rates and the offered data rates. As concluded in [10], a sufficiently long channel interleaver recovers the scintillation loss (apart from the 3 dB margin in the symbol rate selection), which is therefore set to 0 dB in Table III.
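The margin stacking described above amounts to adding 3 + 2 + 3 dB on top of the theoretical AWGN decoding threshold. As a trivial sketch (the threshold value used in the example is a placeholder, not from Table III):

```python
def required_sensitivity_dbm(awgn_threshold_dbm,
                             fading_margin_db=3.0,
                             implementation_margin_db=2.0,
                             uncertainty_margin_db=3.0):
    """Stack the margins described in the text on top of the theoretical
    AWGN decoding threshold of the rate-0.46 SCCC waveform: 3 dB for the
    residual fading dynamics after interleaving, 2 dB for hardware
    implementation losses, and 3 dB for link uncertainties."""
    return (awgn_threshold_dbm + fading_margin_db
            + implementation_margin_db + uncertainty_margin_db)

# e.g. with a placeholder AWGN threshold of -45.0 dBm the required
# sensitivity becomes -45.0 + 3 + 2 + 3 = -37.0 dBm
```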

C. Average system throughput results
Using the sector link budgets in Table III, it is possible to compute and compare the average system throughput of CDR and VDR. To optimize the CDR system throughput, Table IV presents six results for the CDR scheme, each computed assuming that the transmission starts in a different sector and that the link does not close in the preceding ones. Since CDR adopts a single symbol rate during the whole pass, it essentially trades a shorter visibility time for a higher transmission rate. Indeed, considering the six corresponding rows of Table IV, the best throughput for CDR is obtained by initiating the transmission at sector 5 and transmitting at 1150 Mbps (as shown in column 5 of Table III) for 2/6 of the pass (i.e., during sectors 5 and 6). Alternatively, the same throughput can be achieved by starting at sector 6 and transmitting at 2300 Mbps (as shown in column 6 of Table III) for 1/6 of the pass. Both options result in an average throughput over the full pass of 383.3 Mbps. The last row of Table IV shows the average throughput of VDR, evaluated simply by averaging the offered data rates across the sectors, which yields 766.7 Mbps. Compared to the best data return provided by the CDR scheme, the VDR approach thus provides an improvement of 100% in terms of data return.
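This comparison can be reproduced in a few lines. Only the sector-5 and sector-6 rates (1150 and 2300 Mbps) below are taken from Table III; the remaining per-sector rates are hypothetical placeholders for illustration:

```python
# Per-sector offered data rates in Mbps for the six elevation sectors.
# Sectors 5 and 6 are from Table III; sectors 1-4 are placeholders.
rates = [70, 145, 290, 575, 1150, 2300]

def cdr_throughput(rates, start):
    """CDR: one fixed rate for the whole transmission. Starting at
    sector `start` (1-indexed), the link is active only for the
    remaining sectors, so the pass-average throughput is the starting
    sector's rate weighted by the active fraction of the pass."""
    n = len(rates)
    return rates[start - 1] * (n - start + 1) / n

def vdr_throughput(rates):
    """VDR: each sector runs at its own offered rate, so the
    pass-average throughput is simply the mean across sectors."""
    return sum(rates) / len(rates)

# The optimal CDR start sector is found by exhaustive search.
best_cdr = max(cdr_throughput(rates, s) for s in range(1, 7))
```

With the Table III values for sectors 5 and 6, both starting choices give the same 1150 · 2/6 = 2300 · 1/6 ≈ 383.3 Mbps pass average, matching the text.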

VI. CONCLUSIONS
This paper has addressed in depth the VDR technique as a means to optimize the data return of optical LEO-DTE links employing OOK with an APD-based receiver over the whole elevation angle range. This was done by investigating both critical link level aspects, such as receiver synchronization performance, and system level aspects, including channel modeling and link budget. In conclusion, we have found that VDR can offer average throughput improvements of 100% compared to CDR. It also allows the link transmission time to be maximized by minimizing the starting elevation angle of the pass, and offers great flexibility to next generation optical LEO-DTE systems.