On Time-Frequency Synchronization in LoRa System: From Analysis to Near-Optimal Algorithm

—This paper deals with time and frequency synchronization in LoRa system based on the preamble symbols. A thorough analysis of the maximum likelihood (ML) estimator of the delay (time offset) and the frequency offset shows that the resulting cost function is not concave. As a consequence the a priori solution to the maximization problem consists in exhaustively searching over all the possible values of both the delay and the frequency offset. Furthermore, it is shown that these parameters are intertwined and therefore they must be jointly estimated, leading to an extremely complex solution. Alternatively, we show that it is possible to recover the concavity of the cost function, from which we suggest a low-complexity synchronization algorithm, whose steps are described in detail. Simulations results show that the suggested method reaches the same performance as the ML exhaustive search, while the complexity is drastically reduced, allowing for a real-time implementation of a LoRa receiver.


I. INTRODUCTION
In the recent years, the Internet of things (IoT) has enabled the connectivity of various devices, offering digital transformation across industry verticals [1]. Among the different technologies, the low power wide area networks (LPWANs) have been drawing much attention [2], [3], as they provide features such as high battery life, low data rate, long range, and low communication fees. These features offer a high potential for sensor data collection and applications in the areas of smart metering, smart cities, environmental monitoring, asset tracking and many more. Among the LPWA solutions, the LoRa (Long Range) system has emerged as one of the most deployed technology, using non licensed spectrum resources [4]. This technology, originally developed by Semtech, relies on the so-called chirp spread spectrum (CSS) modulation technique [5], [6]. The resulting waveform has been analyzed in the time and frequency domain in [7]. The CSS signal requires an accurate time and frequency synchronization at the receiver side in order to recover the transmitted information correctly [5]. The performance of LoRa modulation scheme in terms of symbol and bit error rate has been analyzed in [8]- [10].
The first reference to time-frequency synchronization for chirp signals has been patented in the late 1990s [11], i.e. well before the release of LoRa technology. Based on the discrete Fourier transform of the received signal, it is less complex than an estimator based on the cross-correlation. For this reason, V. Savaux, C. Delacourt, and P. Savelli  it has been reused in most of the methods in the literature dealing with synchronization in LoRa [12]- [15]. In [12], [14], the authors suggest to sequentially estimate the integer and the fractional contributions of the time and frequency offsets, based on the whole LoRa preamble, whereas it is proposed in [15] to track the oscillator fluctuations by means of the demodulated symbols within a LoRa frame. To achieve a lowcomplexity algorithm, the authors in [14] suggest to process the received signal at Nyquist rate while [12] investigates the effect of the oversampling rate. Although the method in [14] is less complex, it may be limited when the signal to noise ratio (SNR) is weak (e.g.≤0 dB) due to the Nyquist rate processing.
In this paper, we rather take advantage of the oversampled LoRa signal to improve the synchronization performance, and in the same time we focus on an algorithm that quickly converges to the optimal solution in order to reduce the complexity of the synchronization process. To this end, a detailed theoretical analysis of the behavior of the frequency domain maximum likelihood (ML) estimator of the delay and frequency offset is developed for oversampled LoRa signals, based on a limited number K symbols of the preamble. Ideally, to reduce the complexity, K is chosen as a power of 2. To the best of our knowledge, such a study has not been proposed in the literature about LoRa signal [7], and is an original added value of the paper. Furthermore, we keep the analysis general enough so that it can be applied to any oversampling factor ≥ 2 and any size of preamble.
From the theoretical analysis, we show that the a priori solution to the ML estimator consists in an exhaustive search over all the possible values of both the delay and the frequency offset, leading to an extremely complex synchronization process. This is mainly due to the intertwined unknown parameters, such as stated in [12]. However, based on the theoretical results, we also deduce a low complexity estimator that quickly converges to the optimal solution. To improve the reproducibility of our result, we provide all the detailed steps of the algorithm as well as the corresponding source code. Simulations results reveal that the suggested algorithm reaches the ML exhaustive search, while allowing for a possible real-time software-based synchronization and demodulation process, even in condition of large spreading factors (SF).
The rest of the paper is organized as follows: Section II presents the received LoRa signal and introduces the ML estimation method for both the delay and the frequency offset. Section III is dedicated to the theoretical analysis of the behavior of the ML algorithm, and a low complexity synchronization algorithm is provided in Section IV. Simulations results are presented and discussed in Section V, and we draw our conclusions in Section VI.
Notations: Boldface letters a and normal font letters a represent vectors and scalars, respectively. DF T (a) is the discrete Fourier transform (DFT) of the vector a, a • b is the Hadamard product of two vectors where a and b share the same size. Moreover, {a} gives the real part of a complex scalar a.

II. PROBLEM STATEMENT
This section deals with a detailed presentation of the LoRa CSS signal with a focus on the preamble. Moreover, basics on DFT-based ML synchronization are introduced as well. The LoRa modulation is based on the CSS, which is a constant modulus modulation therefore power efficient. Furthermore, the different spreading factors allow to adapt the robustness of the LoRa signal to the propagation conditions (path loss, interference, etc.) to the cost of a reduction of the data rate.

A. System Model
A LoRa frame consists in a preamble, a header, and a payload, as illustrated on the Fig. 1. In the following we focus on the preamble as it is used for detection and synchronization purposes, whereas the header and the payload carry binary information for control and data, respectively. The way the binary stream from medium access control (MAC) layer is modulated on chirps is out of the scope of this paper, but numerous papers (e.g. [7], [12], [16]) give details about the LoRa modulation. As depicted in Fig. 1, the preamble is composed of K+4.25 symbols, where K can be set by the operator (K = 8 is usually the default value). The K first symbols, denoted by s i (t) (in continuous time representation), i = 0, 1, .., K − 1 and t ∈ [iT s , (i + 1)T s ], are defined as where B w is the bandwidth of LoRa signal, typically 125 kHz (but 250 kHz and 500 kHz are available as well), T s is the symbol duration such that B w T s = 2 SF , and φ is a constant phase (usually omitted). The parameter SF is called the spreading factor and is taken among {7, 8, .., 12}. The K symbols are followed by two symbols that carry the "sync. word", corresponding to the network identifier. The remaining is composed of two inverted phase symbols (called downchirp compared to the K first symbols called upchirp), and a guard period of one quarter symbol period. In practice, a sampled version of s i (t) must be considered. Thus, we define the sampling duration as Ts N R with N R = R2 SF , R being the oversampling rate. The sampled version In the following, we assume that the instant of detection of the LoRa frame is before the beginning of the frame and we arbitrarily set the reference start time (t = 0) between the instant of detection and the start of the received LoRa frame. Thus, the received signal is delayed by τ ≥ 0 compared with t = 0. Moreover, the index of the first sample of the received frame is denoted by n τ , as illustrated in Fig. 2. We assume that the LoRa signal bandwidth is narrow enough to consider a onetap block fading channel h such that the received signal r i [n] can be expressed as in (3) where w is the complex additive white Gaussian noise (AWGN) such that w ∼ CN (0, σ 2 ), θ f is the frequency offset between the transmitter and the receiver, and ψ is an unknown additional phase.
Before tackling the time-frequency synchronization process in LoRa, we first introduce further notations related to the sampled time-frequency domain. To this end, we define the time sampling rate R 0 ≤ R of the signals at the input of the DFT, and we define the ratio η = R R0 . For instance, in the example in Fig. 3, we have R 0 = R 2 . Thus, the fine granularity offered by the oversampling rate R ensures a good time precision to the synchronization over the time domain, whereas R 0 ≤ R allows to reduce the complexity of the synchronization process by reducing the size of the DFT, which is proportional to R 0 . For a matter of clarity, we denote by M the size of the observation window (e.g. the length of the preamble) oversampled with rate R. All other sizes are related to the oversampled rate R by using η. Thus, the DFT sizes are given by M η . Furthermore, we will suppose that M η is evenly-valued, then the index p of sampled frequency domain is such that p ∈ {− M 2η , .., 0, .., M 2η − 1}, and the distance between two frequency samples p and p + 1 corresponds to a frequency spacing of N R M Ts .

B. Synchronization
This section is dedicated to the time and frequency synchronization process, which consists in estimating the best couple (k * , p * ) corresponding to the time-frequency indexes (at oversampling rate R) which are the closest to the values (τ, θ f ). The ML estimation of the sampled delay k and frequency offset p, with respect to unknown parameters h and ψ, can be expressed as with Y k,p the p-th element of the vector defined as where r R0 k and s R0 are the vectors of size M η × 1 containing the samples {r[ηn + k]} and {s[ηn] * }, respectively, with n ∈ |Y k,p | 2 is called the periodogram of the time-shifted received signal multiplied by the conjugate reference preamble. The proof that the ML estimator of (k * , p * ) can be written through the simple expression in (4) is proved in Appendix A.
For further analysis of the ML estimator of (k * , p * ), we define the function g[k] where, for a given k, finding the maximum of |Y k,p | 2 with respect to p is straightforward. It should be noted that the solving of (4) should be a priori carried out only through an exhaustive search over the parameters k and p. This assumption will be verified in Section III thanks to the behavior analysis of g [k], and a solution to reduce the complexity of the exhaustive search will be suggested afterwards.

III. BEHAVIOR ANALYSIS A. Behavior of g[k]
In this section, we assume that τ << T s and R 0 > 1, before generalizing to any τ value afterwards. Note that the case R 0 = 1 corresponding to the Nyquist sampling rate at the input of the DFT is not tackled in this paper as it involves specific processes. Furthermore, we limit the size of the observation window to the K first symbols of the preamble. The reasons behind this choice are twofold: i) the sync word (carrying the network identifier) is generally unknown and cannot then be used for synchronization, and ii) K is usually set to K = 8, allowing for the use of fast Fourier transform (FFT) instead of DFT, therefore simplifying the synchronization process.
To analyze the behavior of g[k], we definer i [n − k] as in (7). Thus,r i [n + k] consists of the n τ − k last samples of r i−1 (corresponding to noise samples if i = 0), and the N R −n τ +k first samples of r i , such as illustrated in Fig. 2. It is shown in Appendix B that |Y k,p | 2 can be expressed as follows in (8), in which γ 1 (k, p) and γ 2 (k, p) are given by: and Despite its apparent complexity, (8) can be easily interpretable, and can be even used to develop a low-complexity synchronization algorithm, such as suggested afterwards. In fact, the two functions asinc are the DFT of sinusoids corresponding to the cases where r i−1 and s i overlap on one hand, and where r i and s i overlap on the other hand (see Fig.  2). Thus, for any k, the function asinc γ1 (A 1 ) is maximum when γ 1 (k, p) is minimum, i.e. when p takes the value p 1 , according to (10): It must be emphasized that the delay and the frequency offset are intertwined, as both have an effect on the value of p 1 . Similarly to (12), we find that for any k, the function asinc γ2 (A 2 ) is maximum when γ 2 (k, p) is minimum, i.e. when p takes the value p 2 , according to (11): It can be noticed that p 1 − p 2 = B w T s K = 2 SF K corresponding to a frequency distance of B w between the two peaks, which is consistent with the theory. Fig. 4 shows the behaviors of the functions A 1 asinc γ1 (A 1 ) and A 2 asinc γ2 (A 2 ) versus p, where we considered the following parameters: Fig. 4, with the aforementioned parameters we find that p 2 = −297 and p 1 = 727 such that p 1 − p 2 = 1024 = 8.2 7 . Furthermore, since k = 0 the amplitudes of the functions are A 1 = Rτ B w /η = 64 and A 2 = 192. Moreover, for any value θ f , one function is maximum for k = n τ (corresponding to the best possible time synchronization) while the other one vanished.
It must be emphasized that, in the range of indexes p close to p 1 and p 2 , the overlap of one function (say A 1 asinc γ1 ) over the other one (A 2 asinc γ2 ) can be neglected. Then we can assume the approximation in (14), which is much simpler than the expression in (8) as the phase terms can be omitted. It remains to analyze the behavior of the last term involving the sum in (14), that we denote by Q for a matter of clarity. For a given value of (τ, θ f ), the function Q only depends on k, since p also depends on k from (12) and (13). Furthermore, it must be noted that |Q| can be bounded as where |Q| = 0 corresponds to the case where Q is the sum of the K k-th roots of the unity, i.e.
Conversely, the case |Q| = K corresponds to the case where , and θ f = 700 Hz. It can be seen that Q oscillates between maxima (close to eight) and minima (close to zero), and the maxima are reached every k = K = 8 samples. Thus, the oscillations are due to the phase components −2jπpi K , while the value of the maximum and minimum of Q depends on 2jπiT s θ f . In Fig. 5- is depicted with the same parameters as in Fig. 5-(a). We observe that g[k] should reach its maximum with respect to k if both and |Q| 2 reach their respective maximum. Furthermore, since |Q| 2 is not concave, then neither g[k] is, hence no simple algorithm can be used to reach the maximum of g [k]. As a consequence, the only solution to solve (4) is a priori the exhaustive search, which is computationally complex.
However, in the following, we show that it is possible to recover the "envelop" of g[k], which is almost concave. The "envelop" of g[k], denoted by g [k], corresponds to It cannot be strictly concave in a general case due to the ceil operator . in A 1 and A 2 , but we assume the shortcut regarding the concavity of the "envelop" of g[k], which is illustrated in Fig.5-(b).

B. Recovering Concavity through g [k]
Although the phase component 2jπiT s θ f in Q is inherent to the system and then not controllable, the phase component . We base our solution on Proposition 1.
Since Proposition 1 holds for any p then it holds for p = p 1 or p = p 2 . The suggested solution to recover the concavity of g then consists in substituting (4) by where q ∈ [[0, .., K − 1]] and    i.e. |Y q k,p | is p-th component of the periodogram of the received signal multiplied by s[ηn] * e −2jπ qηn M . Thus, we redefine g as which exactly corresponds to g in (16), and we show the behavior of g in Fig. 5-(b). We observe that g is indeed the "envelop" of g. We can then use the concavity property of g to suggest a simple algorithm of synchronization in the next section.
Before dealing with the suggested synchronization algorithm, it must be emphasized that the behavior analysis of g and g can be easily extended to any τ value, by avoiding the assumption (τ << T s ) that we made to obtain (35). In particular, it can be noted that if τ = u.T s , u = 0, 1, .., K − 1 and k = 0, then 0 ≤ |Q| ≤ K−u. The same results is obtained when, for instance τ = (u + 1).T s , u = 0, 1, .., K − 1 and k = N R . This highlights the fact that the maximum value of g is proportional to length of the overlapping window between r R0 k and s R0 in (5). It results that g oscillates similarly to the behavior shown in Fig. 5-(b) but between local maxima whose values are (inferiorly) proportional to K − u, and with a periodicity of T s (or equivalently N R samples). The behavior of g can then be generally summarized for any 0 ≤ τ ≤ KT s as follows: g oscillates between local maxima with a periodicity of K samples (see Fig. 5-(a)), and the envelope g of the oscillations itself oscillates between local maximal values with a periodicity of N R samples.

IV. LOW-COMPLEXITY ALGORITHM
In this section, we describe both the exhaustive search and the proposed algorithm in pseudo code format. The aim is to make the processes easy to understand and the results easily reproducible. We use some support functions defined as:

A. Exhaustive synchronization algorithm
We introduce a first algorithm, maxDF T (see Algorithm 1) which is the base operation for exhaustiveSyncro, as well as phase1, phase2 and phase3 used in the suggested method. It returns the maximum index and value of |DF T (r R0 k •(s R0 ) * )| in (5) for a fixed k. The sampling factor ratio η is also provided to reduce the overall computational complexity which is similar to the DFT complexity: O( M η log( M η )).

Algorithm 2: exhaustiveSynchro
input : r R input signal, s R preamble, η sampling ratio output: (k * estimated delay, p * max DFT bin index) The exhaustive algorithm exhaustiveSynchro (see Algorithm 2) is a numerical translation of the exhaustive search in (4). The estimated frequency offset can then be obtained by matching the obtained bin index p * to the corresponding frequency of the DFT coefficients, but it is not further detailed as it is straightforward. The algorithm complexity is O( M 2 η log( M η )) (corresponding to M times maxDF T ) which is computationally too expensive, especially for large SF values. Nevertheless this algorithms is used as a reference in simulation and it is also a good starting point for the understanding of the proposed algorithm.

B. Low complexity synchronization algorithm
Such as aforementioned, the aim of the suggested algorithm is to achieve the same performance as the exhaustive search by drastically reducing the computational complexity. This can be performed by recovering the concave envelop g of g such as in (21). Thus, the proposed algorithm is decomposed in 3 phases depicted in the remaining of this section.

Algorithm 3: phase1
input : r R input signal, P array K × M preamble array, K, η sampling ratio, G granularity output: maximum index of phase1 The first phase phase1 (see Algorithm 3) aims at finding an index k 0 such that k 0 ∈ [[k * − N R 2 , k * + N R 2 ]], therefore avoiding the convergence of the algorithm to a local maximum of g . In this phase we introduce the parameter G that we call granularity in the following. From G we define the subset Ω G Thus, the step phase1 evaluates the "envelop" g using (21) over the subset Ω G , and returns the index k 0 corresponding to the maximum of g [k], k ∈ Ω G . The index k * of (4) must then be located in a range of ± M 2G samples from k 0 . To ensure this, G must satisfies G ≥ 2K. Note that P array is the K × M array containing the K versions of the preamble, i.e. P array [q, n] = s[ηn] * e −2jπ qηn M such as used in (20). The complexity of this step is O(G KM η log( M η )). The function maxDF T K used in Algorithm 3 is hereby described.
Algorithm 4: maxDF T K input : r R input signal, P array K × M preamble array, k delay, K, η sampling ratio output: The function maxDF T K evaluates the envelop g of g (see (21)) for a certain k, such as detailed in Algorithm 4. Note that P array [i] corresponds to the i-th row of P array .

Algorithm 5: phase2
input : r R input signal, s R preamble, k 0 max index of phase1,K, η sampling ratio output: maximum index of phase2 The second phase phase2 (see Algorithm 5) aims at searching the index k 1 of the local maximum of g[k] the closest possible to k 0 . In fact, we know from Q[k] in (14) that g oscillates and reaches local maxima every K samples (with a default LoRa configuration of K = 8 symbols preamble). Therefore, we search the index k 1 of the local maximum of (4) in the range k 0 ± K 2 samples. The complexity of The third phase phase3 (see Algorithm 6) searches the index k 2 of the maximum of g around k 1 . Given that k * is the index that maximize g and Q, it should be located within [[k 1 − uK, k 1 + uK]], u ∈ N, hence we search k 2 in Algorithm 6: phase3 input : r R input signal, s R preamble, G granularity of phase1, k 1 max index of phase2, η sampling ratio output: maximum index of phase3, max DFT bin index . All the DFTs are performed using oversampling R o = 2 to further reduces the complexity which is about O( M 2 KηG log( M η )) for this phase. It should be noted that this phase may be the heaviest in terms of calculation and in practice dominates the execution time. However, the number of calculations is widely reduced compared with the exhaustiveSynchro algorithm and makes the implementation usable in real condition including with SF12 while keeping a similar performance. Otherwise, the choice of G is a tradeoff between the cost of phase1 and phase3, and empirically we find that G = 64 gives the lowest execution times.
Algorithm 7: newSynchro input : r R input signal, P array preamble array, K, η sampling ratio, G granularity output: (estimated delay, max DFT bin) k 0 ← phase1(r R , P array , K, η, G) The overall algorithm is provided in the pseud-code of Algorithm 7. After estimating k using the 3 previous phases, it also returns the position b max of the DFT bin of the max of phase3, then the estimated frequency offset can be calculated thanks toθ f = b max . N R M Ts , where we remind that N R M Ts is the frequency spacing between two bins.

V. SIMULATIONS AND DISCUSSION
This section presents the simulations results, the complexity analysis, and a discussion of the suggested method and results. Simulations have been performed using Python 3 including numpy library, and the source code can be accessed at https: //github.com/b-com/synchro_LoRa. All results of Monte-Carlo simulations have been averaged over 10 4 runs. We consider   LoRa signals with SF7 and SF9, and with delays (in sampled version) and frequency offsets randomly chosen within the sets n τ ∈ [[5, 5 + N R ]] and θ f ∈ [−10, 10] kHz, respectively. Moreover, we arbitrarily set K = 8, R = 8, R 0 = 2, and G = 64, as we empirically found that this provides a good trade-off between performance and complexity.

A. Simulations Results
Fig . 6 shows the standard deviation of estimation of the frequency offset θ f (Fig. 6-(a)) and the sampled delay k (Fig.  6-(b)) versus SNR in the range [−15, 5] dB. The standard deviation (SD) of the estimateX of X, denoted by SD, is defined as where N run corresponds to the number of independent simulations runs. It can be observed in Fig. 6-(a) that the behavior of SD(θ f ) of the suggested algorithm matches that of the exhaustive search, therefore validating the proposed implementation. Thus, we can notice that SD(θ f ) decreases from SN R =-15 to 0 dB, and reaches a lower bound for higher SNR values. This non-zero lower bound is due to the precision limitation inherent to the DFT, since the frequency spacing between two DFT bins is equal to N R M Ts . As a consequence, the absolute value of the error of estimation |θ f − θ f | falls within the interval [0, N R 2M Ts ]. Moreover, it must be noted that SD(θ f ) for SF9 is twice weaker than the SD for SF, e.g. 2.8 Hz for SF9 against 5.6 Hz for SF7 at SNR=5 dB. This is consistent with the theory, as the frequency spacing between two bins is four times smaller in SF9 compared with SF7. Note that it has been proposed in [12], [14] to estimate the fractional frequency offset. This method could then be used in addition to the suggested low-complexity algorithm to further improve the estimator, but this is out of the scope of this paper.
We can observe in Fig. 6-(b) that the suggested technique and the exhaustive search have a similar behavior regarding the estimation of the delay. However, it can be noticed that the SD of SF9 coincides with that of SF7 in the SNR range lower than −14 dB, whereas SF9 outperforms SF7 for higher SNR values. This can be explained by the fact that the delay is chosen in an interval that is proportional to the size of the symbols, i.e. proportional to 2 SF . Thus, in very low SNR values, the noise preponderates over the signal, and it is more likely to erroneously estimate k * = n τ if n τ ∈ [[5, 5 + 2 9 ]] (in SF9) rather than n τ ∈ [[5, 5 + 2 7 ]] (in SF7), even though the energy of the signal in SF9 compensates this phenomenon. In higher SNR range, the SD of SF9 is smaller than that of SF7, and reaches a lower bound of about tenth a sample, inherently due to the residual error on the frequency offset, since frequency offset and delay are intertwined in (12) and (13).
While Fig. 6 shows that the suggested algorithm reaches the performance of the exhaustive search, it remains to investigate its gain in terms of complexity. According to the asymptotic complexity of the algorithms assessed in Section IV, and the aforementioned chosen parameters, we find that the complexity of the exhaustive search and of the proposed method are O(2 2SF +10 log(2 SF +4 )) and O(2 2SF +1 log(2 SF +4 )), respectively. Thus, our algorithm is less complex by a factor 512 compared to the exhaustive search, in terms of number of operations. To provide more explicit results regarding the complexity, Table V-A gives the execution time of the suggested algorithm and the SD averaged over 100 runs. The results are obtained on a desktop with an Intel Xeon E5-1620 v4 @ 3.50GHz processor with 16GB memory, and optimized using a C++ implementation. In the same context, the execution time of the exhaustive algorithm in SF7 is about 280 ms, namely about 100 times longer than the proposed method, and this times quadratically grows as a function of the SF. Thus, we find an average computation time of 25.7 × 10 3 ms in SF10. Therefore these results give the advantage to our solution compared with the exhaustive search. In particular, it can be noticed from Table V-A that the duration of the synchronization is in the order of 0.5 s in SF12, whereas the typical frame duration is at least 1 s. The proposed method therefore allows the LoRa receiver to perform the synchronization and demodulation in real time for any SF.

B. Discussion
This paper deals with both theoretical and implementation aspects of the synchronization in LoRa. To go further, it is noteworthy that a real-time LoRa receiver including the suggested synchronization algorithm has been implemented, such as described in [17], [18]. Different alternative solutions to the proposed one could be investigated to take advantage of the concavity of g and further improve the complexity of the synchronization method, such as a gradient descent. Moreover, such as aforementioned, the suggested algorithm could be combined with fractional frequency offset estimators such as described in [12], [14], [15], in particular by using the two symbols of the sync word and the downchirps. Alternatively, the fractional frequency offset could be estimated by maximizing the function Q according to θ f , which does not require the sync word. Another lead for further researches consists in adapting the analysis and the algorithm to LoRa signal sampled at Nyquist rate, namely R = R 0 = 1. In fact, in this case, it must be noted that the results (12) and (13) do not hold anymore as p 1 and p 2 are merged due to the aliasing. Thus, this paper paves the way for further works on synchronization in the LoRa system.

VI. CONCLUSION
In this paper, we have investigated the challenging problem of the LoRa chirp waveform receiver time and frequency synchronization. Based on a thorough study of the oversampled received LoRa signal model including time delay and frequency offset, we have derived the ML estimator for these two parameters. We have shown that the time and frequency synchronization problem can be translated into an optimization problem, where however the cost function is not concave. Our proposed approach, from a detailed analysis of the cost function, relies on a quick search of a near optimal estimate of the time delay, avoiding local optimums, followed by a search for the optimum in the vicinity of the first estimate. This proposal greatly reduces the complexity that would be required with the exhaustive search. Our results have shown that the ML estimation performance is reached, with a reduced calculation time, which allows for a real time software implementation of the synchronization on commodity hardware.

APPENDIX A PROOF OF (4)
Basics on ML estimation can be found in [19]. We here apply the general principles in [19] to the LoRa signal (3). The likelihood function of a complex Gaussian variable X ∼ CN (0, σ 2 ) is given by f X (x) = C exp(− |x| 2 σ 2 ), where C is a normalization constant that does not need to be detailed. We define Π M as an observation window of size M such as M ≥ M +n τ , i.e. we ensure that Π M of size M contains the whole expected window of size M containing the preamble. Moreover, we suppose that the noise samples w[n] within Π M are independent and identically distributed. Then, the maximization of the likelihood function corresponding to the observation r[n] in (3) leads to the ML estimator that can be expressed and simplified as in (23)  in the first line of (23) corresponds to the sampled version of the contribution of the frequency offset. It must be highlighted that the terms ηn indicates that an oversampling R 0 is used to perform the estimator, whereas the term k indicates that the sliding window is shifted according to the oversampling rate R (finest time granularity).