Fast-Convergence Digital Signal Processing for Coherent PON Using Digital SCM

Passive optical networks (PONs) operating at 100 Gb/s/$\lambda$ and beyond are expected to be required in future optical access networks to meet the explosive growth of data traffic. Coherent optical systems are a promising solution for the future beyond 100 G PON. Coherent PON using digital subcarrier multiplexing (DSCM) can provide flexible bandwidth allocation to a large number of access subscribers by dividing the subcarriers of the DSCM signal into time slots for time-and-frequency division multiple access. When an optical network unit is allocated a new subcarrier, digital signal processing (DSP) should converge quickly within the allocated time slot to ensure a low handoff latency for real-time bandwidth allocation. However, traditional coherent DSP struggles to converge quickly because it relies on blind and complex algorithms. In this paper, we design a specific training sequence (TS) structure and propose data-aided DSP to achieve fast convergence for coherent PON. The feasibility of the proposed scheme is experimentally verified in an 8 Gbaud/SC×8 SCs 400 Gb/s-net-rate coherent PON using DSCM with 16 quadrature amplitude modulation. The experimental results show that fast convergence is jointly realized by the proposed TS structure and data-aided DSP using a 416-symbol TS with a 52 ns duration. The receiver sensitivity at the 20% soft-decision forward error correction limit is approximately −27 dBm, and an optical power budget of about 35.5 dB is achieved with a booster amplifier.


I. INTRODUCTION
Optical access networks have been evolving to accommodate explosive traffic demands [1], [2], [3]. The 100 Gb/s/λ and beyond passive optical network (PON) is expected to be required to support the ever-increasing traffic demands [4], [5], [6]. Intensity-modulation and direct-detection (IM-DD) optical systems struggle to meet the optical power budget requirement of the future beyond 100 G PON [7], [8], [9]. The beyond 100 G PON can use coherent optical systems to achieve a higher optical power budget and larger network capacity [10], [11], [12]. Meanwhile, many simplified coherent optical systems have been proposed to overcome the cost obstacle to practical application in the cost-sensitive PON [13], [14], [15].
Owing to advances in laser stability, digital subcarrier multiplexing (DSCM) has attracted much attention for frequency-division multiple access (FDMA) in coherent PON. Coherent PON using DSCM has the advantages of FDMA, including low latency and flexible bandwidth allocation [16], [17]. However, a lower-baud-rate subcarrier signal suffers from larger accumulated laser phase noise, which is harder to eliminate. Thus, the subcarrier number of DSCM cannot be increased without restriction, which limits the number of access subscribers [18], [19]. Time-division multiple access (TDMA) PON can support a large number of access subscribers by allocating time slots, and it has been widely applied in fiber-to-the-home (FTTH) with no fewer than 64 access subscribers [20], [21], [22]. TDMA PON is a form of statistical multiplexing with a high overall capacity. However, TDMA PON has high latency and requires a high-bandwidth receiver.
To take full advantage of FDMA and TDMA, time-and-frequency division multiple access (TFDMA) has been proposed, in which each subcarrier of an N-subcarrier DSCM signal can be divided into M time slots to provide flexible bandwidth allocation for up to N × M access subscribers [23]. Fig. 1 shows the schematic diagram of bandwidth allocation in TFDMA based on DSCM. Subcarriers can be individually allocated to applications with strict low-latency requirements, such as allocating the first subcarrier to 5G/6G services. Applications with less strict low-latency requirements can gain access by TDMA. To provide low-latency access and improve bandwidth efficiency, idle subcarriers can be reallocated and utilized. For example, when the third time slot of the third subcarrier is allocated to other applications, the second subcarrier can be allocated to FTTH so that it does not have to wait for its own time slots on the third subcarrier. Therefore, digital signal processing (DSP) should converge quickly in the allocated time slot to ensure real-time bandwidth allocation when the optical network unit (ONU) is allocated a new subcarrier.
In the ITU-T standard for 50G-PON, the convergence time of IM-DD DSP is defined on the timescale of 100 ns [24]. Because it relies on blind and complex algorithms, traditional coherent DSP struggles to converge on the timescale of 100 ns. Many efforts have been devoted to fast-convergence DSP for coherent TDMA PON. In [25], a DSP using an 816 ns (20400 symbols/25 Gbaud) training sequence (TS) was proposed for the burst-mode detection of 100 Gb/s coherent PON. Pre-calculated filter coefficients and a TS of 1.3 μs (13000 symbols/10 Gbaud) were used for burst-mode coherent detection in [26]. A 100 Gb/s real-time burst-mode coherent receiver using data-aided algorithms and a 120 ns (3000 symbols/25 Gbaud) TS was demonstrated in [27], [28]. A 59.43 ns (1664 symbols/28 Gbaud) TS combined with a memory-aided equalization strategy for burst-mode coherent detection was proposed in [29]. In [30], a 71.68 ns (1792 symbols/25 Gbaud) TS and DSP were proposed for the burst-mode detection of coherent PON. As these references show, the convergence time is equal to the length of the TS divided by the baud rate. Therefore, it is still a great challenge to meet the 100 ns convergence time for the coherent TFDMA PON with low baud-rate subcarriers.
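The relation between TS length, baud rate, and convergence time quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the dictionary simply restates the figures cited in the text, and the last line is a hypothetical case (the 1792-symbol TS of [30] reused on an 8 Gbaud subcarrier) showing why a low baud rate makes the 100 ns target harder to reach.

```python
# Convergence time = TS length / baud rate, using the figures quoted in the text.
training_sequences = {
    "[25]": (20400, 25e9),
    "[26]": (13000, 10e9),
    "[27], [28]": (3000, 25e9),
    "[29]": (1664, 28e9),
    "[30]": (1792, 25e9),
}


def convergence_time_ns(num_symbols, baud_rate):
    """Duration of a training sequence in nanoseconds."""
    return num_symbols / baud_rate * 1e9


durations = {
    ref: convergence_time_ns(n, rs) for ref, (n, rs) in training_sequences.items()
}
for ref, t in durations.items():
    print(f"{ref}: {t:.2f} ns")

# Hypothetical case: the same 1792-symbol TS on an 8 Gbaud subcarrier.
low_baud_ns = convergence_time_ns(1792, 8e9)
print(f"1792 symbols at 8 Gbaud: {low_baud_ns:.2f} ns")
```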
In this paper, we design a specific TS structure and propose data-aided DSP to jointly implement fast convergence for the coherent PON. The main contributions of this paper are as follows:
• The proposed specific TS structure is designed for key algorithms of the data-aided DSP, covering the frequency offset estimation (FOE), timing recovery (TR) with phase initialization, frame synchronization, and equalizer training with pilot-based carrier phase recovery (CPR).
• Fast convergence of an 8 Gbaud/SC×8 SCs 400 Gb/s-net-rate coherent PON is jointly realized by the TS structure and data-aided DSP using a 416-symbol TS with a 52 ns duration (416 symbols/8 Gbaud), which meets the convergence time requirement. An optical power budget of about 35.5 dB is achieved by using a booster amplifier.
The rest of the paper is organized as follows. In Section II, the specific TS structure design and the data-aided DSP principle for the coherent PON are given. In Section III, the experimental setups of the coherent PON using DSCM are presented. The experimental results and discussions are presented in Section IV. Finally, the paper is concluded in Section V.

II. TS STRUCTURE AND DATA-AIDED DSP
In this section, the proposed overall TS structure design and the principle of the data-aided DSP are introduced, which can jointly achieve fast convergence for the coherent PON.

A. Overall Data Frame Structure
The data frame structure of the coherent PON with fast convergence is shown in Fig. 2(a). Four TSs are designed for different functions. TS-A, with 64 periodic quadrature phase shift keying (QPSK) symbols, is designed for frame detection, coarse FOE, and initial sampling phase estimation. The structures of TS-B and TS-C are shown in Fig. 2(b). TS-B, used for frame synchronization and fine FOE, is composed of the sequences $S_1$, $S_2$ repeated periodically on the X polarization and $S_3$, $S_4$ on the Y polarization. The lengths of $S_1$, $S_2$, $S_3$ and $S_4$ are ten, while that of the cyclic prefix (CP) is two. TS-C is utilized to estimate the state of polarization (SOP) for the initialization of the equalizer tap coefficients. It contains a 16-symbol QPSK sequence $S_{C1}$ followed by 16 zeros on the X polarization, and 16 zeros followed by a 16-symbol QPSK sequence $S_{C2}$ on the Y polarization. TS-D is made up of 256 QPSK training symbols for the equalizer training. Therefore, the total length of the TS is 416 symbols. Finally, one pilot symbol is inserted into every 32 payload symbols for the pilot-based CPR, which accounts for 3.125% overhead.
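The symbol budget of the frame can be tallied as follows. This is a minimal sketch: the TS-B entry assumes three repetitions of a 20-symbol pair plus four CP symbols in total, which is inferred from the 64-symbol TS-B length reported later in the paper rather than stated in this subsection, and the variable names are illustrative.

```python
BAUD_RATE = 8e9  # symbols per second on each subcarrier

# Lengths in symbols. The TS-B breakdown (4 CP symbols + 3 x [10 + 10]) is an
# assumption chosen to match the 64-symbol TS-B length reported in the paper.
ts_lengths = {
    "TS-A": 64,
    "TS-B": 2 * 2 + 3 * (10 + 10),
    "TS-C": 16 + 16,
    "TS-D": 256,
}

total_ts = sum(ts_lengths.values())
ts_duration_ns = total_ts / BAUD_RATE * 1e9
pilot_overhead = 1 / 32  # one pilot per 32 symbols

print(f"total TS length: {total_ts} symbols")
print(f"TS duration: {ts_duration_ns:.1f} ns")
print(f"pilot overhead: {pilot_overhead:.3%}")
```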

B. TS-A for Frame Detection and Coarse FOE
It is crucial to compensate for the coarse frequency offset before the matched filtering. Frame detection, which uses the TS-A, is required before the coarse FOE. TS-A consists of a periodic sequence, which produces tones that are symmetric about zero frequency in the frequency domain. The received signal is first down-sampled by a factor of $m$ and transferred to the frequency domain using an $N_p$-point fast Fourier transform (FFT). Then the frequency components with a power less than the average power of the non-zero components are filtered out. This operation is performed three times, which filters out the noise and reveals the frequency tones of TS-A. It serves as an automatic threshold setting for signal detection and can be applied to signals with different received optical power (ROP).
After the frequency tones of the TS-A are detected, the coarse FOE is implemented as [31], [32]
$$\Delta f_{\text{Coarse}} = \frac{f_1 + f_2}{2} \tag{1}$$
where $f_1$ and $f_2$ are the frequencies of the symmetric tones. The accuracy of the estimated coarse frequency offset is
$$\frac{F_s}{m N_p} \tag{2}$$
where $F_s$ is the sampling rate.
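The tone search and coarse FOE can be sketched in NumPy as below. Several things here are assumptions rather than the authors' implementation: the received TS-A is replaced by two synthetic complex tones at ±4 GHz with a 0.5 GHz offset, the function name `detect_tones` is hypothetical, and the thresholding loop stops once only two bins remain so that the weaker of two unequal tones is not discarded. The resolution line uses the values reported in the experiment (256 GSa/s, down-sampling by 16, 256-point FFT).

```python
import numpy as np


def detect_tones(x, fs, n_fft=256, passes=3):
    """Return the two surviving tone frequencies in Hz, sorted ascending."""
    power = np.abs(np.fft.fft(x[:n_fft], n_fft)) ** 2
    for _ in range(passes):
        nonzero = power > 0
        # Sketch-specific safeguard: never thin the spectrum below two bins.
        if np.count_nonzero(nonzero) <= 2:
            break
        # Zero every bin whose power is below the mean of the non-zero bins.
        power[power < power[nonzero].mean()] = 0.0
    freqs = np.fft.fftfreq(n_fft, d=1.0 / fs)
    strongest = np.argsort(power)[-2:]
    return np.sort(freqs[strongest])


adc_rate, m, n_fft = 256e9, 16, 256
fs = adc_rate / m                    # 16 GSa/s after down-sampling
resolution = adc_rate / (m * n_fft)  # frequency spacing of the FFT bins

# Synthetic stand-in for the received TS-A: symmetric tones plus weak noise.
rng = np.random.default_rng(1)
n = np.arange(n_fft)
tone, offset = 4e9, 0.5e9
x = np.exp(2j * np.pi * (offset + tone) * n / fs)
x = x + np.exp(2j * np.pi * (offset - tone) * n / fs)
x = x + 0.1 * (rng.standard_normal(n_fft) + 1j * rng.standard_normal(n_fft))

f1, f2 = detect_tones(x, fs)
coarse_offset = (f1 + f2) / 2
print(f"bin spacing: {resolution / 1e6:.1f} MHz")
print(f"coarse offset: {coarse_offset / 1e6:.1f} MHz")
```

The synthetic offset is placed exactly on an FFT bin, so the estimate is exact here; an off-bin offset would be resolved only to within the bin spacing.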

C. TS-A for Timing Recovery With Phase Initialization
An appropriate sampling phase initialization plays a significant role in the convergence of TR. Fortunately, the sampling phase offset can be initialized by using the TS-A. In the frequency domain, the received TS-A with a sampling phase offset $\tau$ can be modeled as the product of the frequency-domain signal $A(k)$, the frequency-domain channel response $H(k)$, and a linear phase term determined by $\tau$ [33]. The Godard timing error detector is a frequency-domain timing error detector [34], [35], which estimates the timing error from the phase of the correlation between a spectral component and the conjugate, $(\cdot)^{*}$, of the component one baud rate away. However, TS-A alone is not enough for accurate TR, which requires some samples from the subsequent TS to converge.
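A common textbook form of the Godard spectral-correlation estimate, for a signal sampled at two samples per symbol, is sketched below. It is not claimed to match the exact expressions used in the paper: the function name, the normalization to symbol periods, and the test signal (a cosine standing in for an alternating-symbol TS-A, 128 samples long) are all assumptions made for illustration.

```python
import numpy as np


def godard_timing_estimate(x):
    """Estimate the sampling phase offset in symbol periods.

    Assumes an even number of samples taken at 2 samples per symbol, so that
    FFT bins k and k + N/2 are exactly one baud rate apart.
    """
    spectrum = np.fft.fft(x)
    half = len(x) // 2
    corr = np.sum(spectrum[:half] * np.conj(spectrum[half:]))
    return np.angle(corr) / (2 * np.pi)


# Hypothetical test signal: tones at +/- half the baud rate, sampled at 2 sps
# with a sampling phase offset of 0.2 symbol periods.
n = np.arange(128)
tau_true = 0.2
x = np.cos(np.pi * (n / 2 + tau_true))

tau_est = godard_timing_estimate(x)
print(f"estimated offset: {tau_est:.3f} symbol periods")
```

Because the estimate comes from a single FFT of the periodic TS-A, it can seed the timing loop before the loop itself has had time to converge.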

D. TS-B for Frame Synchronization
After the TR with phase initialization, frame synchronization is implemented based on the TS-B. The sequences $S_1$ and $S_3$ in TS-B each consist of $L$ QPSK pilot symbols. $S_2$ and $S_4$ are defined as
$$S_2(i) = S_1(i)\,pn(i), \quad S_4(i) = S_3(i)\,pn(i) \tag{5}$$
where $i = 1, 2, \ldots, L$, $pn$ is a pseudo-random noise (PN) sequence of length $L$ [36], and the CP is added to increase the tolerance to synchronization errors. Frame synchronization is realized by using a sliding window and the cross-correlation on each polarization, which is expressed as [37]
$$P(d) = \sum_{i=1}^{L} pn(i)\, r^{*}(d+i)\, r(d+i+L) \tag{6}$$
where $r$ denotes the received signal. The timing synchronization is based on the timing metric, which is calculated as
$$M(d) = \frac{|P(d)|^{2}}{R^{2}(d)} \tag{7}$$
The half-symbol energy $R(d)$ is calculated as
$$R(d) = \sum_{i=1}^{L} |r(d+i+L)|^{2} \tag{8}$$
There are five sharp peaks in $M(d)$ because TS-B contains three repeated sequences $[S_1, S_2]$ on the X polarization and $[S_3, S_4]$ on the Y polarization. Copies of $M(d)$ with delays of $0$, $L$, $2L$, $3L$ and $4L$ are stacked to enhance the tolerance to noise [38], which can be expressed as
$$M_{\text{stack}}(d) = \sum_{k=0}^{4} M(d + kL) \tag{9}$$
The position of the highest peak of the stacked timing metric $M_{\text{stack}}(d)$ is the frame synchronization position.
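The sliding-window correlation and the stacking step can be sketched as follows for one polarization. The sketch rests on labeled assumptions: $S_2$ is taken as $S_1$ multiplied element-wise by a ±1 PN sequence, the half-symbol energy is taken over the second half of the window, the CP is modeled as a two-symbol prefix and a two-symbol suffix, and the symbols before and after TS-B are random QPSK placeholders in a noise-free channel.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 10  # length of S1 and S2 in symbols


def qpsk(n):
    """Random unit-power QPSK symbols (placeholder data)."""
    return (rng.choice([-1, 1], n) + 1j * rng.choice([-1, 1], n)) / np.sqrt(2)


s1 = qpsk(L)
pn = rng.choice([-1, 1], L)  # assumed +/-1 PN sequence
s2 = s1 * pn                 # assumed construction of S2 from S1
block = np.concatenate([s1, s2])
# Assumed layout: 2-symbol prefix, three [S1, S2] pairs, 2-symbol suffix.
ts_b = np.concatenate([block[-2:], block, block, block, block[:2]])
lead, tail = qpsk(100), qpsk(200)
r = np.concatenate([lead, ts_b, tail])
true_start = len(lead) + 2   # index of the first symbol of the first S1


def timing_metric(r, pn):
    """PN-weighted correlation of two adjacent L-symbol windows."""
    L = len(pn)
    n_pos = len(r) - 2 * L + 1
    m = np.empty(n_pos)
    for d in range(n_pos):
        first, second = r[d:d + L], r[d + L:d + 2 * L]
        p = np.sum(pn * np.conj(first) * second)
        energy = np.sum(np.abs(second) ** 2)
        m[d] = np.abs(p) ** 2 / energy ** 2
    return m


def stack_metric(m, L, copies=5):
    """Sum copies of the metric delayed by 0, L, ..., (copies - 1) * L."""
    n_pos = len(m) - (copies - 1) * L
    return sum(m[k * L:k * L + n_pos] for k in range(copies))


m = timing_metric(r, pn)
stacked = stack_metric(m, L)
detected_start = int(np.argmax(stacked))
print(f"true start: {true_start}, detected start: {detected_start}")
```

In this noise-free toy case the single metric has five equal peaks, so it cannot identify the frame start on its own; only the stacked metric has a unique maximum, where all five peaks line up.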

E. TS-B for Fine FOE
Let the successive sequences $[S_1, S_2]$ and $[S_3, S_4]$ in TS-B be denoted as $S_{B1}$, $S_{B2}$ and $S_{B3}$, as shown in Fig. 2(b). The received signals of $S_{B1}$, $S_{B2}$ and $S_{B3}$ are denoted as $r_{B1}$, $r_{B2}$ and $r_{B3}$, respectively. After frame synchronization, the exact locations of $S_{B1}$, $S_{B2}$ and $S_{B3}$ are available. When only the frequency offset is considered, the received signal of the TS $S_{Bi}$ can be modeled as
$$r_{Bi}(n) = S_{Bi}(n)\, e^{\,j 2\pi \Delta f_{\text{Fine}} \left[n + 2L(i-1)\right] / R_s} \tag{10}$$
where $i = 1, 2$ and $3$, $n = 1, 2, \ldots, 2L$, $R_s$ is the baud rate, and $\Delta f_{\text{Fine}}$ is the fine frequency offset. The TS-based fine FOE is also achieved by using the TS-B, which estimates the frequency offset as [39]
$$\Delta f_{\text{Fine}} = \frac{R_s}{4\pi L} \arg\left( r_{B1}^{H} r_{B2} + r_{B2}^{H} r_{B3} \right) \tag{11}$$
where $\arg(\cdot)$ represents the operation of taking the angle of a complex value and $(\cdot)^{H}$ denotes the conjugate transpose operation. Since the angle of a complex value is limited to $[-\pi, \pi)$, the estimated fine frequency offset is limited to
$$-\frac{R_s}{4L} \le \Delta f_{\text{Fine}} < \frac{R_s}{4L} \tag{12}$$
Therefore, as the length of TS-B increases, the peak of the timing metric in frame synchronization becomes more prominent, but the range of the estimated fine frequency offset decreases.
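A repeated-block frequency estimator of this kind can be sketched as below. It assumes one sample per symbol, three identical 2L-symbol blocks, and a noise-free channel; the function name `fine_foe` and the 150 MHz test offset are hypothetical. The range line reproduces the ±200 MHz figure quoted in the paper for L = 10 at 8 Gbaud.

```python
import numpy as np


def fine_foe(r_b1, r_b2, r_b3, baud_rate):
    """Frequency offset from the phase rotation between repeated blocks."""
    block_len = len(r_b1)  # 2L symbols, one sample per symbol assumed
    # np.vdot conjugates its first argument, i.e. it computes a^H b.
    corr = np.vdot(r_b1, r_b2) + np.vdot(r_b2, r_b3)
    return baud_rate * np.angle(corr) / (2 * np.pi * block_len)


baud_rate, L = 8e9, 10
max_range = baud_rate / (4 * L)  # unambiguous range is +/- this value

rng = np.random.default_rng(2)
block = (rng.choice([-1, 1], 2 * L) + 1j * rng.choice([-1, 1], 2 * L)) / np.sqrt(2)
tx = np.tile(block, 3)           # three repetitions of the 2L-symbol block
true_offset = 150e6              # hypothetical residual offset in Hz
n = np.arange(len(tx))
rx = tx * np.exp(2j * np.pi * true_offset * n / baud_rate)

r_b1, r_b2, r_b3 = rx[:2 * L], rx[2 * L:4 * L], rx[4 * L:]
estimate = fine_foe(r_b1, r_b2, r_b3, baud_rate)
print(f"range: +/-{max_range / 1e6:.0f} MHz, estimate: {estimate / 1e6:.3f} MHz")
```

An offset outside the unambiguous range would wrap around, which is why the coarse FOE has to bring the residual offset inside this range first.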

F. TS-C&D for Equalizer Training and Pilot-Based CPR
To realize fast convergence of the equalization, the SOP estimation and the training-based MIMO equalizer with pilot-based CPR work together. TS-C includes a sequence $S_{C1}$ followed by $I$ zeros on the X polarization, while $I$ zeros are followed by a sequence $S_{C2}$ on the Y polarization. $S_{C1}$ and $S_{C2}$ each contain $I$ QPSK symbols. Firstly, TS-C is used to estimate the inverse Jones matrix as [40], [41]
$$H_{\text{inv}} = \begin{bmatrix} \cos\alpha & -e^{-j\varphi}\sin\alpha \\ e^{j\varphi}\sin\alpha & \cos\alpha \end{bmatrix} \tag{13}$$
where $\alpha$ and $\varphi$ are calculated from the received TS-C signals, $[x_{C1}, x_{C2}]$ on the X polarization and $[y_{C1}, y_{C2}]$ on the Y polarization. The sign of $\alpha$ is the same as that of $\tan\alpha$, which is calculated using $\mathrm{real}(\cdot)$, the real component. The elements of the estimated inverse Jones matrix are assigned to the center taps of the multiple input multiple output (MIMO) equalizer as the initial coefficients. Secondly, the TS-D is designed for equalizer training based on the least mean square (LMS) algorithm. After the tap coefficients of the MIMO equalizer have converged, the equalizer switches to decision-directed LMS (DD-LMS) to track the tap coefficients [42]. However, the frequency and phase offsets must be calculated before the error function of the DD-LMS algorithm. Thus, since the frequency offset has already been compensated, a fast CPR algorithm is important for the equalizer to track the tap coefficients.
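The structure of the inverse Jones matrix and its use as the center-tap initialization can be illustrated as follows. This sketch does not estimate the two angles from TS-C; it takes them as given (the values 0.6 and 1.1 rad are arbitrary), and the `(2, 2, num_taps)` tap layout and function names are hypothetical. The eleven-tap length is the one used later in the paper.

```python
import numpy as np


def inverse_jones(alpha, phi):
    """Inverse Jones matrix parameterized by the angles alpha and phi."""
    return np.array([
        [np.cos(alpha), -np.exp(-1j * phi) * np.sin(alpha)],
        [np.exp(1j * phi) * np.sin(alpha), np.cos(alpha)],
    ])


def init_taps(h_inv, num_taps=11):
    """Place h_inv on the center tap of a 2x2 FIR equalizer, zeros elsewhere."""
    taps = np.zeros((2, 2, num_taps), dtype=complex)
    taps[:, :, num_taps // 2] = h_inv
    return taps


alpha, phi = 0.6, 1.1               # arbitrary example angles in radians
h_inv = inverse_jones(alpha, phi)
# The matrix is unitary, so the SOP rotation it undoes is its conjugate transpose.
channel = h_inv.conj().T
identity_check = h_inv @ channel
taps = init_taps(h_inv)
print(np.round(identity_check, 12))
```

Starting the equalizer from this matrix means the LMS training only has to refine an already demultiplexed signal instead of searching for the polarization rotation from scratch.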
Thirdly, a pilot-based CPR algorithm and the maximum likelihood (ML) algorithm are used for fast CPR in the proposed DSP architecture. One pilot symbol is inserted into every $K$ payload symbols for the pilot-based CPR, which estimates the phase rotation of the pilot symbol as
$$\psi_{\text{Pilot}}(j) = \arg\left[ p(j)\, t^{*}(j) \right]$$
where $p$ is the output of the MIMO equalizer and $t$ is the pilot symbol. The phase noise of the $K - 1$ symbols between the $j$-th and the $(j + 1)$-th pilot symbols is then initialized as $\psi_{\text{Pilot}}(j)$. Next, the following maximum likelihood algorithm is used to estimate the residual phase noise as
$$\psi_{\text{ML}}(l) = \arg\left[ \sum_{n=l-Q}^{l+Q} q(n)\, \hat{q}^{*}(n) \right]$$
where $q(l)$ denotes the equalized signal $p(l)$ after the $\psi_{\text{Pilot}}(l)$ phase rotation, $\hat{q}(l)$ is its decision, and $Q$ is the half-length of the averaging filter. The phases $\psi_{\text{Pilot}}$ and $\psi_{\text{ML}}$ are fed back to the DD-LMS algorithm.
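The two-stage phase recovery can be sketched as below. The sketch makes several assumptions that are not taken from the paper: the pilot is the 16QAM corner point 3+3j placed at the start of each 32-symbol block, the averaging half-length is Q = 2, the phase noise is a slow deterministic drift rather than a Wiener process, and the channel is otherwise noise-free.

```python
import numpy as np

K = 32  # one pilot followed by K - 1 payload symbols
Q = 2   # half-length of the averaging window (hypothetical value)


def decide_16qam(x):
    """Hard decision onto the 16QAM grid with levels -3, -1, 1, 3."""
    def level(v):
        return np.clip(2 * np.floor(v / 2) + 1, -3, 3)
    return level(x.real) + 1j * level(x.imag)


def pilot_ml_cpr(p, pilot, K=K, Q=Q):
    """Return the pilot-based phase and the residual ML phase per symbol."""
    psi_pilot = np.angle(p[::K] * np.conj(pilot))  # one estimate per block
    psi_pilot = np.repeat(psi_pilot, K)[:len(p)]   # held over the whole block
    q = p * np.exp(-1j * psi_pilot)
    prod = q * np.conj(decide_16qam(q))
    psi_ml = np.angle(np.convolve(prod, np.ones(2 * Q + 1), mode="same"))
    return psi_pilot, psi_ml


rng = np.random.default_rng(3)
num_symbols = 10 * K
levels = np.array([-3, -1, 1, 3])
tx = rng.choice(levels, num_symbols) + 1j * rng.choice(levels, num_symbols)
pilot = 3 + 3j                                   # hypothetical pilot value
tx[::K] = pilot
phase = 0.3 + 0.002 * np.arange(num_symbols)     # toy phase drift in radians
p = tx * np.exp(1j * phase)

psi_pilot, psi_ml = pilot_ml_cpr(p, pilot)
rms_pilot = np.sqrt(np.mean((phase - psi_pilot) ** 2))
rms_total = np.sqrt(np.mean((phase - psi_pilot - psi_ml) ** 2))
print(f"RMS error, pilot only: {rms_pilot:.4f} rad")
print(f"RMS error, pilot + ML: {rms_total:.4f} rad")
```

The pilot-only estimate is a staircase that is exact only at the pilot positions, while the ML stage follows the drift inside each block, which is the behavior the paper reports for the measured phase noise.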

III. EXPERIMENTAL SETUPS
To verify the feasibility of the proposed scheme, an 8 Gbaud/SC×8 SCs 400 Gb/s-net-rate PDM-16QAM coherent PON using DSCM is built experimentally. Fig. 3(a) shows the experimental setup of the polarization-division-multiplexed (PDM) coherent PON at the transmitter (Tx) side. First, bits are mapped to 16 quadrature amplitude modulation (16QAM) symbols, and the designed 416-symbol TSs are added. There are eight subcarriers, each carrying 9137 symbols. Then the signal power of each subcarrier is optimized to balance the signal-to-noise ratios so that the subcarriers achieve similar performance. After pulse shaping using a root-raised cosine (RRC) filter with a roll-off factor of 0.1 and frequency shifting, all subcarriers are multiplexed to generate the DSCM signal. There is no guard interval between adjacent subcarriers. Finally, the DSCM frames are re-sampled to match the sampling rate of the arbitrary waveform generator (AWG).
After the DSCM frame was generated by the Tx DSP, it was converted to an analog signal by an AWG (Keysight M8194A) operating at 96 GSa/s. After being amplified by electrical amplifiers (EAs), the electrical signal was modulated onto an optical carrier by a dual-polarization in-phase/quadrature modulator (DP I/Q Mod.) to generate the optical signal. An external cavity laser (ECL) with a linewidth of less than 100 kHz was used as the laser source at the transmitter. The wavelength of the ECL was approximately 1545.123 nm. The output power of the modulator was approximately −14 dBm. An erbium-doped optical fiber amplifier (EDFA) was adopted to increase the optical power budget. Finally, the signal was launched into the 20 km standard single-mode fiber (SSMF).
Fig. 3(b) shows the experimental setup of the coherent PON at the receiver (Rx) side. A variable optical attenuator (VOA) was used to adjust the ROP. A tunable ECL with a linewidth of less than 100 kHz was used as the local oscillator (LO). The LO output power was about 12.31 dBm. The optical signal was mixed with the LO in an integrated coherent receiver (ICR) and converted to an analog signal. The subcarriers can be selected by LO wavelength tuning without using optical filters [43]. Then the analog signal was digitized by a real-time oscilloscope (RTO, Keysight UXR0594AP) with a 256 GSa/s sampling rate. Finally, the signal was recovered by the proposed fast-convergence DSP.
A distributed feedback (DFB) laser with limited tunability is a cost-effective LO configuration for the actual deployment of coherent PON using DSCM with a narrow subcarrier spacing [44], [45]. The tuning time of the DFB laser can be below the microsecond timescale [46]. If a new subcarrier is allocated, data can be buffered in the optical line terminal (OLT) until the LO tuning is completed [47]. After LO tuning, the DSP should realize fast convergence in the allocated time slot for real-time bandwidth allocation.
The received signal is first down-sampled by a factor of 16 to 2 samples per symbol (sps). Then the frequency tones are searched for by using a 256-point FFT, which can contain most of the TS-A samples. The coarse FOE is performed once the symmetric frequency tones are detected. The signal on the allocated subcarrier is filtered by the RRC filter. After the phase-initialized TR, frame synchronization and fine FOE are realized based on the TS-B. After the channel equalization using the training-based MIMO equalizer, the carrier phase is recovered. Finally, the recovered 16QAM symbols are de-mapped to bits. Fast convergence of the coherent PON is realized by the proposed scheme using the 416-symbol TS with a 52 ns duration (416 symbols/8 Gbaud). Even if a DFB laser with a larger linewidth is used, it will not have much influence on the convergence. The total line rate of the 8 Gbaud/SC×8 SCs coherent PON using DSCM with PDM-16QAM, i.e., 4 bits/symbol/pol, is 512 Gb/s.
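The sampling and rate figures in this paragraph follow from simple arithmetic, reproduced below as a sanity check. The constant names are illustrative; all values are those reported in the experiment.

```python
ADC_RATE = 256e9       # real-time oscilloscope sampling rate, Sa/s
DOWNSAMPLE = 16        # down-sampling factor
BAUD_RATE = 8e9        # symbols per second per subcarrier
NUM_SUBCARRIERS = 8
BITS_PER_SYMBOL = 4    # 16QAM
NUM_POLARIZATIONS = 2  # PDM

sps = ADC_RATE / DOWNSAMPLE / BAUD_RATE
ts_a_samples = 64 * sps  # TS-A has 64 symbols
line_rate_gbps = (
    BAUD_RATE * NUM_SUBCARRIERS * BITS_PER_SYMBOL * NUM_POLARIZATIONS / 1e9
)

print(f"samples per symbol: {sps:g}")
print(f"TS-A length: {ts_a_samples:g} samples")
print(f"line rate: {line_rate_gbps:g} Gb/s")
```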

IV. EXPERIMENTAL RESULTS AND DISCUSSIONS
The spectrum of the 8 Gbaud/SC×8 SCs DSCM signal at the transmitter is shown in Fig. 4(a). It contains eight subcarriers, and each subcarrier carries an 8 Gbaud signal. The DSCM signal is transmitted from the OLT to the eight ONUs through a point-to-eight-point PON architecture as shown in Fig. 4(b). Each ONU selects its allocated subcarrier by tuning the LO wavelength. Fig. 4(c)-(j) show the spectrum of the received signal at each ONU in a dashed line and the allocated subcarrier in a solid line. The allocated subcarrier of each ONU can be detected as the baseband signal, while the frequency offset could cause a penalty. Therefore, the first step to signal recovery with fast convergence is to compensate for the frequency offset before the matched filtering.
The received signal spectrum of the first subcarrier at the positive half frequency is shown in Fig. 5. The difference between the frequency of the LO and that of the laser source should be near the center frequency of the target subcarrier so that the target subcarrier will be detected as a baseband signal. There is a frequency tone in the spectrum whose frequency is equal to the difference between the laser source frequency $f_{\text{source}}$ and the LO frequency $f_{\text{LO}}$. Therefore, the sum of the central frequency of the subcarrier and the tone frequency can be regarded as the actual frequency offset.
The estimated frequency offset versus the actual frequency offset and the absolute error between them are shown in Fig. 6. For DSCM systems, the frequency offset should be kept below half of the baud rate $R_s$ to ensure that the specific subcarrier of interest can be selected. Consequently, the tested frequency offset ranges from −3.5 GHz to 3.5 GHz. However, the actual frequency offset cannot be controlled precisely due to the frequency drifts of both the laser source and the LO. The total frequency offset estimated by the proposed coarse and fine FOE algorithms is very close to the actual frequency offset. The absolute error between the estimated and actual frequency offsets is within ±2 MHz. Therefore, the proposed coarse and fine FOE algorithms, which require TS-A and TS-B with a total length of 128 symbols, are effective for the coherent PON using DSCM. According to (2), the minimum estimation error of the coarse FOE is 62.5 MHz when the signal is down-sampled to 16 GSa/s and the FFT size is 256.
Fig. 7 shows the timing error of the Godard TED without and with the sampling phase initialization. Regardless of whether the phase is initialized, the timing error eventually converges to approximately $T/2$. Nevertheless, the timing error converges faster when the sampling phase is initialized. With phase initialization, the timing error of TR converges after hundreds of samples, whereas it takes approximately 2500 samples to converge without it. The bit-error ratios (BERs) of the first 5000 symbols are 1.54e-2 and 1.37e-2 without and with the sampling phase initialization, respectively. As a result, the phase initialization using TS-A accelerates the convergence of TR, and the whole TS is sufficient to achieve accurate TR for the payload.
The timing metric $M(d)$ is shown in Fig. 8 for the case in which the lengths of $S_1$ and TS-B are 10 and 64, respectively. There are five sharp peaks in the timing metric $M(d)$. However, because $S_1$ is short for cross-correlation, the peaks are not very prominent, so the metric is susceptible to noise and may lead to synchronization errors. There are nine sharp peaks in the stacked timing metric, as shown in Fig. 9. After delaying and stacking the timing metric $M(d)$, the highest peak can be clearly distinguished from the noise. In addition, there is no plateau at the peak of the stacked timing metric, as shown in Inset (i). Thus, the frame synchronization position can be obtained very precisely.
The peak-to-maximum-noise ratio (PMNR) can be used to measure the quality of the timing metric against the noise [30]. The length of TS-B is determined by the lengths of $S_1$ and $S_2$. Fig. 10 shows the average PMNR of the X and Y polarizations versus the length $L$ of $S_1$ without and with stacking the timing metric $M(d)$. The PMNR of the stacked timing metric is at least 3 dB higher than that without stacking. As the length $L$ increases, the PMNR increases, which indicates that the frame synchronization becomes more robust. However, as the length of $S_1$ increases, the estimation range of the fine FOE decreases. The PMNR is approximately 10 dB when $L$ is 10. According to (12), the estimation range of the fine FOE is within ±200 MHz if $L$ is 10, which covers the minimum estimation error of the coarse FOE. Therefore, sequences $S_1$ and $S_2$ with a length of ten are used as a trade-off between the robustness of frame synchronization and the estimation range of the fine FOE, which results in a TS-B with a length of 64 symbols.
To determine the proper length of the TS, the TS-D is first set long enough for the training of the eleven-tap MIMO equalizer. The mean square error (MSE) between the equalizer outputs and the training symbols using the SOP estimation and the LMS algorithm or CMA is shown in Fig. 11. The BER using CMA and radius-directed equalization (RDE) or LMS and DD-LMS versus training length is also shown. Convergence is defined as the point at which the overall trend of the MSE or BER curve no longer declines. Compared to the blind-mode CMA, the LMS algorithm makes the MSE drop more rapidly and converge to about 0.2. If the noise after equalization can be approximated as additive Gaussian noise, the MSE reflects the BER to some extent. When the training length reaches about 256, neither the MSE nor the BER of the proposed DSP, which is implemented serially, decreases significantly any further. Thus, the length of TS-D is set to 256.
Fig. 12 shows the phase noise estimated by using only the pilot-based CPR and by using the pilot-based CPR together with the maximum likelihood algorithm. The phase noise $\psi_{\text{Pilot}}$ estimated using only the pilot-based CPR is constant over the 31 symbols between two pilots, which gives it a step-like shape. However, the actual phase noise of the symbols between the two pilots is not constant and is well modeled by a Wiener process. The residual phase of each symbol after the pilot-based CPR is therefore estimated by the maximum likelihood algorithm. As a result, the total phase noise estimated by the pilot-based CPR and the maximum likelihood algorithm is close to that estimated by only the pilot-based CPR, but it differs from symbol to symbol and is more accurate.
The BER performance of the signal on each subcarrier versus ROP over optical back-to-back (OBtB) and 20 km SSMF transmission is shown in Figs. 13 and 14, respectively. The penalty caused by the chromatic dispersion is negligible for the 8 Gbaud/SC×8 SCs DSCM system [48]. As a result, although no chromatic dispersion compensation is adopted apart from the MIMO equalizer, the BER performance of the signal transmitted over the 20 km SSMF is similar to that over OBtB transmission. By using a short TS with a length of 416 symbols and pilot symbols accounting for 3.125% overhead, the proposed specific TS structure and data-aided DSP realize fast convergence for the coherent PON. The BERs of all the subcarriers fall below the 20% SD-FEC limit at an ROP of approximately −27 dBm. Fig. 15 shows the optical power budget and the required ROP to reach the 20% SD-FEC limit versus launch optical power (LOP) for the coherent PON using DSCM. The LOP was controlled by the gain of the EDFA. As the LOP increases, the optical power budget increases at first and then stays basically unchanged. Although the LOP keeps increasing, the saturation of the effective gain degrades the optical signal-to-noise ratio, so a higher ROP is required to reach the 20% SD-FEC limit. Consequently, the optical power budget remains basically unchanged when the output power reaches 10 dBm and above. The optimal LOP of the system is therefore 10 dBm, for which the required ROP is −25.5 dBm, and the maximum power budget of the coherent PON using DSCM is approximately 35.5 dB. Thus, the proposed coherent PON using DSCM for TFDMA can support FTTH with a low splitting ratio by dividing the subcarriers of the DSCM signal into time slots.
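As a worked check, and assuming the optical power budget is taken as the difference between the launch optical power and the required ROP at the 20% SD-FEC limit, the reported figures are mutually consistent:

```latex
\text{Power budget} = \text{LOP} - \text{ROP}_{\text{required}}
                    = 10~\text{dBm} - (-25.5~\text{dBm})
                    = 35.5~\text{dB}
```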

V. CONCLUSION
In this paper, we design a specific TS structure and propose data-aided DSP to jointly achieve fast convergence for the coherent PON. The proposed specific TS structure is designed for the key algorithms of the data-aided DSP, including frame detection, FOE, TR with phase initialization, frame synchronization, and training-based channel estimation. The experimental results of an 8 Gbaud/SC×8 SCs 400 Gb/s-net-rate coherent PON show that fast convergence is realized by the proposed specific TS structure and data-aided DSP using a 416-symbol TS with a 52 ns duration. The convergence time will be affected by the update frequency of the LMS algorithm if the proposed DSP runs in parallel on the application-specific integrated circuits. The receiver sensitivity is −27 dBm at the 20% SD-FEC limit. The optical power budget is 35.5 dB when a booster amplifier is used. The proposed scheme is expected to meet the burst-mode detection requirements of coherent PON in upstream transmission. It would also be the basis of the fast-convergence DSP for simplified coherent optical systems. In conclusion, the proposed specific TS structure and data-aided DSP show great potential to achieve fast convergence in the future beyond 100 G PON.