Protection against Failure of Machine Learning-based QoT Prediction

Machine learning (ML)-based methods are widely explored to predict the quality of transmission (QoT) of a lightpath, which is expected to reduce the optical signal-to-noise ratio (OSNR) margin reserved for the lightpath and therefore improve the spectrum efficiency of an optical network. However, many studies conducting this prediction are based on synthetic datasets or datasets obtained from laboratories. As such, these datasets may not be representative enough to cover the entire state space of a real optical network, which is often exposed to harsh environments. There are therefore risks of failure when using these ML-based QoT prediction models, and it is necessary to develop a mechanism that can guarantee the reliability of a lightpath service even if the prediction models fail. For this, we propose to take advantage of the conventional network protection techniques that are popularly implemented in an optical network and reuse their protection resources to also protect against this type of failure. Based on two representative protection techniques, i.e., 1+1 dedicated path protection and shared backup path protection (SBPP), the performance of the proposed protection mechanism is evaluated by reserving different margins for the working and protection lightpaths. For 1+1 path protection, we find that the proposed mechanism can achieve a zero design margin (D-margin) for a working lightpath, thereby significantly improving network spectrum efficiency without sacrificing the availability of lightpath services. For SBPP, we find that an optimal D-margin should be identified to balance spectrum efficiency and service availability, and, although the saving is not significant, the proposed mechanism can save up to 0.5 dB of D-margin for a working lightpath while guaranteeing service availability.


INTRODUCTION
Accurate prediction of the quality of transmission (QoT) of a lightpath is essential for optical network design. It can reduce the QoT margin required for a lightpath and allocate network resources more efficiently [2]. Different approaches have been proposed to estimate the QoT of a lightpath, among which the two most representative are the exact nonlinear Schrödinger equation solver based on the split-step Fourier method [3] and the approximate analytical model based on the Gaussian noise (GN) model [4]. The former is more accurate but has a much higher computational complexity, and therefore cannot provide real-time QoT prediction. In contrast, the latter is simpler but less accurate, and therefore needs to reserve more QoT margin to meet a predefined QoT threshold requirement [5]. To improve the accuracy of QoT prediction while keeping it fast, machine learning (ML)-based approaches have recently been proposed and investigated [6][7]. They have been verified to be accurate and can reduce the reserved QoT margins for some network scenarios.
(A part of this paper was presented at OFC 2020 [1].)
However, we note that many ML-based prediction models were constructed based on laboratory datasets, which are not general enough to representatively cover the status of various real networks. This is because in a real optical network, a lightpath may traverse different fiber spans: some are laid underground, some are hung on telephone poles, some are in extremely cold areas, and some are in hot areas. Moreover, fiber spans hung on telephone poles may be blown by strong winds. Therefore, when these models are employed to predict the QoT of a lightpath in a real optical network, they may fail, and if the lightpath is provisioned based on these models, its availability cannot be guaranteed. As such, a mechanism is required to protect against the failure of the lightpath due to inaccurate QoT prediction. For this, we propose to take advantage of the conventional network protection techniques that are popularly implemented in an optical network, i.e., 1+1 dedicated path protection and shared backup path protection (SBPP), and reuse their protection resources to also protect against the failure of ML-based QoT prediction. Specifically, in the context of a pair of working and protection lightpaths, we employ the ML-based method to reserve a QoT margin for the working lightpath for better spectrum efficiency, and the traditional method to reserve a QoT margin for the protection lightpath such that the lightpath service can always be recovered when the working lightpath fails due to the failure of ML-based QoT prediction. In addition to this benchmark scheme, we also consider two other combinations: (1) both working and protection lightpaths employ the ML-based QoT prediction method, and (2) both working and protection lightpaths employ the traditional margin reservation method. The performance of all these schemes is evaluated and analyzed in terms of spectrum efficiency and lightpath service availability in the context of the routing and spectrum assignment (RSA) problem in an elastic optical network (EON).
The main contributions of this study are as follows. First, we propose a protection mechanism to tackle the failure of ML-based QoT prediction, which can take advantage of the ML-based QoT prediction method to reduce the reserved margins while still guaranteeing the availability of provisioned lightpath services. To the best of our knowledge, this is the first effort in the literature to carry out such a kind of protection.
Second, we develop a new model for estimating lightpath service availability, in which the failure probability of a lightpath service is calculated based on its reserved optical signal to noise ratio (OSNR) margin. Under different network protection techniques, we further identify the conditions of successfully establishing a lightpath service that can be immune to the QoT failure.
Finally, we evaluate the performance of the proposed protection mechanism in terms of spectrum efficiency and lightpath service availability. We estimate how the ML-based QoT prediction can help improve network spectrum efficiency while guaranteeing required lightpath service availability.
This paper is an extended version of a conference paper [1]. The following key aspects have been enhanced in the current paper. First, in addition to 1+1 path protection, we consider the more efficient SBPP technique to implement the QoT failure protection. Second, we develop an analytical model to estimate the lightpath service availability when different OSNR margins are reserved for working and protection lightpaths, and we identify the condition for successfully provisioning a reliable lightpath service. Finally, more simulation studies are carried out and more results are reported and analyzed.
The rest of this paper is organized as follows. Section II introduces related work on ML-based QoT prediction methods and network protection techniques. Section III describes two OSNR-margin-reservation methods, including the traditional reservation method and the ML-based QoT prediction method. Section IV uses examples to illustrate the protection mechanism for the QoT prediction failure. Section V elaborates on the analytical models for estimating lightpath service availability in the context of the proposed protection mechanism. We present and analyze the simulation results in Section VI and conclude the paper in Section VII.

RELATED WORK

A. ML-based QoT Prediction
In recent years, considerable research has been conducted on employing ML techniques for lightpath QoT prediction. Rottondi et al. [7] investigated a machine-learning classifier to predict whether the QoT of a lightpath can meet a predefined threshold, which was verified to achieve a high prediction accuracy. Morais and Pedro [8] compared different ML models for lightpath QoT prediction, including K-nearest neighbors (KNN), logistic regression, support vector machines (SVM), and artificial neural networks (ANN), and found that ANN performs best with an accuracy of up to 99%. Gao et al. [9] developed an ANN-based multi-channel QoT predictor to evaluate lightpath Q-factors, which achieves a maximum error of less than 0.06 dB. Similarly, by extending a heterogeneous ANN method, Yu et al. [10] proposed a transfer learning model to improve the accuracy of lightpath QoT prediction at a low complexity. Other studies on ML-based lightpath QoT prediction can be found in [11][12][13][14].
Meanwhile, research effort has been made to reduce the QoT margin reserved for a lightpath such that more efficient spectrum utilization can be achieved [2,15]. Because the ML-based methods are expected to achieve more accurate lightpath QoT prediction, they are often employed to set margins for lightpaths. Seve et al. [16] proposed a generic learning process to predict lightpath QoT more accurately, which can reduce the reserved margin by several dB. D'Amico et al. [17] employed a deep neural network to predict lightpath QoT, which helps reduce the reserved margin from 2.28 dB to 0.15 dB. Similarly, Lu et al. [18] explored the potential benefit in expanding network capacity when a more accurate lightpath QoT predictor is employed to reduce the reserved margin.
Although the ML-based models can predict lightpath QoT more accurately and therefore help reduce reserved QoT margins, the datasets used by these models are mainly from two sources, i.e., laboratory datasets and synthetic datasets [14]. For example, Gao et al. [9] collected datasets for ML from a 563.4-km field-trial testbed. D'Amico et al. [17] collected datasets for ML from an optical line system (OLS) that cascades 11 erbium-doped fiber amplifiers (EDFAs). In addition, two popular simulation tools, i.e., E-Tool [7] and GNPy [19], are often employed to generate synthetic datasets. Note that E-Tool is a BER estimation tool, and GNPy is an open-source library developed based on the generalized GN model. For example, Rottondi et al. [7] employed E-Tool to generate synthetic datasets based on different simulation parameters, and Khan et al. [20] employed GNPy to generate synthetic datasets. Because the datasets are either from laboratories or synthetic, not from real optical networks, the ML-based QoT prediction models cannot guarantee their accuracy when used to predict the QoT of a lightpath provisioned in a real network. This is confirmed by several studies. For example, Fan et al. [18,21] confirmed that the ML-based QoT prediction model becomes less stable with time because of system temporal drifts caused by temperature variations and other unpredictable factors. As a result, the prediction model may fail to predict the QoT of a lightpath and affect the availability of the lightpath. Therefore, for lightpath provisioning based on the ML-based QoT prediction model, we need to consider a protection mechanism against the failure of the prediction model.

B. Network Protection Techniques
Based on 1+1 path protection, Klinkowski et al. [23] developed an integer linear programming (ILP) model and an adaptive frequency assignment (AFA) algorithm to solve the RSA problem in an EON. They also developed an evolutionary algorithm to find an optimal solution to the RSA problem in a survivable EON with 1+1 path protection [24]. As an extension to Klinkowski's work, Goścień et al. [25] presented an ILP model to solve the RSA problem that jointly considers anycast and unicast lightpath services, and proposed a heuristic algorithm, AFA-JAU-DPP, to solve the problem.
Based on SBPP, Shen et al. [31] formulated the RSA problem in an EON into an ILP model and showed that SBPP demonstrates better spare capacity efficiency than 1+1 path protection. Walkowiak et al. [32] proposed an AFA/SBPP algorithm for the RSA problem in an SBPP-protected EON and verified the efficiency of the proposed algorithm in comparison with other heuristic schemes. Also, to maximize spare capacity sharing among multiple protection lightpaths, Wang et al. [33] proposed a spectrum window plane (SWP) based algorithm for survivable lightpath service provisioning in a distance-adaptive EON, which achieves better performance than the conventional shortest and K-shortest path routing algorithms.

C. Summary
In summary, we note that, although extensive studies have been conducted to develop various ML-based QoT prediction models to reduce the margin reserved for a lightpath, no studies have considered protecting lightpath services against the failure of these models. Meanwhile, although extensive studies have been conducted on network protection, no studies have considered employing these techniques to protect lightpath services against the failure of ML-based QoT prediction models. Therefore, to the best of our knowledge, this is the first work to employ network protection techniques, i.e., 1+1 path protection and SBPP, to protect against this kind of failure so as to guarantee the availability of lightpath services provisioned based on ML-based techniques.

OSNR MARGIN RESERVATION
When establishing a new lightpath, we need to first estimate its OSNR and then select the most efficient modulation format based on its OSNR and reserved margin. We next introduce the employed OSNR estimation model and two OSNR margin reservation methods: the traditional reservation method and the ML-based QoT prediction method.

A. OSNR Estimation Model
To evaluate the QoT of a lightpath, an OSNR estimation model is required, which generally considers two key impairments: amplified spontaneous emission (ASE) noise and non-linear interference (NLI). The OSNR of a lightpath is calculated as follows.

$$\mathrm{OSNR} = \frac{P_{ch}}{\sum_{i=1}^{N} \left( P_{ASE}^{i} + P_{NLI}^{i} \right)} \qquad (1)$$

Here, each link traversed by a lightpath is assumed to be transparent and homogeneous, i.e., the link consists of identical amplification spans and the loss of each span is exactly compensated by an optical amplifier. $P_{ch}$ is the launch power of the lightpath. $P_{ASE}^{i}$ is the power of ASE noise accumulated by all the optical amplifiers on the $i$-th fiber link along the lightpath. $P_{NLI}^{i}$ is the cumulative NLI power on the $i$-th fiber link. $N$ is the number of links traversed by the lightpath.
To calculate the ASE noise, we first find noise figure (NF) for each amplifier by looking up a gain-NF table, pre-built based on different amplifier types. As in [35], we specifically consider two types of erbium doped fiber amplifiers (EDFAs) whose maximum gains are 15 dB and 22 dB, respectively. We select the amplifier type based on a required gain, which can just compensate loss accumulated before the amplifier. Specifically, if the required gain is below 15 dB, we select a 15-dB EDFA because it has a lower NF; otherwise, we select a 22-dB EDFA. After deciding the amplifier type, we use the gain-NF table to find an NF based on the required gain. Finally, we employ Eqs. (1) and (2) in [35] to calculate the ASE noise accumulated by all the EDFAs on a fiber link.
For the NLI power, we implement the Gaussian-noise (GN) model as in GNPy [19] with an analytical approximation [4], in which only self-channel interference (SCI) and cross-channel interference (XCI) are considered, with multi-channel interference neglected. Under the assumptions of incoherent accumulation of NLI noise and lumped amplification of optical amplifiers, we employ Eqs. (120) and (121) in [36] to calculate the NLI power accumulated along a traversed fiber link. This approximate model has been verified to be accurate with a computational complexity compatible with real-time network operation [4].
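As an illustrative sketch (not the authors' implementation), the estimation steps above can be written in a few lines, assuming that the lightpath OSNR is the launch power divided by the sum of the per-link ASE and NLI powers, as implied by the incoherent-accumulation assumption. The function names and the use of dBm inputs are assumptions made for this example; the per-link ASE and NLI powers are taken as given rather than computed from [35] and [36].

```python
import math


def select_edfa_type(required_gain_db):
    """Return the maximum gain (dB) of the EDFA type chosen for a required gain.

    Rule taken from the text: a required gain below 15 dB uses the 15-dB EDFA
    (lower NF); otherwise the 22-dB EDFA is used.
    """
    return 15 if required_gain_db < 15.0 else 22


def dbm_to_mw(power_dbm):
    """Convert a power in dBm to linear milliwatts."""
    return 10.0 ** (power_dbm / 10.0)


def lightpath_osnr_db(launch_power_dbm, ase_dbm_per_link, nli_dbm_per_link):
    """OSNR (dB) of a lightpath from per-link ASE and NLI powers (dBm).

    Assumption: noise powers add incoherently (linearly) over all links.
    """
    if len(ase_dbm_per_link) != len(nli_dbm_per_link):
        raise ValueError("ASE and NLI lists must have one entry per link")
    signal_mw = dbm_to_mw(launch_power_dbm)
    noise_mw = sum(
        dbm_to_mw(ase) + dbm_to_mw(nli)
        for ase, nli in zip(ase_dbm_per_link, nli_dbm_per_link)
    )
    return 10.0 * math.log10(signal_mw / noise_mw)
```

With identical links, every doubling of the number of traversed links lowers the estimated OSNR by about 3 dB, which is the expected behavior of a linear noise sum.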

B. OSNR Margin Reservation
Different types of OSNR margins need to be reserved for a lightpath, including the system margin (S-margin), the unallocated margin (U-margin), and the design margin (D-margin) [2]. The S-margin accounts for time-varying network operating conditions, including fast time-varying impairments (e.g., polarization effects) and slow time-varying impairments (e.g., additional nonlinearities and network equipment aging). The U-margin is the difference in capacity/reach between the demand and the discrete data rate/reach granularity offered by commercial transmission equipment. The D-margin is the difference between the planned beginning of life (BoL) value and the real value of the quality metric, which is due to the inaccuracy of the design tool used to evaluate the QoT of a lightpath during network planning. With these margins reserved, the optical communication infrastructure can ensure that all the lightpaths maintain acceptable QoT until the end of their lives.
As shown in Fig. 1, at the BoL of a lightpath, the actual OSNR is the sum of a forward error correction (FEC) limit and the total margin reserved, including the S-margin, the D-margin, and the U-margin. In this study, the FEC limit is set to be the OSNR threshold, above which the lightpath signal is deemed recoverable "error-free." This setting gives the lightpath its best transmission quality at the BoL, so the probability of failing to meet the OSNR threshold is the lowest. However, as time passes, the signal quality of a lightpath gradually deteriorates due to slow time-varying impairments. When approaching its end of life (EoL), the actual OSNR of a lightpath may be lower than the sum of the FEC limit and the fast time-varying system margin. If this occurs, the lightpath will fail in data transmission.
We define the difference between the margins at the BoL and at the EoL as the consumable margin (CM), which is the sum of all the margins excluding the fast time-varying system margin, i.e., $CM = M_{BoL} - M_{EoL}$, where $M_{BoL}$ denotes the total margin reserved at the BoL and $M_{EoL}$ denotes the fast time-varying system margin that must remain at the EoL. Based on this CM, we can estimate the failure probability of the lightpath based on a Gaussian distribution model shown in Fig. 2. It is evident that when the CM is large (e.g., greater than 8.0 dB), the failure probability of the lightpath (i.e., the shaded area, which is the integral of the failure probability function) is low, close to zero, while when the CM is small (e.g., close to 0 dB), the failure probability of the lightpath is high, close to one.
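The mapping from CM to failure probability can be sketched as the tail integral of a Gaussian density. The following is a minimal illustration only: the mean and standard deviation below are hypothetical values chosen for this example, since the actual parameters of the curve in Fig. 2 are not stated in the text, and the function name is likewise an assumption.

```python
import math

# Hypothetical parameters (NOT taken from the paper): the OSNR degradation over
# the lightpath's life is modeled as Gaussian with this mean and deviation.
HYPOTHETICAL_MEAN_DB = 4.0
HYPOTHETICAL_STD_DB = 1.0


def failure_probability(cm_db, mean_db=HYPOTHETICAL_MEAN_DB,
                        std_db=HYPOTHETICAL_STD_DB):
    """Probability that the Gaussian-distributed degradation exceeds the CM.

    This is the area under the Gaussian density to the right of cm_db,
    computed with the complementary error function.
    """
    z = (cm_db - mean_db) / std_db
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

Under these assumed parameters the function reproduces the qualitative shape described above: the failure probability approaches one as the CM approaches 0 dB and approaches zero once the CM exceeds roughly 8 dB.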

C. Margin Setting
Based on the above different margins, we next describe two margin reservation methods: the traditional method and the ML-based QoT prediction method.

1) Traditional method
In this method, the traditional OSNR estimation model introduced in Section III. A is employed to predict the OSNR of a lightpath, in which the D-margin is reserved to guarantee the QoT of a lightpath to be higher than a predefined threshold [5]. Table I shows typical margin settings in the traditional method, in which a 2-dB D-margin is reserved at the BoL. In addition, to guarantee the reliability of a lightpath service over the full network life, the S-margin is reserved. For example, as in [2], we may reserve a 0.4-dB, 2-dB, and 2.3-dB OSNR margin for the fast time-varying effect, the additional nonlinearity, and the slow aging impairment, respectively, which corresponds to a total 4.7-dB S-margin at the BoL.
Considering the S- and D-margins (i.e., 6.7 dB in total) at the BoL of a lightpath, we can select the most efficient modulation format that satisfies the following inequality.

$$\mathrm{OSNR} - M_S - M_D \ge \mathrm{OSNR}_{th} \qquad (2)$$

where $\mathrm{OSNR}$ is the estimated OSNR of the lightpath, $M_S$ and $M_D$ are the reserved S-margin and D-margin, respectively, and $\mathrm{OSNR}_{th}$ is the OSNR threshold required by the selected modulation format. Because a fiber-optic transmission system often has discrete data rate/reach granularities as shown in Table II, where the spectrum efficiency of each modulation format is discrete, there exists a difference in capacity/reach between the demand and that actually offered by the system. Thus, after selecting the modulation format for each lightpath, we can further calculate its U-margin $M_U$ as follows.

$$M_U = \mathrm{OSNR} - M_S - M_D - \mathrm{OSNR}_{th} \qquad (3)$$

With all the margins determined, we can finally calculate the consumable margin of the traditional method, which is the difference between the margins at the BoL and the EoL. Here, as in Section III. B, the CM (in dB) is calculated as $CM = M_{BoL} - M_{EoL} = 6.7 + M_U - 0.4 = 6.3 + M_U$. Based on the found CM, we can further estimate the failure probability of the lightpath using the curve in Fig. 2.
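The selection rule and margin bookkeeping described above can be sketched as follows. This is a hedged example rather than the authors' code: the OSNR thresholds in the table are hypothetical placeholders (Table II is not reproduced here), and the function and constant names are assumptions. Setting the D-margin argument to zero gives the corresponding calculation for a method that reserves no D-margin.

```python
# Hypothetical OSNR thresholds in dB (NOT the values of Table II), listed from
# the most to the least spectrally efficient modulation format.
HYPOTHETICAL_THRESHOLDS_DB = [
    ("64-QAM", 24.0),
    ("32-QAM", 21.0),
    ("16-QAM", 18.0),
    ("8-QAM", 15.0),
    ("QPSK", 12.0),
    ("BPSK", 9.0),
]

FAST_MARGIN_DB = 0.4  # fast time-varying part of the S-margin, as in the text


def plan_margins(osnr_db, s_margin_db=4.7, d_margin_db=2.0,
                 thresholds=HYPOTHETICAL_THRESHOLDS_DB):
    """Pick the most efficient feasible format and return (name, U-margin, CM).

    A format is feasible when the OSNR left after subtracting the S- and
    D-margins still meets its threshold. Returns None if no format fits.
    """
    budget_db = osnr_db - s_margin_db - d_margin_db
    for name, threshold_db in thresholds:
        if budget_db >= threshold_db:
            u_margin_db = budget_db - threshold_db
            # CM = all margins at the BoL minus the fast time-varying margin.
            cm_db = s_margin_db + d_margin_db + u_margin_db - FAST_MARGIN_DB
            return name, u_margin_db, cm_db
    return None
```

For instance, with a 21.03-dB OSNR and the placeholder thresholds, `plan_margins(21.03)` selects a lower-order format than `plan_margins(21.03, d_margin_db=0.0)`, which illustrates how removing the D-margin trades consumable margin for spectrum efficiency.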

2) ML-based QoT prediction
In this method, we employ the ML-based model to predict the OSNR of a lightpath. To build an accurate QoT prediction model, we need massive data for training. Because of the difficulty in collecting massive real-network data, we generate the data by simulation, in which the OSNRs of all the routes are estimated based on the OSNR estimation model in Section III. A. Specifically, for the ASE noise, since the NF values were obtained from real amplifiers, the data calculated based on the model is considered accurate. However, due to the inaccuracy in estimating NLI noise, we add an estimation error, which follows a Gaussian distribution within the range of 0.3 dB [4]. This error emulates the statistical variation of a real network and therefore makes the simulated NLI noise more realistic. We use the simulated data as training data to obtain the ML-based QoT prediction model. Note that although we employ simulation data for this study, this does not affect the effectiveness of the approach, since the simulated data can always be replaced with actual system data if sufficient actual system data can be obtained.
We employ an artificial neural network (ANN) in [38] to train the ML-based QoT prediction model. This ANN consists of an input layer, a hidden layer, and an output layer. In the input layer, we configure seven neurons, which respectively correspond to the following seven lightpath features: (1) the total number of traversed hops, (2) total length, (3) the length of the longest link, (4) total ASE noise, (5) total NLI noise, (6) the number of traversed 15-dB EDFAs, (7) the number of traversed 22-dB EDFAs. In the hidden layer, we configure 10 neurons, each of which uses a rectified linear unit (ReLU) function. In the output layer, the output neuron represents the predicted OSNR value of a lightpath and its activation is a linear function. By training the ANN, we obtain an ML-based QoT prediction model and use it to predict lightpath OSNR.
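The structure of this predictor (seven inputs, ten ReLU hidden neurons, one linear output) can be sketched in plain Python as below. This is only a structural illustration under stated assumptions: the weights are random and untrained, the training procedure of [38] is not reproduced, and the function names are hypothetical. In practice the input features would also need to be normalized before training.

```python
import random

N_INPUTS = 7   # hops, total length, longest link, ASE, NLI, #15-dB and #22-dB EDFAs
N_HIDDEN = 10  # hidden neurons with ReLU activation


def init_params(seed=0):
    """Create random (untrained) weights for a 7-10-1 network."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS)]
          for _ in range(N_HIDDEN)]
    b1 = [0.0] * N_HIDDEN
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
    b2 = 0.0
    return w1, b1, w2, b2


def relu(x):
    """Rectified linear unit."""
    return x if x > 0.0 else 0.0


def predict_osnr(features, params):
    """Forward pass: ReLU hidden layer followed by a linear output neuron."""
    w1, b1, w2, b2 = params
    if len(features) != N_INPUTS:
        raise ValueError("expected %d lightpath features" % N_INPUTS)
    hidden = [
        relu(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2
```

Because the output activation is linear, the network performs regression and returns an OSNR value directly rather than a class label.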
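The ML-based QoT prediction is expected to predict lightpath QoT more accurately, and therefore significantly reduce the reserved D-margin. In this study, we set the D-margin to be zero. In addition, following the traditional method, we still set the S-margin to be 4.7 dB at the BoL. Based on a predicted lightpath OSNR and the reserved S-margin, we can select the most efficient modulation format that satisfies the following equation.

$$\mathrm{OSNR}_{ML} - M_S = \mathrm{OSNR}_{th} + M_U, \quad M_U \ge 0 \qquad (4)$$

where $\mathrm{OSNR}_{ML}$ is the OSNR predicted by the ML-based model, $M_S$ is the reserved S-margin, $\mathrm{OSNR}_{th}$ is the OSNR threshold required by the selected modulation format, and $M_U$ is the resulting U-margin. With the resulting CM, the failure probability of the lightpath can again be estimated using the curve in Fig. 2.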
We make a brief comparison between the two margin reservation methods. Due to the 2-dB D-margin reservation, the traditional method increases the CM, resulting in a lower lightpath failure probability. However, this method consumes more spectrum resources. In contrast, without the D-margin reservation, the ML-based QoT prediction method can efficiently use spectrum resources. However, due to a smaller CM, the failure probability of a lightpath significantly increases.

A. Protection Mechanisms
As the ML-based QoT prediction method does not reserve the D-margin, a lower margin is reserved for each lightpath compared to the traditional method. This enables the ML-based method to adopt more advanced modulation formats given the same FEC limits. However, as a disadvantage, the reduction of the reserved margin can degrade the availability of a lightpath (according to Fig. 2). If a lightpath fails to meet its required OSNR, its data transmission will be affected. Therefore, a protection mechanism is required to protect against this type of failure. In this study, we propose to employ the traditional network protection technique to pre-plan a backup lightpath, whose OSNR margin is, however, reserved based on the traditional method. With this protection lightpath, we can always recover a working lightpath when it fails to meet its required OSNR. We specifically employ 1+1 path protection and SBPP as two protection techniques for this study. Different from conventional network protection, the proposed scheme ensures lightpath service recovery in case of working lightpath failure while reserving a lower margin for the working lightpath by applying the ML-based QoT prediction method, thereby improving spectrum resource utilization.
Fig. 3 shows examples based on the two network protection techniques against the failure of ML-based QoT prediction. We assume that there are a 180-Gb/s traffic demand between node pair (A, B) and a 220-Gb/s traffic demand between node pair (A, E). Table III summarizes how much network resource is required and what the failure probability is under the different margin reservation methods and the different network protection techniques.
In Fig. 3, for the first lightpath service, we employ the shortest route (A-B) to establish its working lightpath. Under the traditional method, the total margin of the lightpath is derived to be 8.67 dB. This corresponds to an 8.27-dB CM after excluding the 0.4-dB fast time-varying system margin. According to Fig. 2, the failure probability of this working lightpath is close to 0 at the BoL.
As a comparison, if we employ the ML-based QoT prediction method, the OSNR of working lightpath (A-B) can be predicted to be 21.03 dB (for example). Since the D-margin is no longer reserved in this method, there is only a 4.7-dB S-margin. According to (4), the sum of the OSNR threshold and the U-margin is 16.33 dB. After the modulation format is selected, this corresponds to a 5.5-dB CM. According to Fig. 2, the failure probability of this working lightpath is close to 7% at the BoL. The same calculation can be conducted for the second lightpath service between the node pair (A, E). The number of FSs used and the failure probability of the working lightpath are shown in Table III for the two margin reservation methods, respectively. It is noted that for both lightpath services, the failure probability of the ML-based QoT prediction method is significantly higher than that of the traditional method, although the former uses less spectrum resource.
Therefore, we need to establish a protection lightpath to improve the availability of the lightpath services. For the protection lightpath, we employ the traditional method to reserve the OSNR margin such that there is always a reliable protection lightpath even if the working lightpath fails in its OSNR. In Fig. 3, we also show examples of setting up protection lightpaths under the 1+1 path protection and SBPP techniques. Table III shows the numbers of FSs used and the failure probabilities of the protection lightpaths under the two protection techniques. Because of spare capacity sharing, SBPP uses fewer FSs on link (A-C) than 1+1 path protection, i.e., 5 vs. 9.
The advantage of combining the two margin reservation methods for the working and protection lightpaths is as follows. Most of the time, when there is no OSNR failure, we can enjoy the efficient spectrum utilization of the working lightpath provided by the ML-based QoT prediction method, thereby improving network resource utilization. Meanwhile, to tackle the issue of insufficient availability due to the ML-based QoT prediction method, we reserve spectrum resource for a protection lightpath in a conservative way such that the lightpath service is always available even if the working lightpath incurs a failure of OSNR prediction. Therefore, the proposed scheme can not only enjoy the high spectrum efficiency of working lightpaths, but also ensure the availability of lightpath services.

B. Estimating Availability of a Protected Lightpath Service
To protect against the failure of the ML-based QoT prediction, we consider the two protection techniques, i.e., 1+1 path protection and SBPP. These two techniques can achieve different availabilities for each lightpath service in case of lightpath QoT failure. Note that in this study, we do not consider service unavailability due to link/node failures, but focus on unavailability due to lightpath QoT failure. Next, we describe how to calculate the lightpath service availability under these two protection techniques.
Under 1+1 path protection, both working and protection lightpaths are assigned dedicated spectrum resources. When a working lightpath fails, its corresponding protection lightpath immediately restores the affected traffic demand. Thus, for lightpath service A, its availability, considering the availability of either the working or the protection lightpath, can be calculated as

$$\mathrm{Avail}_{A} = 1 - F_{W}^{A} \cdot F_{P}^{A} \qquad (5)$$

where $F_{W}^{A}$ and $F_{P}^{A}$ denote the failure probabilities of the working and protection lightpaths of A, respectively.
Under SBPP, the availability of a lightpath service needs to consider not only its own working and protection lightpaths, but also the availability of the other lightpath services that share common links with the current lightpath service. Here, we assume that the set of lightpath services that share links with lightpath service A is $\mathcal{R} = \{R_1, R_2, \ldots\}$. According to SBPP, only when the working lightpaths of all the services in $\mathcal{R}$ are in the state of success can the spectrum resource of the protection lightpath of A be used to recover the failure of the working lightpath of A. Therefore, the availability of the current lightpath service is calculated by (6), which takes into account the failure probability of the working lightpath of the $i$-th lightpath service in $\mathcal{R}$.
For the examples in Section IV. A, we use the above availability estimation model to calculate the availabilities of different lightpath services, which are shown in Table IV. We first consider lightpath service 1. Under 1+1 path protection, the failure probabilities of its working and protection lightpaths are 7% and zero at the BoL, respectively. Therefore, according to (5), the availability of this lightpath service is 100% at the BoL. Similarly, for lightpath service 2, we can also find that its availability is 100% at the BoL. In contrast, under SBPP, we use (6) to calculate the availability of lightpath service 1, which is 94.05% when considering the failure probability of working lightpath 2. Similarly, this availability for lightpath service 2 is 86.05% when considering the failure probability of working lightpath 1. It is evident that 1+1 path protection can achieve a higher service availability than SBPP, while the latter can achieve a more efficient spectrum utilization.
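For the 1+1 case, the availability calculation can be illustrated with a short sketch. It assumes that the working and protection lightpaths fail independently, so that the service is unavailable only when both fail; the function name is hypothetical, and the SBPP expression is intentionally not reproduced here.

```python
def availability_1plus1(f_working, f_protection):
    """Availability of a 1+1 protected service.

    Assumption: working and protection lightpaths fail independently, so the
    service is lost only when both lightpaths fail at the same time.
    """
    for p in (f_working, f_protection):
        if not 0.0 <= p <= 1.0:
            raise ValueError("failure probabilities must lie in [0, 1]")
    return 1.0 - f_working * f_protection
```

With the values quoted above for lightpath service 1 (a 7% working-lightpath failure probability and a zero protection-lightpath failure probability), the function returns 1.0, i.e., 100% availability.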

SIMULATIONS AND PERFORMANCE ANALYSES
To evaluate the performance of the proposed protection mechanism against the failure of ML-based QoT prediction, we consider two test networks, the 14-node, 21-link NSFNET network and the 24-node, 43-link USNET network (shown in Fig. 4). There are 320 FSs on each fiber link in both test networks. The bandwidth granularity of each FS is assumed to be 12.5 GHz, and six modulation formats (i.e., BPSK, QPSK, 8-QAM, 16-QAM, 32-QAM, and 64-QAM) are used for working and protection lightpath establishment. A static traffic demand is also assumed, where the traffic demand between each node pair is uniformly distributed in the range of [100, X] Gb/s, where X is the maximum traffic demand. In this study, we set X to be 600 and 200 for NSFNET and USNET, respectively. In addition, according to our previous studies, the performance of a service provisioning algorithm is closely related to the sequence of served demands [35,39]. Therefore, we shuffle the list of lightpath services 200 times, run the service provisioning algorithm for each shuffled sequence, and keep the result with the minimum number of FSs used.
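The traffic generation and shuffling procedure described above can be sketched as follows. This is an assumption-laden illustration: node pairs are treated as unordered, and `provision` is a hypothetical callable standing in for the service provisioning algorithm, which takes an ordered list of demands and returns the number of FSs used.

```python
import itertools
import random


def generate_demands(num_nodes, max_demand_gbps, rng):
    """Draw one demand per (unordered) node pair, uniform in [100, X] Gb/s."""
    return {
        pair: rng.uniform(100.0, max_demand_gbps)
        for pair in itertools.combinations(range(num_nodes), 2)
    }


def best_over_shuffles(demands, provision, num_shuffles=200, seed=0):
    """Run `provision` on shuffled demand orders and keep the minimum FS count."""
    rng = random.Random(seed)
    order = list(demands.items())
    best = None
    for _ in range(num_shuffles):
        rng.shuffle(order)
        fs_used = provision(order)  # hypothetical provisioning routine
        if best is None or fs_used < best:
            best = fs_used
    return best
```

For NSFNET, `generate_demands(14, 600.0, random.Random(1))` yields 91 demands, one per node pair; fixing the seed makes the 200 shuffled sequences reproducible across the compared schemes.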
We employ the shortest path routing algorithm and the first-fit (FF) spectrum assignment strategy to establish a working lightpath. For the protection lightpath, we aim to maximize the spare capacity sharing efficiency, especially for SBPP, and therefore employ the more efficient spectrum window plane (SWP)-based routing and spectrum assignment (RSA) algorithm [33,39]. The routes of working and protection lightpaths must be link disjoint, and their assigned spectra are subject to the constraints of spectrum continuity and spectrum contiguity [33].
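As a minimal sketch of the first-fit strategy under the continuity and contiguity constraints, the routine below scans FS indices from the lowest upward and takes the first block that is free on every link of the route. The data layout (a dictionary mapping each link to a list of booleans) and the function name are assumptions made for illustration; the SWP-based algorithm of [33,39] is not reproduced.

```python
def first_fit(route_links, num_slots_needed, occupancy):
    """Assign the lowest-indexed block of contiguous FSs free on all links.

    occupancy maps each link to a list of booleans (True means occupied).
    Contiguity: the block consists of adjacent FS indices.
    Continuity: the same FS indices are used on every link of the route.
    Returns the starting FS index, or None if no block fits.
    """
    total_slots = len(occupancy[route_links[0]])
    for start in range(total_slots - num_slots_needed + 1):
        window = range(start, start + num_slots_needed)
        if all(not occupancy[link][s] for link in route_links for s in window):
            for link in route_links:
                for s in window:
                    occupancy[link][s] = True  # reserve the block
            return start
    return None
```

In the evaluated networks each link would start with 320 free FSs, i.e., `[False] * 320` per link.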
We consider three different lightpath establishment schemes. The first reserves OSNR margins for both working and protection lightpaths based on the traditional method, in which the D-margin is specially reserved. The second reserves the margin for the working lightpath based on the ML-based QoT prediction method, and for the protection lightpath based on the traditional method. The third reserves the margins for both working and protection lightpaths based on the ML-based QoT prediction method.

A. Number of FSs Used
We first compare the maximum number of FSs used for the three schemes under 1+1 path protection and SBPP, respectively. Fig. 5 shows the results when all the lightpath services are provisioned, in which "W" represents the working lightpath, "P" represents the protection lightpath, "ML" represents the ML-based QoT prediction method, and "trad" represents the traditional method. For 1+1 path protection, we see that the scheme of "W: trad_P: trad" has the largest number of FSs used among the three schemes. This is because this scheme reserves OSNR margins based on the traditional method for both working and protection lightpaths, thereby requiring more spectrum resources to accommodate traffic demands. In contrast, the scheme of "W: ML_P: ML" has the smallest number of FSs used. This is because this scheme employs the ML-based QoT prediction method for both working and protection lightpaths and therefore allocates spectrum resources most efficiently. Finally, the scheme of "W: ML_P: trad" falls in the middle because it employs the ML-based QoT prediction method for the working lightpath and the traditional margin reservation method for the protection lightpath. Moreover, comparing the schemes of "W: ML_P: ML" and "W: ML_P: trad," we note that the difference in the number of FSs used is not significant: the reductions from the most conservative scheme are 9.3% vs. 10.0% for NSFNET and 13.3% vs. 15.7% for USNET. For SBPP, we note that, as with 1+1 path protection, the scheme of "W: trad_P: trad" still has the largest number of FSs used and the scheme of "W: ML_P: ML" has the smallest, because larger OSNR margins consume more spectrum resources. The scheme of "W: ML_P: trad" falls in the middle, and its difference from "W: ML_P: ML" in the number of FSs used is again small: the reductions from the most conservative scheme are 14.3% vs. 18.0% for NSFNET and 16.2% vs. 19.9% for USNET.
Finally, comparing the results of the two protection techniques, we see that SBPP always has a smaller number of FSs used than 1+1 path protection because of efficient spare capacity sharing. Specifically, for the scheme of "W: ML_P: trad," the reductions are up to 29.8% and 10.8% for NSFNET and USNET, respectively.

B. Service Availability
We evaluate service availability for the three schemes under the two protection techniques. Here, we focus only on service unavailability due to the failure of ML-based QoT prediction and do not consider service unavailability due to network node/link failure. The results are shown in Fig. 6, in which "1+1_W: trad_P: trad" corresponds to the scheme of "W: trad_P: trad" based on 1+1 path protection and "SBPP_W: trad_P: trad" corresponds to the scheme of "W: trad_P: trad" based on SBPP.

1) 1+1 path protection
We first analyze the service availability of 1+1 path protection. We note that the scheme of "W: trad_P: trad" always achieves the highest service availability because it employs the traditional margin reservation method for both lightpaths and each lightpath therefore has the highest availability. In contrast, because the scheme of "W: ML_P: ML" employs the ML-based prediction method and therefore reserves the lowest OSNR margin for both the working and protection lightpaths, it demonstrates the lowest service availability. The scheme of "W: ML_P: trad" falls in the middle: although the working lightpath uses the "unreliable" ML-based QoT prediction method, the protection lightpath reserves margins based on the traditional method and can still provide protection in case of failure. Moreover, under 1+1 path protection, although the schemes of "W: ML_P: trad" and "W: ML_P: ML" have close numbers of FSs used, the former achieves a much higher service availability than the latter. Specifically, in NSFNET, the reduction of service availability by "W: ML_P: ML" relative to "W: trad_P: trad" is up to 74.5%, while the reduction is only 16.3% for the scheme of "W: ML_P: trad." Similarly, in USNET, these values are 65.1% vs. 21.9%, respectively. Thus, we can conclude that protecting against the failure of ML-based QoT prediction by establishing a dedicated protection lightpath based on the traditional margin reservation method significantly enhances the availability of a lightpath service without significantly increasing the spectrum resources used.

2) SBPP
We also analyze the service availabilities for the different schemes under the SBPP technique. In Fig. 6, we see that the scheme of "W: trad_P: trad" still achieves the highest service availability due to the higher OSNR margins reserved for both working and protection lightpaths. In contrast, the scheme of "W: ML_P: ML" shows the lowest service availability due to the lower margin reserved by the ML-based QoT prediction method, which leads to higher failure probabilities for all the established lightpaths. As an intermediate case, the scheme of "W: ML_P: trad" achieves an availability between the previous two. However, different from the results of 1+1 path protection, the service availability of this scheme is not close to that of "W: trad_P: trad," but close to that of "W: ML_P: ML." Specifically, in NSFNET, the reductions of service availability by "W: ML_P: trad" and "W: ML_P: ML" relative to "W: trad_P: trad" are up to 53.8% and 59.8%, respectively. In USNET, these values are 57.4% and 61.6%, respectively. Thus, under SBPP, the scheme of "W: ML_P: trad" cannot ensure a high service availability as it does under 1+1 path protection; rather, availability is significantly sacrificed for efficient spectrum resource utilization.
We also compare the service availability of SBPP with that of 1+1 path protection. In Fig. 6, we note that compared with 1+1 path protection, all the schemes under SBPP show lower service availabilities. This is because SBPP allows spare capacity sharing among protection lightpaths, which degrades the service availability. Specifically, when a working lightpath fails, 1+1 path protection can immediately restore the lightpath service through a dedicated protection lightpath. In contrast, under SBPP, the restoration of a lightpath service depends on the status of other working lightpaths that share common protection resources with the current working lightpath. Only if there is no competition for the shared protection resources, can the current lightpath service be restored using its protection lightpath. This restoration condition therefore degrades the service availability. In Fig. 6, we note that the largest difference of service availability between SBPP and 1+1 path protection occurs in the scheme of "W: ML_P: trad," which is up to 62.1% and 58.5% in NSFNET and USNET, respectively. This is because in this scheme, all the working lightpaths employ the ML-based QoT prediction method, which has a low service availability, and moreover, the protection lightpath under SBPP also has a lower availability than that under 1+1 path protection. In summary, although SBPP requires less spectrum resource for lightpath establishment, it suffers from a lower service availability compared with 1+1 path protection.
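The restoration condition that separates the two techniques can be sketched as below. This is a simplified model under stated assumptions: only QoT-prediction failures are considered, and when two failed working lightpaths compete for shared protection spectrum, neither is counted as restored (a pessimistic reading of the condition above). The function name and data layout are hypothetical.

```python
def surviving_services(working_ok, protection_ok, sharing=None):
    """Return the set of services that remain available.

    working_ok / protection_ok: dict mapping service -> bool (lightpath meets its QoT).
    sharing: dict mapping service -> set of other services whose protection
             lightpaths share spectrum with it; None models 1+1 path protection.
    """
    alive = set()
    for service, ok in working_ok.items():
        if ok:
            alive.add(service)          # working lightpath is fine
            continue
        if not protection_ok[service]:
            continue                    # both lightpaths fail
        rivals = sharing.get(service, set()) if sharing else set()
        # Assumption: any failed rival competes for the shared FSs, blocking restoration.
        if any(not working_ok[rival] for rival in rivals):
            continue
        alive.add(service)
    return alive
```

With two simultaneously failed working lightpaths, the dedicated case restores both, whereas the shared case restores neither if their protection lightpaths share spectrum, which mirrors the availability gap seen in Fig. 6.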

C. Availability Enhancement for SBPP
Based on the above results, we understand that SBPP enables more efficient spectrum resource utilization, but at the cost of a lower service availability. To overcome this disadvantage, we need to enhance the service availability of SBPP. For this, in the context of the "W: ML_P: trad" scheme, we propose two availability-enhancing approaches for SBPP, which add constraints on the number of protection lightpaths that can share common spectrum resources and on the margin reduction for each working lightpath when the ML-based QoT prediction method is employed.

1) Constraint on number of sharing protection lightpaths
To enhance the service availability for SBPP, we set a constraint on the number of protection lightpaths that can share the same spectrum resources on their common traversed link(s). Based on the scheme of "W: ML_P: trad," we evaluate how this constraint impacts the number of FSs used and the service availability in comparison with the original 1+1 path protection and SBPP techniques. Fig. 7 shows the maximum number of FSs used, in which the number of sharing protection lightpaths is limited to 2, 3, and 5; the legend "SBPP_X" means SBPP with at most X protection lightpaths allowed to share common spectrum resources. We can see that the constrained SBPP shows larger numbers of FSs used compared with the conventional SBPP. This is because the constraint on the number of sharing protection lightpaths causes SBPP to lose some opportunities for spare capacity sharing, thereby requiring more spectrum resources to be reserved for pre-planned protection lightpaths. Moreover, as the allowed number of sharing protection lightpaths decreases, more FSs are used. Specifically, the scheme of "SBPP_2" shows the largest number of FSs used, which is 27.3% and 5.2% higher than that of the conventional SBPP for NSFNET and USNET, respectively. However, compared with 1+1 path protection, "SBPP_2" still shows a smaller number of FSs used, with reductions of 10.6% and 6.2% for NSFNET and USNET, respectively. This is because "SBPP_2" still allows protection lightpaths to share spare capacity, thereby improving spectrum resource utilization over 1+1 path protection.
Next, we evaluate how the service availability is enhanced when the number of protection lightpaths that are allowed to share common spectrum resources is limited. Fig. 8 shows the service availabilities of the different schemes. We see that compared with the conventional SBPP technique, the constrained SBPP demonstrates a higher service availability. The smaller the number of sharing protection lightpaths is, the more the service availability is enhanced. Specifically, "SBPP_2" shows the largest enhancement in service availability, which is up to 19.6% and 20.2% over the conventional SBPP for NSFNET and USNET, respectively. This means that the constraint on the number of sharing protection lightpaths does help enhance service availability. Nevertheless, compared with 1+1 path protection, the service availability of "SBPP_2" is still poorer, being up to 51.6% and 44.3% lower for NSFNET and USNET, respectively. Again, this is because although the number of sharing protection lightpaths is limited to 2, a lightpath service is still subject to protection resource competition from the second lightpath service whose protection lightpath shares common spectrum resources with its own. In summary, although limiting the number of sharing protection lightpaths can help SBPP enhance lightpath service availability, it still cannot achieve an availability close to that of 1+1 path protection. In scenarios where a high service availability is required, 1+1 path protection is still needed.
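The sharing constraint evaluated above can be sketched as an admission check applied when a protection lightpath tries to reuse an FS that other protection lightpaths have already reserved on a link. The link-disjointness test on the corresponding working lightpaths is the usual SBPP sharing condition and is assumed here rather than quoted from this section; the function name, argument names, and data layout are hypothetical.

```python
def can_share(current_sharers, candidate, working_links, max_share=None):
    """Decide whether `candidate` may share an FS with `current_sharers`.

    current_sharers: services whose protection lightpaths already use the FS.
    working_links: dict mapping service -> iterable of links on its working path.
    max_share: cap on lightpaths sharing the FS (e.g., 2 for "SBPP_2");
               None means conventional SBPP with no cap.
    """
    if max_share is not None and len(current_sharers) + 1 > max_share:
        return False
    candidate_links = set(working_links[candidate])
    # Assumed SBPP rule: working paths of all sharers must be mutually link disjoint.
    return all(candidate_links.isdisjoint(working_links[s]) for s in current_sharers)
```

A tighter `max_share` rejects more sharing requests, so more FSs have to be reserved for protection, which is the trade-off visible in Figs. 7 and 8.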

2) Constraint on D-margin reserved for working lightpath
To further increase the service availability under SBPP, we increase the D-margin reserved for each working lightpath in the context of the ML-based QoT prediction method. For performance evaluation, we consider the scheme of "W: ML_P: trad" and increase the D-margin of each working lightpath to be 0.5, 1.0, and 1.5 dB.
We first compare the maximum number of FSs used under different D-margins reserved for each working lightpath. In Fig. 9, we see that with the increase of D-margin, more spectrum resources are needed to provision all the lightpath services. When the D-margin is below 1.5 dB, "W: ML_P: trad" has a smaller number of FSs used than "W: trad_P: trad," and when the D-margin is 1.5 dB, the two schemes have close numbers of FSs used. Moreover, in NSFNET, a D-margin of 0.5 dB results in the same number of FSs used as the zero-margin scheme; in USNET, the corresponding D-margin is 0.15 dB. This means that setting a zero D-margin does not help much in network capacity utilization, but it sacrifices service availability. Thus, an optimal D-margin should be identified to maximize service availability while keeping the number of FSs used to a minimum.
Next, we compare the service availability under different D-margins. In Fig. 10, "W: ML_P: trad (X)" corresponds to the scheme of "W: ML_P: trad" with an X-dB D-margin reserved for each working lightpath. We can see that increasing the D-margin enhances the service availability. Specifically, when the D-margin is below 1.5 dB, the service availability is low. However, when the D-margin reaches 1.5 dB, the service availability improves significantly and becomes close to that of the most reliable scenario, i.e., "W: trad_P: trad." Based on the above performance analyses, we can conclude that for SBPP, "W: ML_P: trad" can achieve a higher service availability at the cost of more spectrum resources used. To protect against the failure of ML-based QoT prediction under SBPP, we need to increase the D-margin for the ML-based QoT prediction method to 1.5 dB or more, which can still save up to 0.5 dB of D-margin compared with the traditional reservation method.

CONCLUSION
Employing ML-based techniques to predict the QoT of a lightpath can reduce the margins reserved for the lightpath and improve spectrum resource utilization. However, this would cause lightpath service failure if the ML-based QoT prediction model fails to accurately predict the actual QoT of a lightpath. To protect against this type of failure, we propose to take advantage of 1+1 path protection and SBPP, which are popularly implemented in optical networks, and reuse their protection resources for service recovery. To verify the efficiency of the proposed protection mechanism, we evaluated the maximum number of FSs used and the service availability under the different protection schemes. Simulation studies showed that under 1+1 path protection, an optimal performance tradeoff can be achieved if a working lightpath is established based on the ML-based QoT prediction method and its corresponding protection lightpath is established based on the traditional margin reservation method, which corresponds to a 2-dB D-margin reduction for the working lightpath. This configuration not only saves spectrum resources but also guarantees a high service availability. Under SBPP, although the ML-based QoT prediction method can significantly save network spectrum resources, it significantly sacrifices the service availability. Therefore, to enhance the service availability, we proposed to limit the number of protection lightpaths that can share common protection resources and to set an optimal D-margin that balances spectrum efficiency and service availability. It was found that, although the saving is not significant, a D-margin of up to 0.5 dB can be saved for the working lightpath by the ML-based QoT prediction method while keeping its service availability close to that of the traditional method.