Semi-Supervised Machine Learning Aided Anomaly Detection Method in Cellular Networks

The ever-increasing amount of data in cellular networks makes it challenging for network operators to monitor the quality of experience (QoE). Traditional hard decision methods based on key quality indicators (KQIs) struggle to perform QoE anomaly detection on big data. To solve this problem, in this paper we propose a KQIs-based QoE anomaly detection framework using a semi-supervised machine learning algorithm, i.e., the iterative positive sample aided one-class support vector machine (IPS-OCSVM). The proposed method consists of four steps, the key step being the combination of machine learning with the network operator's expert knowledge through OCSVM. The proposed IPS-OCSVM framework realizes QoE anomaly detection through soft decision and can easily fine-tune the anomaly detection ability on demand. Moreover, we show that fluctuations of the KQIs thresholds based on expert knowledge have a limited impact on the anomaly detection result. Finally, experimental results confirm the effectiveness of the proposed IPS-OCSVM framework for QoE anomaly detection in cellular networks.


I. INTRODUCTION
Fifth generation (5G) wireless communication has three typical application scenarios, i.e., enhanced mobile broadband, ultra-reliable low-latency communications, and massive machine-type communications [1]. According to a Cisco white paper, 5G will provide enterprises and consumers with a large number of new applications, including anytime-anywhere video, real-time communication, super-reliable communication, high-density and large-bandwidth access, high-speed mobile access, augmented reality, virtual reality, and super-large-scale internet of things [2]- [11]. However, the advent of the 5G era has brought explosive growth in wireless data traffic and places higher demands on network operators to ensure users' experience.
Users' experience of the services provided by an operator is defined as quality of experience (QoE) [12], [13]. Since QoE is user-centered and subjective, it is difficult to measure accurately. As an alternative, measurable key quality indicators (KQIs) are designed to estimate QoE. KQIs are quantifiable and service-based, and the QoE of a service can be characterized by several related KQIs of that service. Therefore, if the QoE of a service is abnormal, this will be reflected in the corresponding KQIs. Big data in mobile networks brings both opportunities and challenges to network optimization. At present, using KQIs data to monitor QoE is a common practice among network operators. Although massive amounts of data make network optimization difficult, data-driven anomaly detection can be applied to this problem. Our research uses KQIs data to detect anomalies in the corresponding QoE. Once abnormal QoE is detected, the network operator can optimize the wireless network in time, i.e., before users complain. Clearly, it is uneconomical and difficult to track every user's QoE, so we are more concerned with the average QoE of the users accessing the same cell. For example, a simple approach is to compute the mean KQIs of the users accessing a cell as the cell's KQIs, and then use the cell's KQIs to assess the cell's QoE.
Traditional QoE anomaly detection methods make hard decisions through thresholds and fixed rules [14]. This causes a series of problems, which are explained in Section II. Nowadays, both machine learning and deep learning are widely used in wireless communications [15]- [28] and the internet of things [29]- [32]. Much related work has also been done on anomaly detection in cellular networks. For example, isolation forest (IF) [33] and local outlier factor (LOF) [34] are commonly used unsupervised methods, while stacked denoising autoencoders (SDAE) [35] and the one-class support vector machine (OCSVM) [36], [37] are commonly used semi-supervised methods for anomaly detection. Furthermore, the one-against-one SVM has been used in various applications, such as eHealth [38], voice processing [39], and remote sensing [40]. Notably, OCSVM is one of the most popular machine learning methods for anomaly detection, since it uses only positive samples to train the classifier. Many studies have shown that OCSVM is applicable to various anomaly detection problems, such as intrusion detection [41], industrial monitoring [42], and data diagnosis in wireless sensor networks [43].
0018-9545 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
In this paper, we propose an iterative positive sample aided OCSVM (IPS-OCSVM) framework to realize real-time QoE anomaly detection. Our main contributions are as follows: i) based on KQIs data, the proposed IPS-OCSVM framework realizes real-time QoE anomaly detection through soft decision; ii) we take full advantage of the network operator's expert knowledge on KQIs thresholds, and show that our framework is robust to slight fluctuations in the KQIs thresholds; iii) the proposed IPS-OCSVM framework can adjust its anomaly detection ability to specific requirements, and this adjustment is very convenient. The remainder of this paper is organized as follows. Section II discusses related work on QoE anomaly detection in cellular networks. Section III gives the problem formulation. Section IV proposes the IPS-OCSVM framework. Section V analyzes the performance of IPS-OCSVM. Finally, Section VI concludes the paper.

II. RELATED WORK
QoE anomaly detection in cellular networks has long been one of the main concerns of network operators. Most research on this issue focuses on the impact of network performance on QoE and on the end-to-end experience as a whole. Here we first discuss the traditional KQIs-based hard decision, which is also the method network operators commonly use. The traditional method has three main steps: 1) set KQIs thresholds and make rules; 2) screen out the bad indicators of each KQIs sample; and 3) detect the KQIs samples leading to bad QoE according to the rules. For convenience, an indicator of a KQIs sample whose value does not meet its threshold requirement is called a bad indicator. Fig. 1 gives a simple example of deciding whether the QoE of the web browsing service is abnormal.
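The three-step hard decision can be sketched in a few lines of Python. This is a minimal illustration, not the rule of Fig. 1: the indicator names, the delay and download-rate thresholds, and the rule "abnormal if more than one indicator is bad" are all hypothetical; only the 82% success-rate threshold echoes the Page Response Success Rate example in Section V-C.

```python
# Hypothetical thresholds: name -> (threshold, higher_is_better).
THRESHOLDS = {
    "page_response_success_rate": (0.82, True),   # fraction; larger is better
    "page_response_delay_ms": (500.0, False),     # hypothetical; smaller is better
    "page_download_rate_kbps": (1000.0, True),    # hypothetical; larger is better
}


def bad_indicators(sample, thresholds=THRESHOLDS):
    """Step 2: return the names of indicators that fail their threshold."""
    bad = []
    for name, (th, higher_is_better) in thresholds.items():
        value = sample[name]
        ok = value >= th if higher_is_better else value <= th
        if not ok:
            bad.append(name)
    return bad


def hard_decision(sample, max_bad=1, thresholds=THRESHOLDS):
    """Step 3 with a hypothetical rule: abnormal if more than `max_bad` indicators are bad."""
    return "abnormal" if len(bad_indicators(sample, thresholds)) > max_bad else "normal"


sample = {
    "page_response_success_rate": 0.79,
    "page_response_delay_ms": 650.0,
    "page_download_rate_kbps": 1500.0,
}
print(bad_indicators(sample))  # ['page_response_success_rate', 'page_response_delay_ms']
print(hard_decision(sample))   # abnormal
```

Every bad indicator carries the same weight in such a rule, which is the weakness of counting bad indicators.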
The traditional KQIs-based hard decision is simple and convenient, but it often causes problems in actual use. Firstly, KQIs are not equally important, so it is inaccurate to judge QoE as abnormal by the number of bad indicators. In addition, KQIs thresholds are based on network operators' expert knowledge, which reflects the statistical KQIs situation of most cells, and it is difficult to adjust the anomaly detection ability for different scenarios and requirements. For example, QoE is often harder to secure in user-intensive areas, where more radio resources need to be allocated, so the anomaly detection method needs to be more sensitive to QoE anomalies in these areas [13]; this is a challenging task for the traditional KQIs-based hard decision method.
In [44], a three-layer mapping model from KQIs to QoE and a five-level KQIs scoring system are established for data services. Compared with the traditional hard decision method, the three-layer mapping model better reflects the KQIs and QoE situation, and the five-level scoring method provides a more detailed qualitative description of KQIs performance. However, the traffic proportion of sub-services is time-varying, so it is not very reasonable to determine the weights of sub-services by their traffic proportion. Besides, as network complexity increases, it is challenging to manually determine exact thresholds for different KQIs in different scenarios. When the network operator finds that the anomaly detection result differs slightly from the actual situation and needs to fine-tune the anomaly detection ability, multiple thresholds have to be adjusted. In [14], an object-oriented detection framework with a two-step clustering, named Hourglass Clustering, is used to provide an anomaly codebook for QoE anomaly detection. Hourglass Clustering is a two-step clustering with dimension reduction and dimension expansion, which uses feature matching to achieve anomaly detection. It can capture the characteristics of anomalous samples, but it is also difficult to adjust its QoE anomaly detection ability as needed.
Compared with the three-layer mapping model and Hourglass Clustering, the proposed IPS-OCSVM framework makes better use of the network operator's expert knowledge. Besides, IPS-OCSVM can adjust its anomaly detection ability to specific requirements, and this adjustment is very convenient.

III. PROBLEM FORMULATION
When users use the network services provided by the network operator, a large amount of KQIs data about the services is recorded. For network operators, it makes little sense to focus on the QoE of a particular user; they are more concerned with the QoE of a cluster of users, which makes network optimization more efficient and economical. Thus, we calculate the average KQIs of all users accessing a cell and use this value as the KQIs of the cell to estimate the cell's QoE. We use real 4G data from a network operator, which records synthesized data service KQIs. Data service types include streaming, gaming, instant messaging, and web browsing services. The data set contains a total of 41838 valid KQIs samples from 324 different cells, with 14 indicators per sample and a time span of 7 days. Most importantly, we have a reference threshold for each KQI, but we do not know whether the QoE of each KQIs sample is good or bad. Fig. 2 illustrates a scenario for estimating QoE with KQIs data.
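The per-cell averaging can be sketched as follows, assuming each raw record carries a cell identifier, a time window, and one user's KQI vector. The record layout and the choice to average per (cell, window) pair are assumptions made for illustration, motivated by the data set containing many samples per cell over 7 days.

```python
from collections import defaultdict
from statistics import fmean


def cell_kqis(user_records):
    """Average per-user KQI vectors into one KQIs sample per (cell, window).

    user_records: iterable of (cell_id, window, kqi_vector) tuples; this layout
    is hypothetical. Each returned vector is the element-wise mean.
    """
    groups = defaultdict(list)
    for cell_id, window, kqis in user_records:
        groups[(cell_id, window)].append(kqis)
    return {
        key: [fmean(column) for column in zip(*vectors)]
        for key, vectors in groups.items()
    }


records = [
    ("c1", 0, [0.90, 300.0]),  # two users in cell c1, window 0
    ("c1", 0, [0.70, 500.0]),
    ("c2", 0, [0.85, 200.0]),  # one user in cell c2, window 0
]
out = cell_kqis(records)  # c1 averages to roughly [0.8, 400.0]
```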
The data set has two characteristics: large volume and high dimensionality. This paper addresses the following problems:
1) How can a machine learning method be developed to identify, from massive data, the KQIs samples leading to abnormal QoE?
2) How can the QoE anomaly detection ability be fine-tuned easily for different scenarios and requirements?
3) How can KQIs thresholds based on expert knowledge be used to guide QoE anomaly detection?
The first problem is to identify the KQIs samples leading to abnormal QoE from massive data. Besides, when the anomaly detection results differ slightly from the actual situation, we want the results to be easy to fine-tune. In addition, the KQI thresholds embody experience accumulated by network operators and are far from meaningless; how to use this human experience to guide QoE anomaly detection is also worth considering.

IV. OUR PROPOSED IPS-OCSVM FRAMEWORK FOR KQIs-BASED QoE ANOMALY DETECTION
In this section, we describe the whole procedure of QoE anomaly detection using IPS-OCSVM, namely data preprocessing, the learning algorithm in IPS-OCSVM, fine-tuning the parameters as needed, and the implementation from offline training to online detection.

A. Data Preprocessing
Fig. 3 shows the process of data preprocessing using KQIs thresholds based on expert knowledge. To take advantage of the network operator's experience in QoE anomaly detection, we draw on the network operator's KQIs thresholds. For each KQIs sample, we calculate the number of bad indicators, and then put KQIs samples with the same number of bad indicators into the same set. In this way, we divide the original large data set into several small data sets according to the number of bad indicators in the KQIs samples. Fig. 4 shows the result of data preprocessing, where $D_l$ denotes the set of samples with $l$ bad indicators.
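The preprocessing step amounts to counting bad indicators per sample and grouping samples by that count. The sketch below assumes each threshold is encoded as a hypothetical (threshold, higher_is_better) pair, because some KQIs are good when large (success rates) and others when small (delays); the sample values are invented.

```python
from collections import defaultdict


def count_bad(sample, thresholds):
    """Number of indicators in `sample` that fail their threshold.

    `thresholds[k]` is a (threshold, higher_is_better) pair for indicator k
    (hypothetical encoding).
    """
    return sum(
        1
        for value, (th, higher_is_better) in zip(sample, thresholds)
        if (value < th if higher_is_better else value > th)
    )


def partition_by_bad_count(samples, thresholds):
    """Split samples into sets D_l keyed by the number l of bad indicators."""
    subsets = defaultdict(list)
    for s in samples:
        subsets[count_bad(s, thresholds)].append(s)
    return dict(sorted(subsets.items()))


thresholds = [(0.82, True), (500.0, False)]  # success rate, delay (ms)
samples = [(0.90, 300.0), (0.80, 300.0), (0.70, 900.0), (0.95, 450.0)]
D = partition_by_bad_count(samples, thresholds)
print(D)  # {0: [(0.9, 300.0), (0.95, 450.0)], 1: [(0.8, 300.0)], 2: [(0.7, 900.0)]}
```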

B. Learning Algorithm for the Proposed IPS-OCSVM Framework
We first briefly introduce OCSVM as the basis of the proposed IPS-OCSVM framework. To transform a one-class classification problem into a special two-class classification problem, OCSVM maps the input space into a higher-dimensional feature space and finds a hyperplane that separates the projected samples from the origin with the maximum possible margin. The primal quadratic problem defining the OCSVM classifier is [45]
$$
\text{(P1)}:\ \min_{\omega,\,\xi,\,\rho}\ \frac{1}{2}\lVert\omega\rVert^{2}+\frac{1}{\nu n}\sum_{i=1}^{n}\xi_i-\rho
\quad \text{s.t.}\ \ \omega\cdot\Phi(x_i)\ge\rho-\xi_i,\ \ \xi_i\ge 0,\ \ i=1,\dots,n.
$$
In (P1), $x_i$ is a sample in the original space and $n$ is the number of training samples. $\Phi$ denotes the map from the original space to the feature space, $\omega$ is the normal vector, and $\rho$ is the offset of the desired hyperplane in the feature space. The slack variable $\xi_i$ allows certain training samples to be misclassified. The trade-off parameter $\nu\in(0,1]$ is an upper bound on the fraction of training samples classified outside the decision boundary and a lower bound on the fraction of support vectors. The label of $x$ is predicted by the decision function $f(x)$ [46], defined as
$$
f(x)=\operatorname{sgn}\big(\omega\cdot\Phi(x)-\rho\big).
$$
By using Lagrange multipliers $\alpha_i$, (P1) is converted into the following quadratic dual problem:
$$
\text{(P2)}:\ \min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j K(x_i,x_j)
\quad \text{s.t.}\ \ 0\le\alpha_i\le\frac{1}{\nu n},\ \ \sum_{i=1}^{n}\alpha_i=1.
$$
The decision function with a kernel expansion is then
$$
f(x)=\operatorname{sgn}\Big(\sum_{i=1}^{n}\alpha_i K(x_i,x)-\rho\Big).
$$
All training samples with nonzero $\alpha_i$ are called support vectors. The kernel function $K(x_i,x_j)=\Phi(x_i)\cdot\Phi(x_j)$ implicitly maps the positive samples into a higher-dimensional space in which a separating hyperplane can be found, and the choice of kernel function is one of the key factors affecting the success of OCSVM. More details can be found in [45].

The learning algorithm of IPS-OCSVM is summarized in Algorithm 1. During initialization, we set the kernel function $K(x_i,x_j)=K_0(x_i,x_j)$ and $\nu=\nu_0$; the roles of a suitable kernel function and of the parameter $\nu$ are described in Section V. Firstly, $D_0$ is used as the original normal data set $N$ to solve (P2), which yields the decision function $f(x)$. Secondly, the decision function is used to separate the normal part $N_l$ from the abnormal part $A_l$ of the next data set $D_l$, and $N_l$ is added to the total normal data set $N$, which is then used to solve (P2) again. By iterating according to this rule over the data sets in increasing order of $l$, we obtain all the normal samples and the final decision function.
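The iterative loop can be sketched independently of the one-class learner. In the sketch, `ips_train` follows the steps described above, while `fit` is any function that trains on positive samples and returns a decision function giving +1 or -1. The `centroid_fit` detector is a deliberately simple, hypothetical stand-in used only to make the example runnable; it is not OCSVM. In practice one would plug in an OCSVM solver, for example scikit-learn's `sklearn.svm.OneClassSVM(kernel="rbf", nu=..., gamma=1/14)`, whose `predict` method returns +1 for inliers and -1 for outliers.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Decision = Callable[[Vector], int]  # returns +1 (normal QoE) or -1 (abnormal QoE)


def ips_train(
    subsets: Dict[int, List[Vector]],
    fit: Callable[[List[Vector]], Decision],
) -> Tuple[Decision, List[Vector]]:
    """Iterative positive-sample loop.

    subsets: {l: D_l}; D_0 (no bad indicators) must be non-empty.
    fit: trains a one-class detector on positive samples and returns f(x).
    """
    normal = list(subsets[0])  # N is initialized with D_0
    f = fit(normal)
    for l in sorted(k for k in subsets if k > 0):
        n_l = [x for x in subsets[l] if f(x) == 1]  # normal part N_l of D_l
        normal.extend(n_l)                          # N <- N union N_l
        f = fit(normal)                             # retrain on the enlarged N
    return f, normal


def centroid_fit(train: List[Vector], quantile: float = 0.95) -> Decision:
    """Toy stand-in for OCSVM, NOT the paper's detector: accept a point if its
    distance to the training centroid is within the `quantile` training radius."""
    dim = len(train[0])
    center = [sum(x[k] for x in train) / len(train) for k in range(dim)]
    dists = sorted(math.dist(x, center) for x in train)
    radius = dists[min(len(dists) - 1, int(quantile * len(dists)))]
    return lambda x: 1 if math.dist(x, center) <= radius else -1


subsets = {
    0: [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)],  # D_0
    1: [(0.05, 0.05), (3.0, 3.0)],                        # D_1
}
f, N = ips_train(subsets, centroid_fit)
print(len(N), f((3.0, 3.0)))  # 5 -1
```

Keeping `fit` as a parameter separates the iteration logic, which is what distinguishes IPS-OCSVM from a single OCSVM trained on $D_0$, from the choice of solver.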

C. Adjust the Ability of QoE Anomaly Detection
One advantage of our method is that its anomaly detection ability can be adjusted as needed. The traditional hard decision method adjusts the QoE anomaly detection ability by adjusting the thresholds; however, accurately adjusting the thresholds of multiple indicators is a challenging task. Our method only needs to adjust the parameter $\nu$, on top of the original thresholds, to adjust the QoE anomaly detection ability. Before actual use, we can compare different values of $\nu$ on a labeled data set under given requirements to determine $\nu_0$. For example, the area under the receiver operating characteristic curve (AUC) is widely used to estimate the predictive accuracy of a classifier. Besides, recall, given in Eq. (9), is the fraction of relevant instances that are actually retrieved and is also used to evaluate classifier performance. We denote the AUC and recall obtained with a given $\nu$ as $A(\nu)$ and $r(\nu)$. To choose the $\nu$ that maximizes AUC subject to a lower bound on recall, $\nu_0$ is given by
$$
\nu_0=\arg\max_{\nu}\ A(\nu)\quad \text{s.t.}\ \ r(\nu)\ge r_{\min},
$$
where $r_{\min}$ denotes the required lower bound on recall. The impact of the parameter $\nu$ on performance is explained in the performance analysis.
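The selection rule for $\nu_0$ can be sketched as below. AUC is computed here with the Mann-Whitney pairwise formulation, assuming higher scores mean "more anomalous" and abnormal is the positive class; the $(A(\nu), r(\nu))$ values in the usage example are invented for illustration and are not results from this paper.

```python
def auc(scores, labels):
    """AUC via the Mann-Whitney pairwise statistic.

    labels: 1 = abnormal (positive), 0 = normal; a higher score means more
    anomalous. Tied positive/negative pairs count as 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def recall(pred, labels):
    """TP / (TP + FN) with abnormal (1) as the positive class."""
    tp = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 1)
    fn = sum(1 for p, y in zip(pred, labels) if p == 0 and y == 1)
    return tp / (tp + fn)


def select_nu(results, r_min):
    """results: {nu: (A(nu), r(nu))}; return the nu maximizing A(nu) s.t. r(nu) >= r_min."""
    feasible = {nu: a for nu, (a, r) in results.items() if r >= r_min}
    if not feasible:
        raise ValueError("no candidate nu meets the recall lower bound")
    return max(feasible, key=feasible.get)


# Invented (A(nu), r(nu)) pairs for the five nu values used in Section V-B.
results = {0.9: (0.80, 0.98), 0.5: (0.86, 0.95), 0.1: (0.90, 0.88),
           0.05: (0.88, 0.80), 0.001: (0.75, 0.60)}
print(select_nu(results, r_min=0.9))   # 0.5
print(select_nu(results, r_min=0.85))  # 0.1
```

Raising $r_{\min}$ (for example for a user-intensive area) shifts the choice toward a larger, more sensitive $\nu$, trading some AUC for recall.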

D. Offline Training and Online Detection
After preprocessing the historical data, executing the learning algorithm of the proposed IPS-OCSVM framework, and obtaining the decision function, we complete offline training. Online detection is realized by feeding real-time data into the decision function, so the QoE label of a previously unseen KQIs sample is given directly by $f(x)$. When $f(x)=-1$, the classifier decides that the KQIs sample will result in poor QoE; when $f(x)=1$, the classifier decides that the KQIs sample corresponds to normal QoE. Hence, the proposed IPS-OCSVM framework realizes real-time QoE anomaly detection simply and efficiently. Fig. 5 shows the flow from offline training to online detection in the IPS-OCSVM framework.
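One way to organize the offline/online split is to export only what the kernel-expansion decision function $f(x)=\operatorname{sgn}(\sum_i \alpha_i K(x_i,x)-\rho)$ needs: the support vectors, their coefficients $\alpha_i$, the offset $\rho$, and the Gaussian kernel parameter $\gamma$. The JSON layout and the toy model values below are hypothetical; real values would come from solving (P2) offline.

```python
import json
import math


def gaussian_kernel(a, b, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def export_model(support_vectors, alphas, rho, gamma):
    """Offline: serialize what f(x) needs (hypothetical JSON layout)."""
    return json.dumps({"sv": support_vectors, "alpha": alphas, "rho": rho, "gamma": gamma})


def load_detector(blob):
    """Online: rebuild f(x) = sgn(sum_i alpha_i K(x_i, x) - rho) from the JSON blob."""
    model = json.loads(blob)

    def f(x):
        s = sum(a * gaussian_kernel(sv, x, model["gamma"])
                for sv, a in zip(model["sv"], model["alpha"]))
        # +1 normal QoE, -1 abnormal QoE; a score of exactly 0 maps to +1 by convention
        return 1 if s - model["rho"] >= 0 else -1

    return f


# Toy model: one support vector at the origin (values are made up).
blob = export_model([[0.0, 0.0]], [1.0], rho=0.5, gamma=1.0)
f = load_detector(blob)
print(f([0.0, 0.0]), f([2.0, 2.0]))  # 1 -1
```

Online cost per sample is one kernel evaluation per support vector, which is what makes real-time detection cheap once training is done.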

V. PERFORMANCE ANALYSIS
In this section, we analyze the performance of IPS-OCSVM from four aspects: i) the influence of different kernel functions; ii) the influence of different values of $\nu$; iii) the influence of fluctuations in the expert-knowledge thresholds; and iv) the performance of the proposed IPS-OCSVM framework on a real test set.

A. The Proposed IPS-OCSVM vs. Kernel Functions
To show quantitatively that the Gaussian kernel is the most suitable, we compare it with the Linear, Poly, and Sigmoid kernels, which are defined respectively as
$$
\begin{aligned}
\text{Gaussian:}&\quad K(x_i,x_j)=\exp\big(-\gamma\lVert x_i-x_j\rVert^{2}\big),\\
\text{Linear:}&\quad K(x_i,x_j)=x_i^{\mathsf T}x_j,\\
\text{Poly:}&\quad K(x_i,x_j)=\big(\gamma\,x_i^{\mathsf T}x_j+r\big)^{d},\\
\text{Sigmoid:}&\quad K(x_i,x_j)=\tanh\big(\gamma\,x_i^{\mathsf T}x_j+r\big).
\end{aligned}
$$
Fig. 6 compares the performance of the IPS-OCSVM framework with different kernel functions. The parameter $\gamma$ is set to 1/14, since a KQIs sample has 14 indicators; $r$ is 0 for the Poly and Sigmoid kernels, and $d$ is 3 for the Poly kernel. We observe that the curves of the Linear and Poly kernels are both erratic. The Linear, Poly, and Sigmoid kernels cannot identify the KQIs samples leading to bad QoE well, and obvious errors occur. For example, a KQIs sample with 14 bad indicators (i.e., all indicators are bad) should normally lead to bad QoE. In other words, as the number of bad indicators increases, the predicted abnormal proportion should approach 100%. However, the figure shows that, except for the Gaussian kernel, the curves of the other kernel functions contain such errors. For this problem, the Gaussian kernel is reliable and suitable, so we use it in the further analysis.
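The four kernels compared in Fig. 6 can be written directly with the parameter values stated above ($\gamma = 1/14$, $r = 0$, $d = 3$). This sketch assumes the common libsvm-style definitions of the kernels; it only evaluates the kernels and does not reproduce the experiment.

```python
import math

GAMMA = 1 / 14  # 14 indicators per KQIs sample
R = 0.0         # r for the Poly and Sigmoid kernels
DEGREE = 3      # d for the Poly kernel


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gaussian(a, b, gamma=GAMMA):
    """exp(-gamma * ||a - b||^2); depends only on the distance, values in (0, 1]."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def linear(a, b):
    return dot(a, b)


def poly(a, b, gamma=GAMMA, r=R, d=DEGREE):
    return (gamma * dot(a, b) + r) ** d


def sigmoid(a, b, gamma=GAMMA, r=R):
    return math.tanh(gamma * dot(a, b) + r)


x = [1.0] * 14
z = [0.0] * 14
print(gaussian(x, x), linear(x, x))  # 1.0 14.0
print(gaussian(x, z))                # exp(-1), about 0.368
```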

B. Performance Evaluation Using Different ν
As described in Section IV, the proposed IPS-OCSVM framework only needs to adjust the parameter $\nu$ to adjust the QoE anomaly detection ability. Note that $\nu\in(0,1]$ is an upper bound on the fraction of training samples classified outside the decision boundary and a lower bound on the fraction of support vectors. Adjusting the anomaly detection ability here means that, if the network operator finds that the current QoE anomaly detection result does not meet actual needs, the criterion for declaring QoE abnormal can be tightened or relaxed. We select five values of $\nu$ ($\nu\in\{0.9, 0.5, 0.1, 0.05, 0.001\}$) to study the impact of $\nu$ on the decision.
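Both roles of $\nu$ follow from the constraints of the standard OCSVM dual, $\sum_{i=1}^{n}\alpha_i=1$ and $0\le\alpha_i\le 1/(\nu n)$, together with the standard Karush-Kuhn-Tucker condition that any training sample with $\xi_i>0$ has $\alpha_i=1/(\nu n)$. The short derivation below is sketched under those assumptions, with $\mathrm{SV}$ the set of support vectors ($\alpha_i>0$) and $\mathrm{O}\subseteq\mathrm{SV}$ the set of training samples outside the boundary ($\xi_i>0$):

$$
1=\sum_{i\in\mathrm{SV}}\alpha_i\le\frac{|\mathrm{SV}|}{\nu n}\;\Longrightarrow\;\frac{|\mathrm{SV}|}{n}\ge\nu,
\qquad
\frac{|\mathrm{O}|}{\nu n}=\sum_{i\in\mathrm{O}}\alpha_i\le\sum_{i=1}^{n}\alpha_i=1\;\Longrightarrow\;\frac{|\mathrm{O}|}{n}\le\nu.
$$

For instance, with a hypothetical $n=1000$ training samples and $\nu=0.05$, at most 50 samples fall outside the boundary and at least 50 are support vectors.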
As $\nu$ in (P2) increases, both the number of support vectors and the number of training samples classified outside the decision boundary tend to increase, since $\nu$ bounds the proportion of such samples. In the KQIs-based QoE anomaly detection problem, a larger $\nu$ makes the classifier more sensitive to KQIs samples with some bad indicators, i.e., such a sample is more likely to be classified as leading to bad QoE. Fig. 6(a) shows that as $\nu$ grows, the classifier increasingly judges KQIs samples with bad indicators as leading to bad QoE. In actual use, the network operator can fine-tune $\nu$ for specific requirements.

C. The Influence of KQIs Thresholds Fluctuation
In actual use, different network optimization experts may set slightly different thresholds under the same conditions. For example, in Fig. 1, one expert believes that the threshold of Page Response Success Rate should be 82%, while another expert may consider the indicator normal as long as it exceeds 80%. Different thresholds lead to a different $D_0$ in Algorithm 1. We set up experiments to verify the impact of such KQIs threshold fluctuations on the performance of IPS-OCSVM. Let the vector $\mathbf{th}$ denote the original KQIs thresholds. The new thresholds are drawn randomly between the lower bound $0.95\,\mathbf{th}$ and the upper bound $1.05\,\mathbf{th}$, i.e., a random fluctuation of up to 5% around the original thresholds is allowed. As mentioned before, we use the Gaussian kernel with $\gamma=1/14$. Fig. 7 shows five random results that differ noticeably from one another.
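The threshold perturbation can be sketched as follows. The text specifies only the bounds $0.95\,\mathbf{th}$ and $1.05\,\mathbf{th}$; drawing each threshold independently and uniformly within those bounds is an assumption of this sketch, as are the example threshold values.

```python
import random


def perturb_thresholds(th, rel=0.05, rng=None):
    """Return new thresholds drawn independently and uniformly in
    [(1 - rel) * t, (1 + rel) * t] for each original threshold t.

    The uniform, independent draw is an assumption; the text gives only the bounds.
    """
    rng = rng or random.Random()
    return [rng.uniform((1 - rel) * t, (1 + rel) * t) for t in th]


th = [0.82, 500.0, 1000.0]  # hypothetical original thresholds
runs = [perturb_thresholds(th, rng=random.Random(seed)) for seed in range(5)]
for new_th in runs:
    print([round(v, 3) for v in new_th])
```

Each perturbed vector would then be used to rebuild the sets $D_l$ before retraining. For rate-type indicators close to 100%, one may also want to clip at 1.0; the text does not say whether this was done.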
As can be seen in Fig. 7, threshold fluctuations within 5% have a limited impact on our method, and the differences are mainly concentrated in the $D_1$ and $D_2$ data sets. To investigate the effect of threshold fluctuations on the detection results, we use the one-sample t-test, a statistical significance test, to examine whether the detection results obtained with the fluctuating thresholds differ significantly from the detection result obtained with the original thresholds on $D_1$ and $D_2$.
A significance test takes four steps: i) state the null and alternative hypotheses; ii) calculate the test statistic; iii) find the p-value; and iv) compare the p-value with the significance level $\alpha$ and decide whether to reject the null hypothesis. Taking the $D_1$ data set as an example, we test whether the detection results with the fluctuating thresholds differ significantly from the detection result with the original thresholds (abnormal rate 0.6184). The significance level here is $\alpha=0.05$.
Denoting by $\mu$ the mean abnormal rate on $D_1$ under the fluctuating thresholds, the null and alternative hypotheses are
$$
H_0:\ \mu=0.6184,\qquad H_1:\ \mu\neq 0.6184.
$$
Table I shows the result of the one-sample t-test [47], [48] on the abnormal rate of $D_1$, computed with IBM SPSS Statistics 21; the relevant parameters are listed in Table II. Since the p-value of 0.298 exceeds the significance level $\alpha=0.05$, we do not reject the null hypothesis, i.e., no significant difference is found between the detection results with the fluctuating thresholds and the result with the original thresholds on $D_1$. Table III shows the result of the one-sample t-test on the abnormal rate of $D_2$; similarly, no significant difference is found on $D_2$. Thus, IPS-OCSVM withstands slight fluctuations in the thresholds, which have no significant effect on the final detection result.
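The t statistic and its two-sided p-value can be reproduced without SPSS. The sketch computes $t=(\bar{x}-\mu_0)/(s/\sqrt{m})$ for $m$ detection results with sample mean $\bar{x}$ and sample standard deviation $s$, and obtains the p-value by integrating the Student t density numerically with Simpson's rule. The abnormal rates in the usage lines are invented placeholders, not the values behind Tables I to III; only $\mu_0 = 0.6184$ comes from the text.

```python
import math
import statistics


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """p = P(|T| >= |t|) = 1 - 2 * integral_0^|t| pdf, by Simpson's rule (steps even)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return max(0.0, 1 - 2 * s * h / 3)


def one_sample_t_test(sample, mu0):
    """Return (t, df, two-sided p) for H0: mean == mu0."""
    m = len(sample)
    t = (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(m))
    return t, m - 1, two_sided_p(t, m - 1)


# Placeholder abnormal rates on D_1 from fluctuating thresholds (invented values).
rates = [0.6102, 0.6251, 0.6190, 0.6033, 0.6160]
t, df, p = one_sample_t_test(rates, mu0=0.6184)
print(f"t = {t:.3f}, df = {df}, p = {p:.3f}, reject H0: {p < 0.05}")
```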

D. Performance Evaluation of the Proposed IPS-OCSVM Framework Based on Real Test Data
We test the performance of IPS-OCSVM on real KQIs samples with QoE labels. For comparison, we choose semi-supervised OCSVM and SDAE (both using $D_0$ as the training set), unsupervised IF and LOF, and the traditional KQIs-based hard decision as the baseline. We use precision, recall, $F_1$, and AUC to evaluate classifier performance. AUC is the area under the ROC curve and gives an overall evaluation of the performance of different classifiers. Precision, recall, and $F_1$ are calculated as
$$
\text{Precision}=\frac{TP}{TP+FP},
$$
$$
\text{Recall}=\frac{TP}{TP+FN},\tag{9}
$$
$$
F_1=\frac{2\cdot\text{Precision}\cdot\text{Recall}}{\text{Precision}+\text{Recall}}.
$$
The classifier labels the QoE of a KQIs sample as normal or abnormal (i.e., binary classification), with abnormal treated as the positive class. For each predicted KQIs sample, there are four possible outcomes: a true positive (TP) is an instance labeled abnormal that is correctly predicted as abnormal; a true negative (TN) is an instance labeled normal that is correctly predicted as normal; a false positive (FP) is an instance labeled normal that is incorrectly predicted as abnormal; and a false negative (FN) is an instance labeled abnormal that is incorrectly predicted as normal.
Table IV shows the comparison results on the test set. Compared with the baseline method, the proposed IPS-OCSVM framework sacrifices a little precision in exchange for large gains in recall, $F_1$, and AUC. OCSVM and SDAE (both using $D_0$ as the training set) also perform worse than IPS-OCSVM, and SDAE is more complex and less stable than IPS-OCSVM. IF and LOF are unsupervised methods that do not incorporate expert knowledge, and their performance is far below that of IPS-OCSVM, OCSVM, and SDAE, all of which utilize expert knowledge. Besides, IF and LOF are less convenient than IPS-OCSVM for real-time detection.
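The metrics above can be computed from predicted and true labels as in the following sketch. It treats abnormal as the positive class (label 1) and converts the OCSVM output by mapping $f(x)=-1$ to 1 (abnormal) and $f(x)=1$ to 0 (normal); this label encoding, the example labels, and returning 0.0 when a denominator is zero are conventions of the sketch.

```python
def ocsvm_to_label(fx):
    """Map the OCSVM output to a class label: -1 (abnormal QoE) -> 1, +1 (normal) -> 0."""
    return 1 if fx == -1 else 0


def confusion(pred, truth):
    """Return (TP, TN, FP, FN) with abnormal (1) as the positive class."""
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, truth))
    tn = sum(p == 0 and y == 0 for p, y in zip(pred, truth))
    fp = sum(p == 1 and y == 0 for p, y in zip(pred, truth))
    fn = sum(p == 0 and y == 1 for p, y in zip(pred, truth))
    return tp, tn, fp, fn


def precision_recall_f1(pred, truth):
    tp, _, fp, fn = confusion(pred, truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


outputs = [-1, -1, 1, 1, -1]                 # hypothetical f(x) values
pred = [ocsvm_to_label(v) for v in outputs]  # [1, 1, 0, 0, 1]
truth = [1, 0, 1, 0, 1]                      # hypothetical ground-truth labels
print(confusion(pred, truth))                # (2, 1, 1, 1)
```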

VI. CONCLUSION
In this paper, we have proposed an IPS-OCSVM framework for KQIs-based QoE anomaly detection in cellular networks. The proposed IPS-OCSVM makes soft decisions, so its detection ability can be adjusted; it uses expert knowledge effectively while withstanding fluctuations in that knowledge; and after offline training it realizes fast online detection. In future work, we will consider the temporal relationship between KQIs samples, which may make anomaly detection more effective.