An Assessment of Intrusion Detection using Machine Learning on Traffic
Statistical Data
Abstract
Detecting Zero-Day intrusions has long been a goal of cybersecurity,
and of intrusion detection in particular. Machine learning is widely
regarded as a promising approach to this problem, and numerous models
have been proposed, but a practical solution has yet to emerge, mainly
because the openly available datasets are out of date. In this paper,
we propose an approach for Zero-Day intrusion
detection based on machine learning, using flow-based statistical data
generated by CICFlowMeter as the training dataset. The classification
model is selected from eight of the most popular classification models
on the basis of their cross-validation results, measured by precision,
recall, F1 score, area under the curve (AUC), and time overhead.
Finally, the proposed system is evaluated on the testing datasets.
To evaluate the feasibility and efficiency of the tested models, the
testing datasets are designed to contain novel types of intrusions
(intrusions not seen during the training process). The normal data in
the datasets are drawn from real-life traffic flows captured during
daily use. The results are promising, with accuracy reaching almost
100%, a false positive rate as low as nearly 0%, and a reasonable time
overhead. We argue that, with properly selected flow-based statistical
data, certain machine learning models, such as the MLP classifier,
quadratic discriminant analysis, and the K-Neighbors classifier,
perform satisfactorily in detecting Zero-Day attacks.
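
The evaluation idea described above, in which the testing data contains intrusion types that never appear in training, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `split_zero_day` and `binary_metrics`, the label strings (`benign`, `attack_a`, `attack_b`), the toy feature values, and the rule of alternating benign records between the two sets are all assumptions made for illustration.

```python
from collections import Counter


def split_zero_day(records, held_out_attacks, benign_label="benign"):
    """Split labelled flow records so held-out attack types occur only in the test set.

    records: list of (features, label) pairs; all label names are hypothetical.
    Benign records alternate between the two sets so both contain normal traffic.
    """
    train, test = [], []
    benign_seen = 0
    for features, label in records:
        if label in held_out_attacks:
            test.append((features, label))   # novel type: never trained on
        elif label == benign_label:
            target = train if benign_seen % 2 == 0 else test
            target.append((features, label))
            benign_seen += 1
        else:
            train.append((features, label))  # known attack type
    return train, test


def binary_metrics(y_true, y_pred):
    """Precision, recall, F1, and false positive rate (1 = intrusion, 0 = normal)."""
    counts = Counter(zip(y_true, y_pred))
    tp, fp = counts[(1, 1)], counts[(0, 1)]
    fn, tn = counts[(1, 0)], counts[(0, 0)]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


# Toy records with a single made-up feature each; "attack_b" plays the unseen type.
records = [
    ([0.1], "benign"), ([0.2], "benign"), ([0.9], "attack_a"),
    ([0.8], "attack_b"), ([0.3], "benign"), ([0.7], "attack_b"),
]
train, test = split_zero_day(records, held_out_attacks={"attack_b"})
metrics = binary_metrics([0, 1, 1, 0], [0, 1, 0, 1])
```

In this sketch a classifier fitted on `train` would be scored on `test` with intrusions mapped to 1 and normal flows to 0, so recall on `test` reflects how well previously unseen attack types are detected.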