An Assessment of Intrusion Detection using Machine Learning on Traffic Statistical Data
Detecting zero-day intrusions has long been a goal of cybersecurity, and of intrusion detection in particular. Machine learning is widely regarded as a promising approach to the problem, and numerous models have been proposed, but a practical solution has yet to emerge, mainly because the available open datasets are out of date. In this paper, we propose a machine-learning approach to zero-day intrusion detection that uses flow-based statistical data generated by CICFlowMeter as the training dataset. The classification model is selected from eight of the most popular classification models on the basis of their cross-validation results, measured by precision, recall, F1 score, area under the curve (AUC), and time overhead. The proposed system is then evaluated on separate testing datasets. To assess the feasibility and efficiency of the tested models, the testing datasets are designed to contain novel types of intrusions, that is, intrusions that were not seen during training. The normal data in the datasets come from real-life traffic flows captured during daily use. The results are promising: accuracy approaches 100%, the false positive rate approaches 0%, and the time overhead is reasonable. We argue that, with properly selected flow-based statistical data, certain machine learning models, such as the MLP classifier, quadratic discriminant analysis, and the K-Neighbors classifier, perform satisfactorily in detecting zero-day attacks.
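For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy, precision, recall, F1 score, and false positive rate can be derived from binary predictions. It is not the implementation used in this work: the function name `binary_metrics`, the label encoding (1 for intrusion, 0 for normal traffic), and the sample labels are assumptions made purely for illustration. AUC and time overhead are omitted because they require classifier scores and timing measurements rather than hard labels.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float
    false_positive_rate: float


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 when the denominator is zero to avoid a division error.
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred) -> BinaryMetrics:
    """Compute detection metrics; 1 = intrusion, 0 = normal (assumed encoding)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return BinaryMetrics(
        accuracy=_ratio(tp + tn, tp + tn + fp + fn),
        precision=precision,
        recall=recall,
        f1=_ratio(2 * precision * recall, precision + recall),
        false_positive_rate=_ratio(fp, fp + tn),
    )


# Hypothetical labels: three intrusion flows and five normal flows.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In a zero-day setting, `y_true` would mark flows from attack types withheld from training, so recall on those flows indicates how well a model generalizes to unseen intrusions.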