DCNN-IDS: Deep Convolutional Neural Network based Intrusion Detection System

Abstract. In the present era, cyberspace is growing tremendously, and the intrusion detection system (IDS) plays a key role in ensuring information security. An IDS, which works at the network and host levels, should be capable of identifying various malicious attacks. The job of a network-based IDS is to differentiate between normal and malicious traffic data and to raise an alert in case of an attack. Apart from the traditional signature-based and anomaly-based approaches, many researchers have employed deep learning (DL) techniques for intrusion detection, as DL models are capable of extracting salient features automatically from the input data. The deep convolutional neural network (DCNN), which is used quite often for solving research problems in the image processing and vision fields, has not been explored much for IDS. In this paper, a DCNN architecture for IDS, trained on the KDDCup 99 data set, is proposed. This work also shows that the DCNN-IDS model outperforms other existing works.


Introduction
Information Technology (IT) systems play a key role in handling sensitive user data that are prone to external and internal intruder attacks [1]. Every day, attackers come up with new, sophisticated attacks, and the attacks against IT systems are growing as the internet grows. As a result, a novel, reliable and flexible IDS is necessary to handle security threats such as malware attacks, which can compromise a network of systems that attackers can then use to perform various attacks through command and control servers. Though there are other security systems such as firewalls, the IDS plays a major role in defending the network from all kinds of cyberattacks. IDS is divided into two categories. The first is the network IDS (NIDS), which monitors the network traffic and raises alerts when it detects any kind of attack. The second is the host-based IDS (HIDS), which detects both internal and external intrusion and misuse by monitoring the system on which it is installed. It constantly records user activities and alerts the designated authority in case of an attack. Both IDS types are represented in Fig. 1. The job of the NIDS is to monitor the network traffic and to classify each network traffic record as either malicious or normal (benign). Since intrusion detection is a classification problem, several machine learning (ML) and deep learning (DL) classifiers are widely employed for it. DL models such as autoencoders (AE), recurrent structures, deep neural networks (DNN), etc. are used for IDS by many researchers. The convolutional neural network (CNN) model is quite often used for solving research problems in fields such as computer vision and image processing, due to its capability to extract location-invariant features automatically. The application of CNN to IDS has not been explored much. Therefore, in this paper, a deep CNN (DCNN) is trained on the most popular benchmark data set, KDDCup 99, which has more than 800,000 data points.
It is also shown that the DCNN-IDS gives superior outcomes when compared to previous works. The rest of this paper is arranged as follows. Sections 2 and 3 cover the related works and the data set description. Sections 4 and 5 describe the statistical measures and the proposed model, respectively. Sections 6 and 7 present the results and the conclusion.

Related Works
Several ML and DL based approaches have been applied to various problems in the field of cyber security, such as malware detection and homoglyph attack detection, as well as IDS [2][3][4][5][6]. The work in [7] analyses several ML based approaches for intrusion detection in order to identify various issues. Issues related to the detection of low-frequency attacks are discussed, with possible solutions to improve the performance further. The disadvantage of the ML based approach is that ML models operate on features extracted manually by a domain expert. Since DL models can extract relevant features automatically without human intervention, many researchers propose DL based solutions for IDS. A self-taught learning based NIDS is proposed in [8], where a sparse autoencoder and softmax regression are used. The proposed model is trained on the NSL-KDD data set and achieves an accuracy of around 79.10% for 5-class classification, which is very close to the performance of existing models. Apart from this, 23-class and 2-class classification also achieve good performance. Recent studies [9,10] claim that deep networks perform better than shallow networks for IDS, as a deep network is capable of learning salient features by mapping the input through various layers. In [11], the performance of an RNN based NIDS is studied. The model is trained on the NSL-KDD data set, and both multi-class and binary classification are performed. The performance of the RNN based IDS is far superior in both classification settings when compared to other traditional approaches, and the author claims that the RNN based IDS has strong modeling capabilities for intrusion detection. Similarly, in [12] and [13], various recurrent structures are proposed for IDS.
In [14], a new stacked non-symmetric deep autoencoder (NDAE) based NIDS is proposed. The model is trained on both the KDDCup 99 and NSL-KDD benchmark data sets, and its performance is compared with a DBN based model. The experimental analysis shows that the NDAE based approach improves the accuracy by up to 5% with a 98.8% reduction in training time when compared to the DBN based approach. In [15], the effectiveness of CNN and hybrid CNN-recurrent structures is studied, and it is observed that the CNN based model outperforms the hybrid CNN-RNN models. In [16], the authors claim that analyzing the traffic features from the network as a time series improves the performance of IDS. They substantiate the claim by training long short-term memory (LSTM) models on the KDDCup 99 data set with a full and a minimal feature set for 1000 epochs, obtaining a maximum accuracy of 93.82%. In [17], a scalable DL framework is proposed for intrusion detection at both the network and host levels. Various ML and DNN models are trained on data sets such as KDDCup 99, NSL-KDD, WSN-DS, UNSW-NB15, CICIDS 2017, ADFA-LD and ADFA-WD, and their performance is compared. In this work, the effectiveness of the proposed model is evaluated using standard performance metrics and compared with other works such as [16] and [17].

Data set Description
The tcpdump data of the 1998 DARPA intrusion detection evaluation data set is pre-processed to build the KDDCup 99 data set. The feature extraction from the tcpdump data is facilitated by the MADAM ID data mining framework [11]. Table 1 presents statistical information about the data set. This data set was built by capturing network traffic for ten weeks from thousands of UNIX systems and hundreds of users accessing those systems at the MIT Lincoln Laboratory. The data captured during the first 7 weeks were used for training, and the data from the last 3 weeks were used for testing.
This data set has a total of 5 classes and 41 features. The first class is the normal class, which denotes benign network traffic records. The second is DoS, a kind of attack that works against resource availability. The third is the probing attack. This class represents all attacks that attackers use to obtain detailed information about the system and its security structures and configurations. This kind of attack is performed by the attackers initially in order to gain insights about the network, so that they can perform more critical attacks later. The next class is R2L, which denotes remote to local attacks. This kind of attack is performed in order to acquire illegal remote access to a system in a network. The last class is U2R, which denotes user to root attacks. It represents attacks that are used to gain root-level access to a system.
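The grouping of raw connection labels into the five classes can be sketched as follows. The attack names are those commonly documented for the KDDCup 99 training file, in which each raw label ends with a period (for example "smurf."). The names ATTACK_CATEGORY and to_category are illustrative and are not taken from this paper, and the test file contains further attack names that are not listed here.

```python
# Illustrative mapping from raw KDDCup 99 training labels to the five classes.
# The dictionary and helper names are hypothetical, chosen for this sketch.
ATTACK_CATEGORY = {
    "normal": "Normal",
    # DoS: attacks against resource availability
    "back": "DoS", "land": "DoS", "neptune": "DoS",
    "pod": "DoS", "smurf": "DoS", "teardrop": "DoS",
    # Probe: information gathering about systems and configurations
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    # R2L: illegal remote access to a system
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L", "multihop": "R2L",
    "phf": "R2L", "spy": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
    # U2R: escalation to root-level access
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
}


def to_category(raw_label):
    """Map a raw label such as 'smurf.' to one of the five classes.

    Raises KeyError for labels that are not part of the mapping above.
    """
    return ATTACK_CATEGORY[raw_label.strip().rstrip(".")]


print(to_category("neptune."))
```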

Statistical Measures
The proposed DCNN-IDS model is evaluated using some of the most commonly used metrics: accuracy, precision, recall, and F1-score. The error matrix (confusion matrix) gives an overall idea of the performance of the model, and these metrics are computed from terms found in the error matrix. The first term is True Positive (TP), which indicates the count of malicious traffic data points that are rightly considered as malicious by the model. The second is False Positive (FP), which indicates the count of benign traffic data points that are wrongly considered as malicious by the model. Similarly, True Negative (TN) indicates the count of benign traffic data points that are rightly considered as benign by the model. False Negative (FN) is the final term, which indicates the count of malicious traffic data points that are wrongly considered as benign by the model. Based on these four terms, the following metrics are defined:
-Accuracy: This term denotes the total count of right predictions (TP and TN) made by the model over the total count of all predictions.
-Precision: This term denotes the count of right positive results (TP) over the count of all positive results predicted by the model (TP and FP).
-Recall: This term denotes the count of right positive results (TP) over the total count of all samples that are actually positive (TP and FN).
-F1-score: This term combines recall and precision by taking the harmonic (subcontrary) mean of the two.
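The four definitions above can be written out as a short sketch. The function name ids_metrics and the example counts are made up for illustration and are not results from this paper. Returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def ids_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1-score from error-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1-score is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, not taken from the experiments in this paper.
scores = ids_metrics(tp=90, fp=10, tn=880, fn=20)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

For a 5-class problem such as the one considered here, these quantities are typically computed per class by treating one class as positive and the rest as negative.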

Proposed Model
The DCNN-IDS architecture is shown in Fig. 2, and the structure of the DCNN-IDS model is given in Table 2. The proposed architecture is composed of the following sections:
-Pre-processing of network connection records: The symbolic data in the connection records are transformed into numeric values, and the data are normalized using L2 normalization.
-Feature generation: The optimal features are extracted using the proposed CNN model. The CNN model contains a 1D convolution layer, which uses a one-dimensional filter that slides over the connection record to form a feature map. This feature map, in turn, is passed into a max-pooling layer, which performs dimensionality reduction. Batch normalization is employed between the convolution and max-pooling layers to speed up the training process and to enhance performance. Dropout is placed after the max-pooling layer and acts as a regularizer. Since the CNN has several hyperparameters, hyperparameter tuning is followed to identify suitable values. The learning rate is set to 0.01 and the Adam optimizer is used. The number of filters is 32 in the initial CNN layer, 64 in the next CNN layer and 128 in the final CNN layer. The max-pooling length is set to 2 in all the max-pooling layers and the dropout to 0.01. When the number of CNN layers was increased from 3 to 4, the performance decreased; hence a 3-layer CNN is used. Finally, two dense layers follow the CNN layers: the first is composed of 512 neurons and the second of 128 neurons. These layers use ReLU as the activation function.
-Classification: The classification is done using a fully connected layer composed of 5 neurons with a softmax activation function.
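The pre-processing step can be illustrated with a minimal sketch. The text does not state how symbolic values are turned into numbers or along which axis the L2 normalization is applied, so two assumptions are made here: symbolic values receive integer codes in order of first appearance, and each connection record is scaled to unit Euclidean norm. The function names and the toy records are hypothetical.

```python
import math


def encode_symbolic(records, symbolic_cols):
    """Replace symbolic values by integer codes assigned in order of first appearance."""
    codebooks = {col: {} for col in symbolic_cols}
    encoded = []
    for record in records:
        row = []
        for col, value in enumerate(record):
            if col in codebooks:
                book = codebooks[col]
                # A value seen for the first time gets the next free integer code.
                row.append(float(book.setdefault(value, len(book))))
            else:
                row.append(float(value))
        encoded.append(row)
    return encoded, codebooks


def l2_normalize(row):
    """Scale one record to unit Euclidean norm (all-zero records stay unchanged)."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row] if norm > 0 else list(row)


# Toy records holding only the first six of the 41 features (values made up):
# duration, protocol_type, service, flag, src_bytes, dst_bytes
records = [
    [0, "tcp", "http", "SF", 181, 5450],
    [0, "udp", "domain_u", "SF", 105, 146],
]
encoded, codebooks = encode_symbolic(records, symbolic_cols={1, 2, 3})
normalized = [l2_normalize(row) for row in encoded]
print(codebooks[1])
```

Integer coding keeps the record at 41 values. A one-hot encoding would be an alternative, but it would change the input length seen by the first convolution layer.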

Results
The proposed CNN model is designed and trained using Keras, one of the most commonly used Python 3 libraries, with TensorFlow. The model performance is tested on the KDDCup 99 data set and the obtained results are reported in Table 3. The proposed CNN model outperforms the existing LSTM [16] and DNN [17] based intrusion detection models.
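A minimal Keras sketch of a network with the layer sizes given in the Proposed Model section is shown below. It is not the authors' code. The kernel size of 3, the "same" padding, the ReLU activation in the convolution layers and the categorical cross-entropy loss are assumptions, since the text does not state them. The function name build_dcnn_ids is hypothetical.

```python
def build_dcnn_ids(n_features=41, n_classes=5):
    """Build a 3-block 1D CNN followed by two dense layers and a softmax output."""
    # Imported inside the function so the sketch can be loaded without
    # TensorFlow installed; calling the function requires TensorFlow 2.x.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential()
    # Each connection record is treated as a sequence of 41 values, 1 channel.
    model.add(keras.Input(shape=(n_features, 1)))
    for n_filters in (32, 64, 128):
        # Kernel size, padding and conv activation are assumptions.
        model.add(layers.Conv1D(n_filters, kernel_size=3, padding="same",
                                activation="relu"))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling1D(pool_size=2))
        model.add(layers.Dropout(0.01))
    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation="relu"))
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dense(n_classes, activation="softmax"))
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.01),
        loss="categorical_crossentropy",  # assumed; expects one-hot labels
        metrics=["accuracy"],
    )
    return model
```

Calling build_dcnn_ids().summary() prints the layer shapes. Input records have to be reshaped to (number of records, 41, 1) before training.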

Conclusion
In this paper, the effectiveness of the deep CNN model is studied for intrusion detection by modeling the network traffic data. The proposed 1D-CNN outperforms other relevant approaches in which models such as DNN and LSTM are used. The proposed model uses only 425,989 parameters and does not incorporate any complicated preprocessing techniques. Therefore, it has the potential to be used in various low-powered IoT devices, which have very limited computation power.
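The parameter count can be checked with simple arithmetic. The sketch below assumes a kernel size of 3 and "same" padding in every convolution layer, neither of which is stated in the text, and counts only the trainable scale and shift of each batch normalization layer. Under these assumptions the layer sizes given in the Proposed Model section add up to the figure quoted above. The function names are illustrative.

```python
def conv1d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a 1D convolution layer."""
    return kernel_size * in_channels * out_channels + out_channels


def batchnorm_trainable_params(channels):
    """Trainable scale (gamma) and shift (beta) per channel."""
    return 2 * channels


def dense_params(in_units, out_units):
    """Weights plus biases of a fully connected layer."""
    return in_units * out_units + out_units


def count_trainable_params(n_features=41, filters=(32, 64, 128), kernel_size=3,
                           pool=2, dense=(512, 128), n_classes=5):
    length, channels, total = n_features, 1, 0
    for n_filters in filters:
        total += conv1d_params(channels, n_filters, kernel_size)
        total += batchnorm_trainable_params(n_filters)
        channels = n_filters
        # 'same' padding keeps the length; max-pooling divides it, rounding down.
        length //= pool
    units = length * channels  # size of the flattened feature map
    for out_units in (*dense, n_classes):
        total += dense_params(units, out_units)
        units = out_units
    return total


print(count_trainable_params())
```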
In the future, hybrid models can be used where the features are extracted from hidden layers of DL models and fed into other ML or DL models for further improvement of performance.