Discrete Wavelet Transform for Generative Adversarial Network to Identify Drivers Using Gyroscope and Accelerometer Sensors

Abstract: Driver identification is an important research area in intelligent transportation systems, with applications in commercial freight transport and usage-based insurance. One way to perform the identification is to use smartphones as sensor devices. By extracting features from smartphone-embedded sensors, various machine learning methods can identify the driver. The identification becomes particularly challenging as the number of drivers increases. In this situation, there is often not enough data for successful driver identification. This paper uses a Generative Adversarial Network (GAN) for data augmentation to address this lack of data. Since the GAN diversifies the drivers' data, it extends the applicability of driver identification. Although GANs are commonly used in image processing for image augmentation, their use for driving signal augmentation is novel. Our experiments demonstrate their utility in generating driving signals from the Discrete Wavelet Transform (DWT) of smartphones' accelerometer and gyroscope signals. After the augmented data are collected, their histograms over overlapped windows are fed to machine learning methods combined by a Stacked Generalization Method (SGM). The presented hybrid GAN-SGM approach identifies drivers with 97% accuracy, 98% precision, 97% recall, and 97% F1-measure, outperforming standard machine learning methods that process features extracted by statistical, spectral, and temporal approaches.


I. INTRODUCTION
ARTIFICIAL intelligence and data mining are two essential paradigms in developing future transportation systems. Thanks to the large amounts of traffic data collected from sensors in automobiles, telecommunication antennas, and smartphones, these paradigms have rapidly transformed the transportation sector. For instance, driver identification classifies drivers based on location or behavior characteristics. It is applicable in diverse areas such as freight transportation, driver control, anti-theft systems, and usage-based insurance systems [1], [2].
Previously, drivers were identified using simple devices such as ID cards. Later, more advanced methods such as fingerprint, face, and finger vein pattern recognition were applied. However, these methods often violate the driver's privacy or allow drivers to cheat. Modern driver identification systems collect data from in-vehicle sensors [3], GPS [4], inertial sensors [5], or their combination. The related literature has used the following techniques to classify drivers:
• Statistical methods, such as Gaussian Mixture Models [6] and Hidden Markov Models [7].
• Unsupervised learning methods, such as Generative Adversarial Networks (GAN) [1].
• Supervised learning methods based on user-defined features, such as Linear Discriminant Analysis [8] and Extreme Learning Machines [9].
• Deep networks based on hidden features, such as the Gated Recurrent Unit (GRU) [5].
• Ensemble methods, such as gradient tree boosting [10], random forest [11], extra trees [12], and the Stacked Generalization Method (SGM) [13].
Since the CAN bus is available in most modern cars, many researchers have used its data for driver identification; see Table I. However, some vehicles are still not compatible with this standard, which motivates the use of accelerometers and gyroscopes for driver identification. Smartphones and OBD-II devices support these sensors. Accelerometers measure specific force, whereas gyroscopes measure angular velocity. [16] identified drivers by acceleration, angular velocity, and lane-changing features extracted from GPS data. [15] applied PCA to smartphone acceleration data to identify drivers, although the resulting patterns were similar for some drivers. [17] fed histograms of acceleration data to a Multi-Layer Perceptron (MLP). [5] transformed the acceleration and angular velocity signals into spectrograms and processed them with Convolutional Neural Networks (CNN) and GRU. [7] used a Hidden Markov Model (HMM) on simulated driving data. Ensemble methods are also applicable to smartphone data; for example, [18] applied an ensemble of CNN, Recurrent Neural Networks (RNN), and Long Short-Term Memory (LSTM) networks for driver identification. Table I presents details of these studies and their performance. Despite the broad literature on driver identification, some challenges remain:
• The performance of the applied machine learning models is sensitive to the examined scenario.
• The proposed models are usually overfitted to the training data, and their performance on new data is poor.
• As the number of drivers increases, driver identification accuracy decreases.
To address these challenges:
1) We combine different trained machine learning methods in the SGM ensemble to exploit their individual strengths in an integrated system.
2) By augmenting the driving data, we address the overfitting problem.
3) The original driving data and the generated data are both used to train the SGM, which improves its generalization performance.
The proposed hybrid method of GAN and SGM makes the following contributions:
• GAN has been widely used for image augmentation to improve image classification. Besides, [1] used the GAN's discriminator to detect automobile theft. However, using a GAN to augment driving signal data is the novelty of this paper, and it controls overfitting. This approach stabilizes the results, making our system's accuracy independent of the examined drivers.
• To train the GAN model, the DWT of the six signals from the three-dimensional accelerometer and gyroscope sensors is fed to a specialized GAN that generates augmented driver data.
• In the feature extraction phase, we compute histograms of acceleration and gyroscope signals over overlapped windows, whereas [17] used only acceleration data. Although the GAN model works on the DWT of driving data and yields promising results, the histogram feature outperforms DWT, spectral, temporal, and other statistical features in the driver identification phase.
• Classification is performed by the SGM on the extracted features. [13] used SGM on in-vehicle data and identified drivers with 88% accuracy; our SGM on smartphone data improves this accuracy to 97%.
In what follows, Section II presents the driver identification system. The experiments and sensitivity analysis results are given in Section III. Section IV concludes the paper.

II. PROPOSED SYSTEM
A novel architecture for driver identification is presented in Fig. 1. This architecture covers data pre-processing, data augmentation, feature extraction, and driver identification. In the following subsections, we present details of the individual modules.

A. Data pre-processing module
This module collects the acceleration, angular velocity, and magnetic field signals and produces segmented data through the following steps:
1) The sensors' data must be aligned with the vehicle axes, since the phone may be placed in various positions. We use the dataset of [19], whose signals are aligned based on the magnetic field.
2) After reorientation, only acceleration and angular velocity are kept for data cleaning. Outliers are clipped, and missing values are then replaced by the average of the corresponding signal.
3) We then remove motionless sampling instances by testing $\sum_{\tau=t}^{t+6f_s} \left(a_{x,\tau}^2 + a_{y,\tau}^2 + a_{z,\tau}^2\right) \le \theta$, where $\theta = 0.5$ and the sampling frequency is $f_s = 2$ Hz.
4) The data are normalized with the Gaussian transformation $\hat{x} = (x - \bar{x})/\sigma$, where $\bar{x}$ and $\sigma$ denote the average and standard deviation, respectively.
5) The data are split into segments using overlapped windows. The parameters of this step are the window size and the overlap percentage.
There are two training phases: GAN training and SGM training. For GAN training, the existing dataset is split into training and validation parts. Once the GAN has learned the training data, we save the model and use it for data generation. The combination of the generated data and the training data is then passed to the segmentation step. For SGM training, the data are passed directly to the feature engineering module.
Fig. 1: The architecture of the proposed hybrid model for driver identification consists of two separate stages. First, a GAN model is trained for data augmentation, and then an SGM is trained to identify drivers.
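Steps 3 to 5 of the pre-processing can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' code: reorientation and outlier clipping (steps 1 and 2) are omitted, the motion test is assumed to be evaluated per sample over the span from t to t + 6f_s, the synthetic signal is invented, and the helper names (`motion_mask`, `standardize`, `sliding_windows`) are hypothetical.

```python
import numpy as np

FS = 2        # sampling frequency f_s in Hz (Section III)
THETA = 0.5   # motion threshold theta (step 3)

def motion_mask(acc, fs=FS, theta=THETA):
    """Return True for samples to keep (step 3, assumed per-sample test).

    Sample t is treated as motionless when the squared acceleration magnitude,
    summed from t to t + 6*fs (about six seconds), does not exceed theta.
    acc has shape (samples, 3) with the x, y, z accelerations.
    """
    energy = np.sum(acc ** 2, axis=1)   # a_x^2 + a_y^2 + a_z^2 for every sample
    span = 6 * fs
    keep = np.ones(len(acc), dtype=bool)
    for t in range(len(acc)):
        if energy[t:t + span + 1].sum() <= theta:
            keep[t] = False
    return keep

def standardize(x):
    """Step 4: per-channel Gaussian (z-score) transformation (x - mean) / std."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def sliding_windows(x, window, overlap):
    """Step 5: windows of `window` samples whose starts differ by window * (1 - overlap)."""
    step = int(round(window * (1 - overlap)))
    starts = range(0, len(x) - window + 1, step)
    return np.stack([x[s:s + window] for s in starts])

# Synthetic stand-in: 30 minutes at 2 Hz, columns = 3 accelerometer + 3 gyroscope axes.
rng = np.random.default_rng(0)
signals = rng.normal(size=(3600, 6))
signals[:100, :3] = 0.0                  # a simulated stop at the start of the trip
mask = motion_mask(signals[:, :3])
clean = standardize(signals[mask])       # the same mask also drops the gyroscope samples
windows = sliding_windows(clean, window=15 * 60 * FS, overlap=0.75)
print(windows.shape)                     # (number of windows, 1800, 6)
```

The motion mask is computed on acceleration only and then applied to all six channels, so accelerometer and gyroscope samples stay aligned after cleaning.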
Overfitting is a common problem in machine learning and creates significant challenges for driver identification systems. One of the most effective remedies for overfitting is data augmentation [20]. Generative Adversarial Networks [21] can augment data to enhance generalization performance. A GAN implicitly learns the data distribution $p_{data}$ from a sequence of samples $x^{(1)}, x^{(2)}, \ldots, x^{(n)}$ and produces new samples from the learned distribution with the same statistics as the training set.
A GAN consists of two neural networks trained concurrently. The first network, the discriminator (D), takes a sample x as input and returns D(x), the probability that x is an actual sample. The second network, the generator (G), synthesizes samples to make D believe they are actual samples. G takes random inputs $z^{(1)}, z^{(2)}, \ldots, z^{(n)}$, usually drawn from a uniform or Gaussian distribution $p_z$, and maps them to the data space; the distribution of G(z) is denoted $p_g$. The aim of G is thus to make $p_g = p_{data}$. GANs are trained by optimizing the following objective function as a minimax game [21]:
$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_{data}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right], \quad (1)$$
where $x \sim p_{data}$ and $z \sim p_z$ indicate that x and z are distributed according to $p_{data}$ and $p_z$, and $\mathbb{E}$ is the mathematical expectation. The discriminator maximizes $\log D(x)$ for $x \sim p_{data}$ and $\log(1 - D(x))$ for $x \sim p_g$, i.e., it pushes D(x) toward 0 on generated samples. The generator tries to fool D by creating samples G(z) that D accepts as real; in other words, the generator maximizes D(G(z)), which is equivalent to minimizing $\log(1 - D(G(z)))$. [21] showed that the objective reaches its global optimum, −log(4), exactly when $p_g = p_{data}$, i.e., when the GAN generates new samples from the data distribution. The generated data can then be used to augment the original dataset. Likewise, we use the GAN for data augmentation, not for driver classification: we take the pre-processed data, extract their wavelet features, train the GAN on these features, and finally store the model to generate driving data. Our experiments indicate more precise outcomes for short window sizes and large overlap percentages; we obtained the best results with a window size of 15 minutes and an overlap of 75%. In what follows, we present the details of the wavelet transform on these windows.
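The −log(4) optimum quoted above can be checked numerically. The sketch below assumes discrete distributions (an illustrative simplification; the driving signals are continuous) and plugs in the optimal discriminator D*(x) = p_data(x) / (p_data(x) + p_g(x)) derived in [21] into the minimax objective E[log D(x)] + E[log(1 − D(G(z)))]. The function name is hypothetical.

```python
import math

def value_with_optimal_discriminator(p_data, p_g):
    """Minimax value V(D*, G) for discrete distributions, using the optimal
    discriminator D*(x) = p_data(x) / (p_data(x) + p_g(x)) from [21].

    Both inputs are probability vectors over the same support with strictly
    positive entries (zeros would require special handling of log(0)).
    """
    value = 0.0
    for pd, pg in zip(p_data, p_g):
        d_star = pd / (pd + pg)
        value += pd * math.log(d_star) + pg * math.log(1.0 - d_star)
    return value

matched = value_with_optimal_discriminator([0.2, 0.3, 0.5], [0.2, 0.3, 0.5])
mismatched = value_with_optimal_discriminator([0.7, 0.3], [0.3, 0.7])
print(round(matched, 4), round(-math.log(4), 4))   # -1.3863 -1.3863
print(mismatched > matched)                         # True
```

When p_g = p_data, D* equals 1/2 everywhere and the value is exactly −log(4); any mismatch raises it, since [21] shows V(D*, G) = −log(4) + 2·JSD(p_data ‖ p_g). At that point each binary cross-entropy term equals ln 2 ≈ 0.693, which is one common way to read the losses near 0.7 reported in Section III-E.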
For a discrete signal f(n) of length M, the DWT is given by
$$W_\varphi(0,k) = \frac{1}{\sqrt{M}} \sum_{n} f(n)\,\varphi_{0,k}(n), \qquad W_\psi(j,k) = \frac{1}{\sqrt{M}} \sum_{n} f(n)\,\psi_{j,k}(n),$$
where $\varphi_{0,k}(n)$ and $\psi_{j,k}(n)$ are the scaling function and the mother wavelet, $W_\varphi(0,k)$ and $W_\psi(j,k)$ are the approximation and detail coefficients, $J = \log_2 M$, $j = 0, 1, 2, \ldots, J-1$, and $k = 0, 1, 2, \ldots, 2^j - 1$. Conversely, the Inverse Discrete Wavelet Transform (IDWT) reconstructs the original signal directly from the approximation and detail series:
$$f(n) = \frac{1}{\sqrt{M}} \sum_{k} W_\varphi(0,k)\,\varphi_{0,k}(n) + \frac{1}{\sqrt{M}} \sum_{j=0}^{J-1} \sum_{k} W_\psi(j,k)\,\psi_{j,k}(n).$$
Following [19], we use the Daubechies mother wavelet DB2. For each window of the three-axis accelerometer and gyroscope signals, we derive the approximation and detail coefficient vectors $W_\varphi$ and $W_\psi$. Since the input has six axes and each segment is decomposed to one level, giving one approximation and one detail vector per axis, each segment yields twelve vectors. These 12 vectors are used as the features of the overlapped windows in the GAN training phase; see Fig. 2, which visualizes this process.
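The one-level DB2 decomposition of a six-axis window can be sketched with NumPy alone. This is an illustrative sketch, not the authors' implementation: it uses periodic (wrap-around) boundary handling, which makes the transform an orthogonal matrix, so the IDWT is simply its transpose. With periodization each coefficient vector has exactly half as many entries as the window has samples; the T_W/2 + 1 count mentioned later presumably reflects a non-periodic boundary extension. The names `analysis_matrix`, `dwt_db2`, and `idwt_db2` are hypothetical.

```python
import numpy as np

SQRT3 = np.sqrt(3.0)
# DB2 (four-tap Daubechies) orthonormal low-pass analysis filter.
H = np.array([1 + SQRT3, 3 + SQRT3, 3 - SQRT3, 1 - SQRT3]) / (4 * np.sqrt(2.0))
# Matching high-pass filter by alternating flip: g[n] = (-1)^n h[L - 1 - n].
G = np.array([(-1) ** n * H[len(H) - 1 - n] for n in range(len(H))])

def analysis_matrix(n_samples):
    """One-level periodized DWT as an orthogonal n x n matrix (n_samples must be even):
    the first n/2 rows hold shifted low-pass filters, the last n/2 shifted high-pass filters."""
    half = n_samples // 2
    mat = np.zeros((n_samples, n_samples))
    for k in range(half):
        for n, (h, g) in enumerate(zip(H, G)):
            col = (2 * k + n) % n_samples   # periodic (wrap-around) boundary
            mat[k, col] += h
            mat[half + k, col] += g
    return mat

def dwt_db2(window):
    """Approximation and detail coefficients of a (samples, axes) window, each shaped (axes, samples / 2)."""
    half = window.shape[0] // 2
    coeffs = analysis_matrix(window.shape[0]) @ window
    return coeffs[:half].T, coeffs[half:].T

def idwt_db2(approx, detail):
    """Inverse transform: the analysis matrix is orthogonal, so its transpose undoes it."""
    coeffs = np.concatenate([approx.T, detail.T], axis=0)
    return analysis_matrix(coeffs.shape[0]).T @ coeffs

rng = np.random.default_rng(1)
window = rng.normal(size=(20, 6))            # one 10 s window at 2 Hz: 3 accelerometer + 3 gyroscope axes
approx, detail = dwt_db2(window)
features = np.concatenate([approx, detail])  # 12 coefficient vectors per window
print(features.shape)                        # (12, 10)
print(np.allclose(idwt_db2(approx, detail), window))  # True
```

Writing the transform as a matrix keeps the example short and makes perfect reconstruction easy to verify; a filter-bank implementation would be faster on long signals.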

Fig. 2: Features extracted by DWT from accelerometer (green rectangles) and gyroscope (orange rectangles) signals for overlapped windows.
Both are implemented as conditional GANs [24]. Their accuracies for the driver identification system are compared when they augment the dataset, and the GAN that leads to the highest accuracy is selected for the proposed system. The experiments showed that:
• Augmentation did not improve driver identification accuracy when the GAN model contained too many layers or only dense layers.
• Data generated by the deep convolutional GAN shown in Fig. 3 were notably robust and improved the accuracy. At least 2,000 iterations were needed to train this GAN model.
To process time-streaming signals, windowing functions such as tumbling, hopping, sliding, session, and snapshot windows can be used. In our GAN models, we used sliding windows for the input signals and tumbling windows for the output signals. We used 75% overlap for the sliding windows, whereas tumbling windows are disjoint. Both windows have the same length. Let $T_W$ = 10 seconds denote the length of the tumbling windows. To generate w = 15 minutes of driving data, the GAN model generated $n = w/T_W = 90$ tumbling windows. These tumbling windows were saved in a tensor of size $[n, 12, T_W/2 + 1]$ in the DB2 wavelet space, where $T_W/2 + 1$ is the number of approximation (or detail) coefficients per tumbling window. By applying the IDWT to the approximation and detail coefficients, these tumbling windows are reconstructed into the original signal space with size $[n, 6, T_W]$ and then reshaped to $[n \cdot T_W, 6]$ to match the format of the training data.
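The final reshape step is easy to get wrong, so a small NumPy sketch may help. It assumes that T_W is counted in samples inside the tensor shapes (10 s at 2 Hz, i.e. 20 samples), which is an interpretation rather than something the text states, and it uses an arbitrary array as a stand-in for the IDWT output. The point it illustrates: going from [n, 6, T_W] to [n · T_W, 6] needs an axis swap first, because a plain reshape would interleave samples from different channels. `stitch_windows` is a hypothetical helper.

```python
import numpy as np

FS = 2                          # sampling frequency in Hz (Section III)
W_SECONDS = 15 * 60             # w = 15 minutes of generated driving data
TW_SECONDS = 10                 # tumbling-window length T_W
TW_SAMPLES = TW_SECONDS * FS    # assumed: T_W = 20 samples in the tensor shapes

n = W_SECONDS // TW_SECONDS     # 90 tumbling windows

def stitch_windows(windows):
    """Turn reconstructed windows of shape [n, 6, T_W] into one signal of shape [n * T_W, 6].

    The axes are first reordered to [n, T_W, 6]; a direct reshape would mix channels.
    """
    n_win, n_axes, tw = windows.shape
    return windows.transpose(0, 2, 1).reshape(n_win * tw, n_axes)

# Stand-in for the IDWT output; entry [i, a, t] holds a unique value.
windows = np.arange(n * 6 * TW_SAMPLES).reshape(n, 6, TW_SAMPLES)
signal = stitch_windows(windows)
print(n, signal.shape)          # 90 (1800, 6)
```

Each stitched row i · T_W + t holds the six axis values of window i at time step t, which is the (samples, channels) layout of the training data.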

B. Feature engineering module
Different statistical, temporal, and spectral features are extracted from each window using the library given in [25]. These features are commonly applied in signal processing tasks such as speech processing, electrocardiogram analysis, and driver behavior evaluation. Previous driver identification studies also used some of these features: for example, [26] used statistical features, [17] extracted histograms from acceleration data, [9] used temporal features, and [11] utilized spectral features.
However, considering all features does not necessarily improve classification performance. The features most relevant to the labels, with the least redundancy, are more informative for shallow and deep learning models. Thus, feature selection is a critical part of machine learning problems. We used both filter and wrapper methods for feature selection. The filter method is independent of the machine learning algorithm, and the features are ranked by their scores. Here, we use Pearson's correlation coefficient to score the features and remove the unrelated ones whose correlations are lower than 95%. Then, a wrapper method selects different subsets of the features remaining after the filter step and trains the driver identification model on each subset; the subset that yields the highest accuracy is chosen. We found that the histogram features of accelerometer and gyroscope data are the best features for driver identification.
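The text fixes the score (Pearson correlation) and the 95% threshold but not the exact filtering rule. Section III-C motivates the filter as removing redundant, mutually correlated features, so the sketch below assumes the threshold is applied to pairwise feature correlations: a feature is dropped when its absolute correlation with an already-kept feature exceeds 0.95. This is one hedged reading, not necessarily the authors' rule; the wrapper step is not shown, and `correlation_filter` is a hypothetical name.

```python
import numpy as np

def correlation_filter(features, threshold=0.95):
    """Greedy redundancy filter (assumed rule): scan columns left to right and keep
    a feature unless its |Pearson r| with an already-kept feature exceeds `threshold`.

    features: array of shape (samples, n_features) without constant columns.
    Returns the indices of the kept columns.
    """
    corr = np.abs(np.corrcoef(features, rowvar=False))
    kept = []
    for j in range(features.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept

rng = np.random.default_rng(2)
base = rng.normal(size=(200, 3))                              # three independent features
near_copy = 2.0 * base[:, 0] + 0.01 * rng.normal(size=200)   # redundant with column 0
X = np.column_stack([base, near_copy])
kept = correlation_filter(X)
print(kept)   # [0, 1, 2]
```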
We use at most six windows in each extraction cycle, extract their features separately, and store them together in a single row of a new dataset. A grid search on the histogram features shows that the optimal number of bins is 100. Also, restricting the histograms' range to 95% gives the best robustness in the final model. Finally, we normalize the remaining features by their $L_2$ norms.
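The histogram features can be sketched as follows. Two assumptions are made here and flagged in the code: "restricting the range to 95%" is read as limiting each axis's histogram to its central 95% of values (2.5th to 97.5th percentiles), and the L2 normalization is applied to each window's concatenated histogram vector. The helper names are hypothetical.

```python
import numpy as np

N_BINS = 100   # bin count found by the grid search in the text
SCOPE = 0.95   # fraction of values covered by the histogram range (assumed reading)

def histogram_features(window, n_bins=N_BINS, scope=SCOPE):
    """Concatenate per-axis histograms of a (samples, axes) window, then L2-normalize.

    Assumption: the range of each histogram is the central `scope` fraction of
    that axis's values; np.histogram ignores values outside the range.
    """
    tail = (1.0 - scope) / 2.0 * 100.0
    parts = []
    for axis_values in window.T:
        lo, hi = np.percentile(axis_values, [tail, 100.0 - tail])
        counts, _ = np.histogram(axis_values, bins=n_bins, range=(lo, hi))
        parts.append(counts.astype(float))
    vec = np.concatenate(parts)
    return vec / np.linalg.norm(vec)

def feature_row(windows):
    """Place the features of up to six consecutive windows side by side in one dataset row."""
    return np.concatenate([histogram_features(w) for w in windows[:6]])

rng = np.random.default_rng(4)
vec = histogram_features(rng.normal(size=(1800, 6)))         # 15 min at 2 Hz, six axes
row = feature_row([rng.normal(size=(1800, 6)) for _ in range(6)])
print(vec.shape, row.shape)   # (600,) (3600,)
```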

C. Identification module
In driver identification by classification, each driver is treated as one class, and the model tries to recognize this class from the extracted features. Each model first learns from the training features, and its performance is then evaluated on the testing data. Here, a wide range of machine learning algorithms, including Decision Tree (DT), K-Nearest Neighbor

III. EXPERIMENTAL RESULTS
This section evaluates the proposed system on the data collected by [19] from 10 drivers using different smartphones at a 2 Hz sampling rate. The average driving time per driver is 7 hours and 45 minutes. The data include IMU and GPS measurements. [19] used accelerometer and magnetometer measurements for data reorientation. We use their reoriented data and, to preserve privacy, use only the accelerometer and angular velocity measurements in the driver identification model. To implement the algorithms, we used Scikit-learn version 0.22, TensorFlow version 1.13.2, and Keras version 2.3.0.

A. Sensitivity analysis on involved axis
The accelerometer and gyroscope sensors each have three axes, and we test all of their combinations to find the best driver identification results. Roughly speaking, the x-axis is the lateral axis and captures transverse maneuvers such as changing lanes or turning. The y-axis is the longitudinal axis and captures forward and backward maneuvers such as acceleration and deceleration. The z-axis is the vertical axis and reflects road inclination and road obstacles, such as going over bridges or speed bumps. Fig. 4 shows some of the best combinations of axes for driver identification. It shows that the y-axis provides the most informative results, and that using both gyroscopes and accelerometers significantly increases accuracy in every case. This result contradicts [17], which recommended using only 10 hours of accelerometer signal to identify a driver; instead, we found that gyroscope and accelerometer data for 2 hours and 10 minutes per driver are sufficient to solve the same problem. Also, unlike [17], which suggested avoiding the z-axis, our results show that the highest performance is achieved using data from all axes.
Fig. 4 (combinations shown): x-Ac.Gy., y-Ac.Gy., z-Ac.Gy., xy-Ac., xy-Gy., xz-Ac.Gy., xy-Ac.Gy., yz-Ac.Gy. (Ac. = accelerometer, Gy. = gyroscope).
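The combination grid can be enumerated explicitly. The sketch below assumes that "all combinations" means every non-empty subset of the x, y, z axes paired with accelerometer only, gyroscope only, or both (with the same axes taken from each sensor), labeled in the style of Fig. 4; it also assumes a column layout of accelerometer x, y, z followed by gyroscope x, y, z. Both assumptions, and the helper names, are illustrative.

```python
from itertools import combinations

AXES = "xyz"
SENSOR_SETS = ["Ac.", "Gy.", "Ac.Gy."]   # accelerometer, gyroscope, or both

def axis_sensor_combinations():
    """Every non-empty axis subset paired with every sensor choice, e.g. 'xy-Ac.Gy.'."""
    labels = []
    for r in range(1, len(AXES) + 1):
        for axes in combinations(AXES, r):
            for sensors in SENSOR_SETS:
                labels.append("".join(axes) + "-" + sensors)
    return labels

def columns_for(label):
    """Column indices for a label, assuming columns 0-2 = accelerometer x, y, z and 3-5 = gyroscope x, y, z."""
    axes, sensors = label.split("-", 1)
    offsets = []
    if "Ac." in sensors:
        offsets.append(0)
    if "Gy." in sensors:
        offsets.append(3)
    return [off + AXES.index(a) for off in offsets for a in axes]

labels = axis_sensor_combinations()
print(len(labels))                  # 21
print(columns_for("xz-Ac.Gy."))     # [0, 2, 3, 5]
```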

B. Sensitivity analysis on window properties
Finding a suitable window size and overlap percentage is essential for driver identification. Fig. 5 shows the accuracy results for 10 drivers. According to this chart, the highest accuracy is achieved with a 20-minute window and 75% overlap; in this case, the system identifies the driver every 5 minutes. With a 15-minute window instead, the highest accuracy is 92.9%, obtained with 25% overlap, and a driver can be identified every 11 minutes and 15 seconds. With the same window size and 75% overlap, a driver can be identified with 91.5% accuracy every 3 minutes and 45 seconds. The improvement obtained by extending the overlap suggests that critical information often lies at the windows' edges.
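The identification intervals quoted above follow from the window stride: a new decision is available every w × (1 − overlap). A small check of the three cases (the function names are illustrative):

```python
def identification_interval(window_minutes, overlap):
    """Minutes between consecutive decisions: the window stride w * (1 - overlap)."""
    return window_minutes * (1.0 - overlap)

def as_min_sec(minutes):
    """Convert fractional minutes to a (minutes, seconds) pair."""
    return divmod(round(minutes * 60), 60)

for w, ov in [(20, 0.75), (15, 0.25), (15, 0.75)]:
    print(w, ov, as_min_sec(identification_interval(w, ov)))
# 20 0.75 (5, 0)
# 15 0.25 (11, 15)
# 15 0.75 (3, 45)
```

Larger overlap shortens the time between decisions at no cost in window length, which is why the overlap is tuned separately from the window size.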

C. Analysis on features
Because correlated variables carry no additional information, they are redundant. Moreover, domains with large numbers of input variables suffer from the curse of dimensionality, and multivariate methods may overfit the data. In this research, feature selection has two steps. The first is a fixed correlation threshold that removes redundant features whose correlations are less than 95%. In the second step, the remaining features are compared based on their classification accuracy. Fig. 6 shows the accuracy results of the experiment on ten drivers with a 15-minute window size and 75% overlap. Using the t-SNE method [27], Fig. 7 shows different drivers in different colors. It confirms that the drivers are separable based on histograms, statistical features, and temporal features, whereas spectral features cannot distinguish drivers. Using all features also gives unsatisfactory results, and the model's variance is high: the driver identification accuracy with all features is 73.1%. The best feature is the histogram, which leads to 91.5% accuracy.

D. Performance of identification algorithms
The most important part of driver identification is classification. In this part, we use the histogram features of all axes of the accelerometer and gyroscope. The window size and overlap are 15 minutes and 75%. Fig. 8 illustrates the best results of different classification algorithms in recognizing ten drivers; the best results are shown in a light color. Fig. 9 compares the best algorithms for different numbers of drivers. The system accuracy is 100% for four and six drivers, 95% for eight drivers, and 91.5% for ten drivers. The best non-ensemble algorithms are MLP, KNN, and SVM. Among the ensemble methods, SGM and Voting have the best accuracy, while Bagging does not perform well.

E. Performance of data augmentation
To investigate the effect of data augmentation on the driver identification problem, Fig. 10 illustrates the discriminator's results when the deep convolutional GAN generated driving data for ten drivers with window size w = 15 minutes. As described in Subsection II-A.2, we used tumbling windows with $T_W$ = 10 seconds. The discriminator's accuracy converged to 40%, and its loss was less than 0.7; the generator loss also converged to 0.7. To verify these results, we first averaged the n tumbling windows for each driver d and each axis s. Denote these averages by $\bar{A}_{d,s}(t)$ and $\bar{G}_{d,s}(t)$ for the actual and generated signals over time step t. Then, we use the following distance to measure the dissimilarity for each tumbling window:
$$\mathrm{dis}_{d,s} = \int \left|\bar{A}_{d,s}(t) - \bar{G}_{d,s}(t)\right| dt. \quad (2)$$
The integral calculates the area between the two curves. Comparing 20,000 tumbling windows of generated driving data with the training data and the testing data gave dissimilarities of 16.027 and 18.54, respectively; thus, the quality of the generated data is acceptable. Fig. 10 also shows the dissimilarity over 1.6k iterations: the dissimilarity converges as the number of iterations grows. Moreover, the t-SNE projection in Fig. 11 shows that, with 15 minutes of data augmentation, the generated data keep the different drivers (shown in different colors) effectively separable; this effect decreases when the augmentation size reaches 120 minutes. Thus, the GAN-generated data provide diverse driving data and improve the generalization of driver identification methods.
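The area-between-curves dissimilarity described above can be approximated from sampled signals with the trapezoidal rule. This is a sketch under stated assumptions: the curves are sampled at f_s = 2 Hz, |A − G| is integrated sample by sample (so crossings between samples are only approximated), and the synthetic "generator" below is simply the actual data shifted by a constant. The helper names are hypothetical.

```python
import numpy as np

def mean_curve(windows):
    """Average n tumbling windows of one driver and one axis (shape [n, T_W]) into one curve."""
    return np.asarray(windows, dtype=float).mean(axis=0)

def dissimilarity(actual, generated, fs=2.0):
    """Area between two sampled curves A(t) and G(t) using the trapezoidal rule.

    actual, generated: 1-D arrays of equal length, sampled at fs Hz.
    """
    gap = np.abs(np.asarray(actual, dtype=float) - np.asarray(generated, dtype=float))
    dt = 1.0 / fs
    return float(np.sum((gap[1:] + gap[:-1]) * dt / 2.0))

rng = np.random.default_rng(3)
actual_windows = rng.normal(size=(90, 20))   # n = 90 windows of 20 samples (synthetic)
generated_windows = actual_windows + 0.1     # a generator that is off by a constant 0.1
score = dissimilarity(mean_curve(actual_windows), mean_curve(generated_windows))
print(round(score, 6))   # 0.95 (offset 0.1 times the 9.5 s spanned by 20 samples at 2 Hz)
```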

F. Performance of Hybrid Model of GAN and SGM
Although data augmentation is an effective method to control overfitting, its implementation must be designed manually. In [20], the AutoAugment procedure was proposed to search for the best data augmentation policy. Instead of this procedure, we compared the performance of hybrid methods that combine GAN with the most successful classifiers of Fig. 8. Fig. 12 shows the results of these hybrid models: the results of all classifiers improve with a 15-minute augmentation size. Table II compares two hybrid models based on standard machine learning measures. As one can see, GAN improves driver identification results in both hybrid models in terms of accuracy, precision, recall, and F1-measure. Moreover, the highest accuracy, 97%, is achieved by the hybrid model of GAN and SGM.
Fig. 11: The results of t-SNE projection on the training data together with the augmented driving data, when the size of the augmentation data varies from 0 to 120 minutes (colors refer to the different drivers).
Now, we compare the proposed system with some driver identification systems from the related literature. For a fair comparison, studies that use ECU or GPS data are excluded, because their data modalities differ and come from various standards. We focus on inertial sensors, which impose the lowest computational burden and, being embedded in smartphones, are more ubiquitous than ECU sensors; moreover, GPS data violate privacy. Fig. 13 compares the proposed system with three baseline driver identification models based on inertial data, namely Virojboonkiate et al. [17], Sánchez et al. [5], and Li et al. [26], to study how accuracy diminishes as the number of drivers increases. We implemented these baseline models according to their descriptions; the implementation is available at https://github.com/Ruhallah93/Driver-Identification. The window size and overlap were 15 minutes and 75%.
We compared the results of these methods on the dataset of [19] with ten drivers. As one can see, our system is the most stable as the number of drivers grows. Specifically, the (average, standard deviation) pair of GAN+SGM is (99.17, 1.16), which is better than (75.96, 4.08) for [26], (44.57, 14.27) for [17], and (15.65, 4.95) for [5]. Thus, the accuracy of GAN+SGM is not sensitive to the number of drivers. Our results are also competitive with those of [28], which reached 99.99%, 99.7%, 99.6%, and 99.5% accuracy in identifying 5, 15, 35, and 50 drivers by ECU signal processing.
Fig. 13 (baselines): Virojboonkiate et al. [17], Sánchez et al. [5], Li et al. [26].

IV. CONCLUSION
In this paper, we proposed a hybrid model for driver identification, in which a Generative Adversarial Network (GAN) is used for data augmentation and the Stacked Generalization Method (SGM) is applied for classification. GANs have been used for data augmentation in image processing, whereas their use for driving signals, processed in the Discrete Wavelet Transform (DWT) domain of accelerometer and gyroscope data, is novel. SGM is an ensemble method that combines successful classifiers into an integrated system, using logistic regression as the aggregation strategy. We divided the training data and the augmented data into overlapped windows of 15 minutes with 75% overlap, and the histograms of these windows were fed to the SGM. The accuracy, precision, recall, and F1-measure of the proposed hybrid model on ten drivers were 97%, 98%, 97%, and 97%, respectively. The experimental results showed that the sensors' longitudinal axis is the most informative. Our system based on histogram features performed better than systems based on statistical, temporal, and spectral features. In the future, one can combine different feature extraction methods on the augmented dataset to further improve driver identification.