Combined RF-based drone detection and classification

Abstract—Despite several beneficial applications, drones are unfortunately also being used for illicit activities such as drug trafficking and firearm smuggling, and to pose threats to security-sensitive places like airports and nuclear power plants. Existing drone localization and neutralization technologies work on the assumption that the drone has already been detected and classified. Although the sensor industry has advanced tremendously in this decade, no robust drone detection and classification method has been proposed in the literature yet. This paper focuses on radio frequency (RF) based drone detection and classification using the frequency signature of the transmitted signal. We have created a novel drone RF dataset using commercial drones and present a detailed comparison between a two-stage and a combined detection and classification framework. The detection and classification performance of both frameworks is presented for single-signal and simultaneous multi-signal scenarios. With detailed analysis, we show that the You Only Look Once (YOLO) framework provides better detection performance than Goodness-of-Fit (GoF) spectrum sensing in a simultaneous multi-signal scenario, and classification performance comparable to a Deep Residual Neural Network (DRNN) framework.


I. INTRODUCTION
There has been a tremendous technological improvement in the drone industry. Drones are now being equipped with state-of-the-art (SoA) technologies and sensors such as GPS, LIDAR, radar and visual sensors. These technologies enable drones to support numerous applications like cinematography, farming, surveillance and recreational activities. Drones equipped with advanced technologies have great potential for damaged infrastructure inspection, urgent aid supply, and search and rescue operations in remote and unreachable places. Apart from these beneficial applications, drones are also being used for illegal activities which pose risks to public safety. These illegal activities include, but are not limited to, violation of public privacy, drug trafficking, firearm smuggling, bombing, and invading security-sensitive places like airports and nuclear power plants.
Several Counter Unmanned Aircraft Systems (C-UAS) have been proposed to neutralize a drone attack. They are mainly divided into two categories: hard and soft interception (kinetic or non-kinetic solutions). The kinetic solutions include intercepting a drone using (i) a trained bird of prey, (ii) a net gun [1], (iii) a laser beam, and (iv) a firearm. The non-kinetic solutions include (i) GPS spoofing [1] to deceive a drone's localization system and (ii) RF jamming. Irrespective of the chosen solution for any environment, the presence of a drone should be detected and classified beforehand.
Detecting and classifying a drone automatically is a challenging task. Some popular technological approaches to detect and classify a drone include (i) radar detection, (ii) video detection, (iii) acoustic detection, and (iv) RF-based detection. A comprehensive literature review of the current SoA Machine Learning-based drone detection and classification using these technologies is presented in [2]. Researchers have also proposed integrating multiple technologies [3] for the detection and classification of UAVs.
Radar detection exploits the back-scattered RF signal to detect and classify a drone. Conventional radar systems fail to detect a mini-drone due to its small radar cross section (RCS). To overcome this problem, researchers have utilized the micro-Doppler signature of a quadcopter or multi-rotor UAV to detect and classify it using a multi-static radar [4] or a Frequency Modulated Continuous Wave (FMCW) radar [5,6]. A complete review of the detection and classification strengths of the current SoA FMCW radars is presented in [6].
Video/image detection includes both visual and thermal detection, and in [7]–[10] researchers proposed several drone detection methods using this technology. With this technique, drone detection is performed by analyzing the drone's color, shape and edge information [7]. The detection method is reliable; however, it requires a line of sight (LOS) between the drone and the camera, and its performance is highly dependent on daylight and on weather conditions like dust, rain, fog and cloud. Furthermore, the resemblance of a bird to a drone makes detection more challenging for a video detector. In [10], the authors utilized the motion and trajectory information of a drone to differentiate it from a bird. A brief overview of frameworks capable of differentiating a drone from a bird is presented in [11].

The acoustic detection system uses microphones to detect the sound generated by a flying drone. In [12], the authors proposed a framework using Hidden Markov Models (HMM) to perform phoneme analysis and identify a flying drone from its emitted sounds. The detection and tracking of a drone using an array of microphones has also been proposed in the literature: a small tetrahedral array [13] or a microphone array consisting of 120 elements [14] is used for drone detection and tracking. Acoustic detection generally works well in a quiet or less noisy environment; however, the performance deteriorates in noisy environments such as urban or industrial areas or near seashores.
One of the most promising approaches to detect the presence of a drone is through RF sensing. Commercial drones perform RF communication with their ground control station (GCS) for flight control and navigation, live video transmission and telemetry transfer. Autonomous drones also perform active RF communication to transfer live video and telemetry messages. An RF drone detection system can detect a drone by monitoring the communication frequency spectrum. A few RF-based drone detection techniques have been proposed in the literature [15]–[20]. In [15], the presence of a drone is detected by monitoring how frequently data packets are transmitted at 2.4 GHz. Since most drones use different non-standardized protocols for communication with their controller, the data packet transmission rate differs from WiFi and other Access Points (AP) [15]. In [16], the detection is performed by measuring the data packet length of a drone's communication link. These detection methods are inefficient, since the detector can easily be spoofed by an application communicating with an AP at the same packet transfer rate or with the same packet length as a drone. In [17], a WiFi-based drone surveillance method is proposed, where the identification is performed by a WiFi statistical fingerprinting technique. In [21], we observed that several commercial drones' GCSs use Frequency Hopping Spread Spectrum (FHSS) transmission as the radio control (RC) signal, which should also be accounted for in the identification method.
The detection and identification of a drone using frequency signatures is presented in [18,19], using Deep Neural Network (DNN) based classifiers. In [18], the authors developed a dataset using three commercial drones and used a simple feedforward DNN to detect and identify them. In [19], the authors presented detection, identification and classification on the same dataset using a Convolutional Neural Network (CNN). These studies were performed on a limited dataset, and the impact of noise on the detection performance was not studied. Moreover, the detection performance in the presence of multiple signals or interference was not investigated.
We presented RF-based drone detection using GoF spectrum sensing and DoA estimation using the MUSIC algorithm in [22]. Drone signal detection using wideband CFAR-based energy detection and the feature extraction performance is presented in [21]. In [23], we presented drone signal classification using a DRNN framework; the classification was performed assuming a signal had already been detected by a spectrum sensing algorithm. A complete solution for drone detection and classification based on RF fingerprints was not presented in our previous works, which we address in this paper. We propose two complete solutions for drone signal detection and classification, and provide an in-depth performance comparison. The detection and classification is performed in two different ways: (i) Two-stage detection and classification: the signal detection is initially performed using an efficient spectrum sensing method, which detects all of the signals present in the spectrum. The detected signals are then passed to a SoA classifier to provide robust classification.
(ii) Combined detection and classification: the signals are detected and classified simultaneously. For both proposed methods, we perform detection and classification on the received signal from a single receiver, using frequency-domain fingerprints. The advantages are: (i) using the received signal from a single receiver eliminates the need to calibrate multiple receivers, which makes both methods easily deployable with a low-cost SDR and a computational unit; (ii) frequency-domain detection and classification provide the information necessary for a possible RF jammer. Both methods can perform fast detection and classification, even in the presence of signals overlapping in both the time and frequency domain. They are also more generalized and robust, as they learn both the position and the type of the signal.
The main contributions of this paper are the following:

1) A novel and realistic multi-signal dataset is created using nine commercial drones and non-drone signals (i.e. WiFi communication signals). The dataset will be made public for future research.
2) The YOLO-lite architecture is recreated from scratch and modified to perform combined drone signal detection and classification. The two-stage detection and classification is performed using GoF spectrum sensing and a DRNN classifier.
3) Simultaneous multi-signal detection, spectrum localization and classification in the ISM band is presented. We are the first to propose a framework for simultaneous multi-signal drone detection and classification.

4) The detection and classification performance of both frameworks is evaluated on our dataset. Through detailed comparisons, we show that the YOLO-lite framework provides better detection performance than GoF sensing and classification performance comparable to the DRNN classifier.

The rest of the paper is organized as follows: a mathematical model of the received signal in an ISM band is presented in Section II. Section III provides an overview of the SoA techniques for two-stage and combined signal detection and classification. The technical details of the two-stage and combined detection and classification are presented in Section IV. The dataset development and the experiment strategies are explained in Section V. The performance analysis is presented in Section VI, and concluding remarks are provided in Section VII.

II. PROBLEM STATEMENT
The ISM bands are generally populated by several homogeneous and heterogeneous RF transmissions, and the transmitters generally use spread spectrum technology. FHSS transmissions are generally blind, whereas Direct Sequence Spread Spectrum (DSSS) transmissions are often cognitive in nature. Most commercial drones use a DSSS signal for video transmission; unlike FHSS transmitters, they perform sensing to find a free (or relatively free) channel before starting to transfer the video signal. One example of such heterogeneous transmission at 2.4 GHz is shown in Fig. 1: four transmissions occur at the same time, where three transmitters use DSSS technology and one uses FHSS technology.
The received signal can be expressed as:

$$r(t) = \sum_{k=1}^{K} h_k(t) * y_k(t) + n(t)$$

where $y_k(t)$ is the complex baseband transmitted signal, $h_k(t)$ is the time-varying channel impulse response, $*$ denotes convolution, $k$ denotes the index of the transmitted signal, $K$ is the total number of active transmitters in the ISM band, and $n(t)$ is Additive White Gaussian Noise (AWGN). With the help of the discrete Fourier transform (DFT), the complex time-domain received signal can be converted into $M_t$ consecutive segments, each of length $N_f$ (the FFT size). The magnitude of the DFT matrix gives us the spectrogram matrix. This study aims to detect and classify drone and WiFi communication signals from the spectrogram matrix. The spectrogram representation provides more information than a Power Spectral Density (PSD) or IQ representation of the signal: it enables the classifier to determine useful RF signal features like frequency, bandwidth, dwell time and hop rate. Since commercial drones use pseudorandom number generators to generate the communication signals, the hopping pattern or the signal position will vary. The objective of our work is not to learn the hopping pattern, but rather to learn how the signal is distributed in the spectrum in order to detect and classify it. Deep Learning (DL) algorithms learn the signal distribution, which depends upon factors like (i) frequency, (ii) bandwidth, (iii) modulation, (iv) filtering parameters, and (v) device nonlinearities. We aim to utilize a DL algorithm to learn these factors from the spectrogram matrix.
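The segmentation and DFT step described above can be sketched as follows (a minimal NumPy illustration; the segment length and test signal are example values, not those used in the paper):

```python
import numpy as np

def spectrogram_matrix(r, n_fft):
    """Split a complex baseband signal into consecutive segments of length
    n_fft, apply a DFT to each, and return the magnitude (spectrogram) matrix."""
    m_t = len(r) // n_fft                               # number of time segments M_t
    segments = r[:m_t * n_fft].reshape(m_t, n_fft)
    dft = np.fft.fftshift(np.fft.fft(segments, axis=1), axes=1)
    return np.abs(dft)                                  # M_t x N_f spectrogram

# Example: a complex tone at normalized frequency 0.125, buried in light noise
rng = np.random.default_rng(0)
n = np.arange(256 * 256)
tone = np.exp(2j * np.pi * 0.125 * n)
noise = (rng.standard_normal(n.size) + 1j * rng.standard_normal(n.size)) / np.sqrt(2)
S = spectrogram_matrix(tone + 0.1 * noise, n_fft=256)
print(S.shape)  # (256, 256)
```

The resulting 256 × 256 magnitude matrix is exactly the input shape used by both classification frameworks later in the paper.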

A. Signal Detection
Conventional spectrum sensing methods can be classified into parametric and non-parametric methods. Parametric methods require prior knowledge of the transmitted signal for the detection, whereas non-parametric methods, also known as blind sensing, do not require any knowledge of the signal. The most popular spectrum sensing methods are energy detection, eigenvalue detection, matched filtering and cyclostationary feature detection [24]. Energy detection and eigenvalue detection are non-parametric methods. Energy-based detection has been widely used for decades, mainly due to its simplicity: a signal is detected if the measured energy is greater than the threshold corresponding to a specified false alarm rate. The eigenvalue detection method estimates the ratio of the maximum and minimum eigenvalues and compares it with a threshold to determine whether any signal is present. Cyclostationary feature detection and matched filtering are parametric methods; they require perfect knowledge of the transmitted signal and can work better at lower signal-to-noise ratio (SNR) than energy-based detection [25]. Cyclostationary spectrum sensing exploits the periodicity introduced in the transmitted signal. Matched filtering detects a signal by correlating a known template (extracted from the transmit signal) with the received signal. Both methods are targeted at specific (or known) signals and incur a high computational cost.
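As a point of reference for the comparison above, the energy detection baseline can be sketched as follows (a minimal illustration, not the method used in this paper; the threshold for a target false-alarm rate is estimated here by Monte Carlo simulation under the noise-only hypothesis):

```python
import numpy as np

def energy_threshold(n_samples, noise_var, pfa, n_trials=20000, seed=1):
    """Estimate the detection threshold for a target false-alarm rate by
    Monte Carlo simulation of the noise-only (H0) test statistic."""
    rng = np.random.default_rng(seed)
    noise = (rng.standard_normal((n_trials, n_samples)) +
             1j * rng.standard_normal((n_trials, n_samples))) * np.sqrt(noise_var / 2)
    stats = np.mean(np.abs(noise) ** 2, axis=1)   # average energy per trial
    return np.quantile(stats, 1 - pfa)

def energy_detect(x, threshold):
    """Declare a signal present if the average received energy exceeds the threshold."""
    return np.mean(np.abs(x) ** 2) > threshold

# Example: threshold for 64 samples, unit noise power, 5% false-alarm rate
thr = energy_threshold(n_samples=64, noise_var=1.0, pfa=0.05)
```

The simplicity of this test is why energy detection has remained popular; its weakness, as noted above, is its sensitivity to the accuracy of the noise variance estimate.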
For the drone signal detection within the two-stage detection and classification, we choose the wideband GoF-based blind spectrum sensing algorithm [26]. GoF sensing can provide better performance than conventional energy detection using fewer samples of the received signal, at low SNR and in the presence of non-Gaussian noise [27]. The wideband GoF sensing uses the DFT to divide the frequency band into small frequency bins and performs narrowband GoF sensing on each bin. In this paper, we have used the Anderson-Darling test statistic for the GoF sensing [26].

B. Signal classification
DL methods have shown SoA performance in the classification of wireless signals and have outperformed conventional classification methods. Some remarkable works have been published [28,29] in the past few years on the classification of modulated signals and device fingerprinting using merely the raw received signals. In recent years, CNN frameworks have been widely investigated for wireless signal recognition and classification problems [28,30]. Among the different variants of CNNs, residual network-based CNNs [31] have shown great performance and outperformed other classifiers of equivalent network depth. In this paper, we adapt the DRNN proposed in [28] for drone and WiFi signal classification.

C. Combined Detection and Classification
DNN-based visual object detection and classification techniques provide great tools, such as YOLO [32], for combined RF signal detection, frequency localization and classification. Signal detection and classification from a spectrogram image is analogous to visual object detection and classification: a spectrogram image provides the time and frequency information of a spectrum instance, which can be utilized to perform the detection and classification. Since we are interested in wideband signals, the localization and bounding box also enable determining important features of the detected signal, such as center frequency, bandwidth, dwell time and hop interval. Such information can be used within a cognitive radio to perform dynamic spectrum access and avoid collisions in a spectrum sharing environment. YOLO was first used in [33] to perform signal detection and frequency localization. In [34], WiFi and LTE signal detection, feature extraction and classification were performed using a pretrained YOLO framework. In this paper, we develop a YOLO framework from scratch to perform combined drone RF signal detection and classification on the spectrogram image.
IV. TECHNICAL APPROACH

A. Two-stage detection and classification

1) Signal detection: The signal detection is performed using the Anderson-Darling (AD) GoF test [26]. The complex time-domain received signal is converted into the frequency domain using an N-point DFT operation on K consecutive segments. This results in a sequence $X_k$ of length K for every frequency bin. We perform a hypothesis test using an AD tester for each frequency bin to decide whether only noise or a signal is present in the bin. We assume there is only noise present in the frequency bin if the normalized power spectral coefficient $\frac{2|X_k|^2}{N\sigma^2}$ follows a $\chi^2$ distribution. Here, N is the DFT length and $\sigma^2$ is the noise power. We estimated the noise power by exploiting the histogram of the power spectrum [35].
The AD test statistic $A_n^2$ is calculated for each frequency bin as:

$$A_n^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F_0(x_i) + \ln\left(1 - F_0(x_{n+1-i})\right)\right]$$

Here, $F_0$ represents the Cumulative Distribution Function (CDF) of a chi-square distribution with 2 degrees of freedom, $x_1 \le x_2 \le \ldots \le x_n$ are the samples under test, and $n$ is the total number of samples.
If $A_n^2 > \lambda$, we assume a signal is present in the frequency bin; otherwise, we assume the bin contains only noise. Here, $\lambda$ is the detection threshold. A detailed explanation of the AD GoF sensing, the derivation of $A_n^2$ and the procedure for performing the hypothesis test on complex received signals from a Software Defined Radio (SDR) are provided in [26,27]. The value of $\lambda$ is determined considering a 5% False Alarm Rate (FAR). A suitable value was calculated through AD GoF tests on the drone dataset: $\lambda = 3.89$ provided an approximate 5% FAR.
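The per-bin AD GoF test described above can be sketched as follows (a minimal NumPy illustration under the stated assumptions; for a $\chi^2$ distribution with 2 degrees of freedom the CDF has the closed form $F_0(x) = 1 - e^{-x/2}$, and the noise power is assumed known here rather than estimated from the power spectrum histogram):

```python
import numpy as np

def ad_statistic(z):
    """Anderson-Darling statistic of samples z against the chi-square
    distribution with 2 degrees of freedom, whose CDF is F0(x) = 1 - exp(-x/2)."""
    x = np.sort(z)
    n = len(x)
    F = 1.0 - np.exp(-x / 2.0)
    F = np.clip(F, 1e-12, 1.0 - 1e-12)          # guard the logarithms
    i = np.arange(1, n + 1)
    return -n - np.mean((2 * i - 1) * (np.log(F) + np.log(1.0 - F[::-1])))

def gof_sense(rx, n_fft, noise_var, lam=3.89):
    """Per-bin AD GoF sensing: returns a boolean occupancy mask over bins."""
    m = len(rx) // n_fft
    X = np.fft.fft(rx[:m * n_fft].reshape(m, n_fft), axis=1)
    z = 2.0 * np.abs(X) ** 2 / (n_fft * noise_var)  # normalized PSD coefficients
    return np.array([ad_statistic(z[:, b]) > lam for b in range(n_fft)])
```

Under noise only, each normalized coefficient is exponential with mean 2 (i.e. $\chi^2_2$), so the AD statistic stays small; a signal in a bin distorts the empirical CDF and drives the statistic well above the threshold.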
2) Signal classification: The classification is performed using an adaptation of the DRNN framework proposed in [28]. The architecture is depicted in Table I and the building blocks are shown in Fig. 2. The DRNN framework consists of N residual stack units, two fully connected (FC) layers and a softmax layer. All convolution operations use 32 filters with a kernel size of 3×3, apart from the first layer of the residual stack, where the kernel size is 1×1. For max pooling, a kernel size of 2×2 is used with a stride of 1. Each FC layer employs a scaled exponential linear unit (SeLU) activation and mean response scaled initialization [36]. To prevent overfitting, we applied 50% dropout after each FC layer. A softmax activation at the final layer gives the prediction probability. We did not perform batch normalization, since we did not observe any additional improvement in classification performance with it.
If any signal is detected by the spectrum sensing algorithm, the complete spectrum (i.e. the time-domain RX signal) is passed to the classification stage. The time-domain signal is converted to a spectrogram of size 256 × 256 and the classification is performed on it. Since we are interested in comparing the classification performance with the YOLO framework, we kept the input size the same for both algorithms.

B. Combined detection and classification with YOLO
We implement one of the variants of the YOLO architecture to perform the simultaneous detection and classification of drone and WiFi communication signals. One of the biggest strengths of the YOLO framework is that it can simultaneously detect a signal, determine spectral features like frequency, bandwidth and dwell time, and predict the class of the detected signal. The raw spectral power values of the RF signal in the time and frequency domain are used as the input. Since recognizing such time-frequency spectral events is relatively simpler than visual object recognition [33], a smaller network may be sufficient for the YOLO detection and classification task. In our experiments, we adapted the YOLO-lite [37] architecture, which is a smaller and faster network that can be deployed on a non-GPU computer. Our adaptation of the YOLO-lite architecture is shown in Table II. We used leaky-ReLU activation after each convolution operation (C1 - C6) and linear activation on the C7 layer.
The max-pooling operation is performed after convolutions C1 - C5. Finally, a fully connected layer is employed with sigmoid activation. A spectrogram of dimension 256 × 256 is used as the input. The network produces an output grid containing the detection probability, bounding box coordinates and class probabilities, of dimension:

$$S \times S \times (B \cdot 5 + P)$$

Here, S denotes the size of the grid. Each grid cell contains B bounding boxes, each with a detection confidence score c, 2D coordinates x, y, width w and height h of the object, together with the class probabilities P. We used a grid size of 16, 2 bounding boxes and 10 different classes (Table III) in our tests. The original YOLO-lite architecture used an output grid size of 8; however, we found during our tests that annotating the DSSS spectrograms (e.g. Tello, Parrot) requires a larger grid. With the above parameters, the output dimension becomes 16 × 16 × 20. We used the Adam optimizer [38] to optimize the training loss, which minimizes the sum of mean squared error losses between the ground truth and the network prediction. The complete training loss function provided in [32] is used for the training optimization.

V. DATASET DEVELOPMENT AND EXPERIMENT STRATEGIES

1) Data collection: The devices (Table III) used to create the dataset were placed seven meters apart from the receiver. Since a UAV controller generally uses a pseudorandom generator to generate the FHSS sequences of the RC signal, we included all possible hop sequences from each controller in our database. The controllers were turned off and on several times during data collection to inspect whether the hop position changes and to include those variations in our database as well.
2) Dataset development: To test the classification performance at lower SNRs, we introduced AWGN to the signal in the simulation environment. The SNR is generally calculated in the time domain by measuring the transmission power of the signal. Since the signal bandwidth differs between drones, it is difficult to calculate the SNR in the time domain. Therefore, we calculated the SNR in the frequency domain, as shown in Fig. 5a. Since the bandwidth and transmission power differ between drones, the calculated SNR also varied slightly for a given noise level, as presented in Fig. 5b. To keep the performance analysis of the different frameworks simple, we considered the average SNR for the introduced AWGN values, as shown in Fig. 5c.
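A frequency-domain SNR estimate of the kind described above can be sketched as follows (a minimal illustration; `signal_bins` marking the occupied spectral bins is an assumed input, and the exact procedure in the paper may differ):

```python
import numpy as np

def freq_domain_snr(psd, signal_bins):
    """Estimate SNR (dB) from a power spectral density: noise power is
    averaged over the unoccupied bins, signal power over the occupied bins
    (with the noise contribution removed)."""
    mask = np.zeros(len(psd), dtype=bool)
    mask[np.asarray(signal_bins)] = True
    p_noise = psd[~mask].mean()
    p_signal = psd[mask].mean() - p_noise       # subtract the noise floor
    return 10.0 * np.log10(max(p_signal, 1e-12) / p_noise)
```

Because the averaging is restricted to the bins a given drone actually occupies, this estimate is independent of the signal bandwidth, which is what makes it preferable to a time-domain SNR here.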
In [23], we evaluated the classification performance in Rician and Rayleigh fading simulation environments, where the classifier was trained with the AWGN-corrupted dataset. We did not observe any significant deviation in the classification performance due to the channel variation. Therefore, in this paper, we only evaluate the classification performance under AWGN conditions.
3) Implementation details: The GoF sensing was implemented in MATLAB. The DRNN framework was implemented using TensorFlow-Keras, and the YOLO-lite framework was implemented using TFLearn in Python, both running on top of TensorFlow [39]. The simulations and the neural network

VI. PERFORMANCE ANALYSIS

A. Single signal detection and classification
Signal detection and spectrum localization with the YOLO-lite framework are presented in Fig. 6. As Fig. 6 shows, the signals were detected, localized in the spectral domain and classified accurately by the framework. The average probability of detection (PD) of YOLO under different SNR conditions is presented in Fig. 7a. A detection from the YOLO-lite prediction is considered true if it satisfies the following two conditions: (i) the confidence score of any bounding box is greater than the specified threshold (i.e. C > 0.4), and (ii) the Intersection over Union (IoU) is greater than or equal to 0.50 (i.e. IoU ≥ 0.50). The PD for the GoF test is calculated by comparing the true frequency bins with the frequency bins predicted by the AD test. As Fig. 7a displays, the PD from YOLO is comparable with that of the GoF test. The detection probability increases for both frameworks as the SNR increases. The YOLO PD saturates around 96%, which it reaches at around -3 dB SNR. This saturation happens because YOLO often could not detect all closely spaced hops of the Tello, Parrot and WiFi signals; one example of such a spectrum is shown in Fig. 6c. The GoF test also provides around 96% PD at around -3 dB SNR; however, it increases further and reaches 99.9% at 3 dB SNR.
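The two detection-validity conditions above can be expressed directly in code (a minimal sketch using axis-aligned boxes in `(x_min, y_min, x_max, y_max)` form; the coordinate convention is an assumption for illustration):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def is_true_detection(confidence, pred_box, truth_box,
                      conf_thresh=0.4, iou_thresh=0.5):
    """A prediction counts as a true detection only if both conditions hold:
    confidence above the threshold AND sufficient overlap with the ground truth."""
    return confidence > conf_thresh and iou(pred_box, truth_box) >= iou_thresh
```

In the spectrogram setting, the two box axes correspond to time and frequency, so the IoU condition jointly checks the temporal and spectral localization of the detected signal.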
To evaluate the classification performance, we used the F1-score, which is the harmonic mean of precision and recall. Since the F1-score takes both precision and recall into account, it allows us to compare the performance of different classifiers using a single metric. The classification performance of the YOLO and DRNN frameworks is plotted in Fig. 7b. In order to compare the classification performance of YOLO with the DRNN, we performed the classification with the DRNN framework independently of the GoF detection. At lower SNR, DRNN provides a better F1-score than YOLO. This is expected, since YOLO-lite is a much shallower framework than the DRNN. The F1-scores increase with SNR for both frameworks, reaching approximately 97% at -3 dB SNR. The F1-score of YOLO-lite saturates around 97% at higher SNRs, whereas the F1-score of the DRNN framework increases to 99% at 3 dB SNR. The classification performance of our YOLO-lite and DRNN frameworks is compared with other existing frameworks, namely the DNN framework proposed in [18] and the Tiny-YOLOv2-VOC framework [37], which we also recreated from scratch for this classification test. The F1-scores under AWGN conditions are plotted in Fig. 8. The DRNN framework provided the best classification performance. Tiny-YOLOv2 provided slightly better classification performance than the YOLO-lite framework, which is expected since it is a slightly deeper network. The DNN model provided a lower F1-score than the other frameworks from -10 dB SNR onwards; its F1-score saturated at around 89%, whereas YOLO-lite, Tiny-YOLOv2 and DRNN provided around 97%, 98% and 99% F1-scores respectively at higher SNRs.
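For reference, the F1-score used above follows directly from the true positive, false positive and false negative counts:

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, computed from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

Because the harmonic mean is dominated by the smaller of the two terms, a classifier cannot inflate its F1-score by trading recall for precision or vice versa, which is why a single F1 curve suffices for the comparison here.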

B. Simultaneous multi-signal detection and classification
In order to test the detection and classification performance in a simultaneous multi-signal scenario, the signals were added in the simulation environment as shown in Fig. 9. To ensure that the specified number of signals was present in the spectrogram, we performed spectrum sensing before adding the RX signals. Each signal burst was a vector of 256 × 256 = 65,536 complex samples in the time domain. AWGN was introduced after adding the signals, and the result was converted to the frequency domain. The detection and classification from YOLO-lite is shown in Fig. 10; in order to observe the prediction accuracy, the ground truth and prediction are plotted side by side and the true and predicted classes are annotated in white. The multi-signal detection from the GoF sensing is shown in Fig. 11: the spectrogram image is plotted in Fig. 11 (top) and the GoF detection result in Fig. 11 (bottom). As can be seen from the figure, all seven signals are detected correctly.
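The composition procedure described above (summing equal-length time-domain bursts and then adding AWGN for a target average SNR) can be sketched as:

```python
import numpy as np

def add_awgn(x, snr_db, rng):
    """Add complex AWGN to x, scaled for a target average SNR in dB."""
    p_signal = np.mean(np.abs(x) ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    noise = rng.standard_normal(x.size) + 1j * rng.standard_normal(x.size)
    return x + np.sqrt(p_noise / 2.0) * noise

def compose_multi_signal(bursts, snr_db, seed=0):
    """Sum equal-length time-domain signal bursts, then add AWGN.
    Noise is added after mixing, as in the experiment described above."""
    rng = np.random.default_rng(seed)
    mix = np.sum(np.stack(bursts), axis=0)
    return add_awgn(mix, snr_db, rng)
```

Adding the noise after the mixing step means all bursts share one noise floor, matching the averaged-SNR convention adopted in the dataset development.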
The detection threshold for YOLO-lite was set to 0.4. This threshold was chosen such that the maximum FAR at the lowest SNR remains below 5%, and it was kept the same for single and simultaneous multi-signal detection.
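Applying this confidence threshold when decoding the 16 × 16 × 20 YOLO-lite output grid (Section IV-B) can be sketched as follows (a minimal illustration; the ordering of box parameters before class probabilities within each grid cell, and the cell-relative coordinate normalization, are assumptions made here for clarity):

```python
import numpy as np

S, B, C = 16, 2, 10   # grid size, boxes per cell, number of classes

def decode_predictions(grid, conf_thresh=0.4):
    """Decode an S x S x (B*5 + C) output grid into a list of
    (class_id, confidence, cx, cy, w, h) detections above the threshold.
    cx, cy are converted from cell-relative to image-relative coordinates."""
    detections = []
    for row in range(S):
        for col in range(S):
            cell = grid[row, col]
            class_probs = cell[B * 5:]            # shared class scores per cell
            for b in range(B):
                c, x, y, w, h = cell[b * 5:(b + 1) * 5]
                if c > conf_thresh:
                    cx = (col + x) / S            # image-relative centre
                    cy = (row + y) / S
                    detections.append((int(np.argmax(class_probs)), float(c),
                                       float(cx), float(cy), float(w), float(h)))
    return detections
```

In the spectrogram setting, the decoded box directly yields the signal features discussed earlier: the box width and centre along the frequency axis give bandwidth and centre frequency, and along the time axis give dwell time.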
The PD of the GoF sensing and YOLO-lite over different SNRs is plotted in Fig. 12. As can be seen from the figure, YOLO-lite showed better detection performance than the GoF sensing. For GoF sensing, the detection performance over different SNRs did not vary with the number of sources; on the contrary, an increase in detection performance was observed with YOLO-lite. As Fig. 5b shows, the SNRs of the different signals are different. With wideband GoF sensing, multiple signals with different SNRs present in the spectrum impact the noise floor estimation, which may result in an overestimation of the threshold. This issue was not observed with YOLO-lite.
The classification performance of DRNN and YOLO over different SNRs is plotted in Fig. 13. Similar to the PD, the F1-score of the YOLO classification increases with the number of sources, and we observe the same phenomenon for the F1-score of the DRNN classification. After a detailed investigation, we found that the classification accuracy of any individual signal remains the same for the single-signal and simultaneous multi-signal scenarios. At lower SNRs, the classification accuracy for different signals is generally different, for two reasons: (i) the actual SNRs of the signals are not the same (Fig. 5b), and (ii) classifiers can classify some signals better than others at lower SNRs. Fig. 5b shows that the actual SNR differs between classes; one of the signals has a very low SNR compared to the others, because we added constant AWGN noise and considered the average SNR for ease of analysis. When we calculate the average F1-score for the single-signal scenario, each class contributes equally with its independent F1-score. For the multi-signal scenario, as the number of signals increases, the relatively low F1-score of one particular class can no longer pull the average down as much as in the single-signal scenario. Therefore, as the number of signals increases, the average score also increases. Again, similar to the single-signal scenario, DRNN showed better classification performance than YOLO.

• Signal detection: Since YOLO is a supervised detection framework, its performance may deviate while detecting unknown signals. This issue can be resolved with transfer learning using a small labelled dataset of the new signal. On the contrary, the GoF sensing is a blind spectrum sensing method and is able to detect any signal present in the spectrum.
• Signal classification: The DRNN framework provided better classification performance than the YOLO framework. This is expected from a deep residual network, since it utilizes skip connections in its architecture and is deeper than the YOLO framework.

• Signal localization and feature extraction: Signal localization and feature extraction are the strongest features of the YOLO framework. The localization enables detecting multiple signals simultaneously and extracting useful features from the received signal: the center frequency, bandwidth, hop rate and dwell time of the detected signal. This information is required by an RF jammer to perform soft neutralization of a drone. With the two-stage detection and classification framework, it is not possible to extract all of these features: the DRNN framework does not provide the spectral position of the signal under classification, and while the GoF sensing provides the frequency and bandwidth of the signal, in the presence of multiple signals it is difficult to associate the signal features with the classification labels.

• Complexity analysis: A complexity analysis was performed to give an overview of the computational complexity and the inference time required for each framework. The total number of network parameters (trainable + non-trainable), the mean inference time on the total test samples and the mean prediction time per sample are presented for each framework.

Since the classification is performed in a supervised manner, the classifier may not be able to classify or provide a label for signals transmitted by newer drones. There are two possible outcomes in such a case: (i) the classifier will label it as an existing drone signal if the TX signal has a similar frequency fingerprint, or (ii) the classifier will be confused and provide a very low classification score for all classes.
Similarly, some specific UAV controllers may use a completely different hopping sequence than another controller of the same model. The YOLO detection and classification performance for such cases has not yet been tested. We will investigate and address these issues in our future work.

VII. CONCLUSION
In this paper, we performed drone signal detection, spectrum localization and classification using both a two-stage and a combined detection and classification method. In the two-stage technique, we used GoF sensing for the detection and the DRNN framework for the classification. The YOLO-lite framework was recreated from scratch to perform the combined drone RF signal detection, spectrum localization and classification. A detailed performance comparison between the two techniques was presented using a novel drone dataset prepared for this study. We obtained good detection and classification performance with both techniques. Since the classification is performed in a supervised manner, the performance may deviate in the presence of unknown or newer drone signals, as discussed in detail in the limitations. In future work, we are going to investigate unsupervised scenarios, since we are interested in developing a robust framework that can detect and classify all drone signals irrespective of the dataset it is trained with.