Parametric Convolutional Neural Network for Radar-based Human Activity Classification Using Raw ADC Data

Abstract—Radar sensors offer a promising and effective sensing modality for human activity classification. Human activity classification enables several smart-home applications, such as energy saving, gesture-controlled human-machine interfaces and elderly fall-motion recognition. Present radar-based activity recognition systems exploit the micro-Doppler signature by generating Doppler spectrograms or videos of range-Doppler images (RDIs), followed by a deep neural network or classical machine learning for classification. Although deep convolutional neural networks (DCNNs) have been shown to implicitly learn features from raw sensor data in other fields, such as vision and speech, radar solutions have so far required preprocessing followed by feature-image generation, such as a video of RDIs or a Doppler spectrogram, to obtain a scalable and robust classification or regression application. In this paper, we propose a parametric convolutional neural network that mimics the radar preprocessing across fast-time and slow-time radar data through 2D sinc filter or 2D wavelet filter kernels to extract features for the classification of various human activities. We demonstrate that the proposed solution achieves improved results compared to equivalent state-of-the-art DCNN solutions that rely on a Doppler spectrogram or a video of RDIs as feature images.


I. INTRODUCTION
People sensing and activity classification have increasing application potential in various areas, such as physical security, defense and surveillance. In the industrial and consumer space, human activity recognition finds applications in smart homes, human-machine interfaces and elderly fall-motion monitoring systems. Knowledge of the activity performed in a room can enable smart control of energy consumption, such as HVAC and lighting [1]-[3]. Furthermore, knowledge of the performed human activity facilitates a truly ubiquitous smart-home solution by discerning the user's intent.
Most human activity recognition systems are based on cameras and computer vision approaches. These systems have the advantage that they are quite easy to implement and benefit from several years of research; as a result, their accuracy is quite high. However, camera systems suffer from lack of privacy and are sensitive to illumination conditions, thus they are not a favorable choice for smart-home solutions. On the other hand, radar sensors have been shown to be an effective sensing modality for human activity classification [4]-[13]. Radar sensors offer privacy-preserving and illumination-invariance properties and can be aesthetically concealed in the operating environment. Recent innovations in semiconductor technology have facilitated integration and antenna-in-package solutions, bringing radar sensors into a small form factor [14]. Human activity recognition also enables sensing and recognition of elderly fall-motion, which is a leading cause of death in the elderly population and, if medical assistance is not provided immediately, can in some cases lead to a major restriction of the individual's overall mobility [15]-[19]. Thus, automatic sensing of fall-motion in particular among other activities has major social implications.
Radar sensors can sense and recognize human activities by utilizing micro-Doppler signatures [20] that are generated by the non-rigid-body motions of moving targets. Unique micro-Doppler signatures have been shown to be key features for object classification, such as distinguishing among humans, animals and vehicles. The micro-Doppler signatures have been captured using hand-crafted features, a radar Doppler spectrogram or a video of range-Doppler images (RDIs), and have been shown to be capable of recognizing human activities such as walking, running, crawling, standing and sitting, as well as recognizing human gaits. In [4], the authors propose six hand-crafted features, namely the torso Doppler frequency, the total bandwidth (BW), the offset of the total Doppler, the BW without micro-Dopplers, the normalized standard deviation (STD) of the signal strength, and the period of the limb motion, to classify seven different human activities: running, walking, walking while holding a stick, crawling, boxing while moving forward, boxing while standing in place, and sitting still. In [13], the authors use various deep convolutional neural network architectures to learn from the range spectrogram, the Doppler spectrogram and the video of range-Doppler images for the classification of different activities. In [21], the authors propose a continuous activity monitoring solution by augmenting the tracker's state with the classifier's activity classes. In [18], the authors propose a novel deep auto-encoder based solution to sense elderly fall-motion from the Doppler spectrogram. In [19], the authors propose a deep deformable convolutional network to learn subject-aspect-invariant features from the radar Doppler spectrogram.
However, almost all deep learning solutions for human activity classification proposed in the literature are based on learning features from the radar Doppler spectrogram or a video of radar range-Doppler images (RDIs); learning directly from raw radar ADC data has generally not been feasible for a scalable solution so far. In contrast, deep convolutional neural networks (DCNNs) applied in other domains, such as computer vision and speech processing, have been shown to be capable of implicitly learning commendable features from raw input data without the need for explicit features or preprocessing. Inspired by the 1D SincNet in speech processing, in this paper we propose two DCNN architectures based on 2D sinc filters and 2D wavelet filters that learn directly from raw radar ADC data to classify different human activities using Infineon's 60-GHz frequency-modulated continuous-wave (FMCW) radar chipset BGT60TR13C. We demonstrate that the proposed trained DCNN architectures achieve classification accuracy equal to or better than equivalent DCNNs with a Doppler spectrogram or a video of RDIs as input. Thus, for the first time, the true potential of DCNNs in radar processing and classification tasks is demonstrated without relying on any preprocessing, exploiting the raw radar data as has long been demonstrated for data from other domains.
The rest of the paper is organized as follows: Section II presents the radar system design, Section III reviews the conventional processing pipeline and summarizes our contributions, Section IV introduces the proposed 2D sinc and 2D wavelet filters, Section V describes the network architectures and training, Section VI presents the experimental evaluation, and Section VII concludes the paper.

II. RADAR SYSTEM DESIGN
The work in this paper is based on Infineon's BGT60TR13C FMCW radar chipset. Its operating frequency ranges from 57 GHz to 64 GHz with an adjustable chirp duration. Its block diagram is shown in Fig. 1. The transmit path consists of a voltage-controlled oscillator (VCO) that is regulated by a phase-locked loop (PLL) to a reference frequency of f_ref = 80 MHz. Highly linear frequency chirps between 57 GHz and 64 GHz are produced by adjusting the divider value and an additional tuning voltage ranging from 1 V to 4.5 V. In the receive path, the echo returning from the target object is down-converted with a replica of the transmitted frequency chirp. The baseband frequency spectrum can then be sampled by the 12-bit analog-to-digital converter (ADC). Moreover, the receive path contains an intermediate-frequency buffer amplifier and an analog IF filter that can be adjusted according to the received frequency range. The radar chip is packaged in an embedded wafer-level ball grid array package including four integrated patch antennas realized by a metal redistribution layer. Three of them are receive antennas with an antenna gain of 10 dBi, and one is the transmit antenna with a gain of 6 dBi. Consequently, the radar sensor contains three identical receive paths and one transmit path. The radio frequency (RF) signal is distributed to the receive paths by an active RF distribution network.
The transmitted up-chirp from the FMCW radar's ramp generator is reflected by a moving object and is received at the receiver after a round-trip delay determined by the target's range from the radar and the target's velocity. The received signal is mixed at the receiver with the transmitted signal and the resultant signal is low-pass filtered, thus performing the matched-filtering operation. The phase of the resultant intermediate-frequency (IF) signal due to a single point target can be expressed as
$$\phi(t) = 2\pi\left(f_{min}\,\tau(t) + \frac{B}{T_c}\,\tau(t)\,t - \frac{B}{2T_c}\,\tau(t)^2\right),$$
where $f_{min}$ is the ramp start frequency, $B$ and $T_c$ denote the chirp bandwidth and the chirp time respectively, and
$$\tau(t) = \frac{2\,(x + v_i t)}{c}$$
is the round-trip propagation delay between the transmitted and received signal after reflection from the point target with range $x$ and radial velocity components $v_i$. The Doppler frequency relates to the radial velocity $v_i$ as $\nu_i = 2v_i/\lambda = 2v_i f_{min}/c$; the macro-Doppler is represented by the centroid of the Doppler components, $\nu_c = \frac{1}{I}\sum_{i=1}^{I}\nu_i$, while the micro-Doppler components are represented as $\{\nu_i - \nu_c\}_{i=1}^{I}$. For our demonstrator, we configured the chip to transmit a single chirp per physical frame, where the frames are spaced $T_{PRT} = 1$ ms apart.
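As a quick numerical check of the model above, the deramped IF signal for a single point target can be simulated and its dominant beat frequency compared against $f_b \approx 2Bx/(cT_c)$. This is an illustrative sketch: the ADC sampling rate of 2 MHz and the target parameters are assumptions, not values stated in the paper.

```python
import numpy as np

# Numerical sketch of the deramped IF signal for a single point target.
# Chirp parameters follow the text (f_min = 57 GHz, B = 1 GHz, T_c = 64 us);
# the ADC rate f_s = 2 MHz and the target state are illustrative assumptions.
c = 3e8
f_min, B, T_c = 57e9, 1e9, 64e-6
f_s = 2e6
n_samp = round(T_c * f_s)                     # 128 fast-time samples
t = np.arange(n_samp) / f_s

x, v = 3.0, 0.5                               # target range [m], radial velocity [m/s]
tau = 2 * (x + v * t) / c                     # round-trip delay tau(t)

# IF phase after mixing and low-pass filtering (the small quadratic
# residual term in tau is neglected here):
phi = 2 * np.pi * (f_min * tau + (B / T_c) * tau * t)
s_if = np.cos(phi)

# The dominant beat frequency should be close to f_b = 2*B*x/(c*T_c).
spec = np.abs(np.fft.rfft(s_if * np.hanning(n_samp)))
f_axis = np.fft.rfftfreq(n_samp, 1 / f_s)
f_beat_est = f_axis[np.argmax(spec)]
f_beat_theory = 2 * B * x / (c * T_c)         # ~312.5 kHz for x = 3 m
```

The FFT peak lands within one frequency bin of the predicted beat frequency, confirming that range maps linearly onto the IF beat frequency.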
Figure 2 presents the frame configuration for seamless activity detection and classification. The rationale behind choosing this chirp configuration is mainly scalability: the user can choose a variable number of frames to build the slow-time data, resulting in a configurable Doppler resolution. The pulse repetition time, therefore, is 1 ms, resulting in an unambiguous maximum velocity of $v_{max} = 1.25\,\mathrm{m\,s^{-1}}$, which is sufficient for most indoor activity sensing applications. Furthermore, the chosen configuration enables building a continuous Doppler spectrogram, which can be fed into a classifier through a sliding window. The bandwidth is set to $B = 1$ GHz and the up-chirp time is set to $T_c = 64\,\mu$s, accounting for a range resolution of 15 cm. The maximum detectable unambiguous range is 9.6 m. The set system parameters are provided in Tab. I.
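The quoted system parameters can be reproduced from the chirp configuration. A short sanity check follows; the 60 GHz carrier used for the wavelength and the assumed 128 samples per chirp are illustrative choices, not values stated above.

```python
c = 3e8
f_c = 60e9          # approximate carrier in the 57-64 GHz band (assumption)
B = 1e9             # chirp bandwidth
T_prt = 1e-3        # pulse repetition time, one chirp per frame
N_s = 128           # ADC samples per chirp (assumption for illustration)

wavelength = c / f_c
v_max = wavelength / (4 * T_prt)    # unambiguous velocity  -> 1.25 m/s
dR = c / (2 * B)                    # range resolution      -> 0.15 m
R_max = (N_s // 2) * dR             # max unambiguous range -> 9.6 m
```

All three values match the figures quoted in the text.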

III. CONVENTIONAL PIPELINE & CONTRIBUTIONS
The conventional signal preprocessing involves 1D moving target indication (MTI) filtering to remove the response from static targets as well as the Tx-Rx leakage, which affects the first few range bins. Reflections from stationary objects such as chairs, tables and walls can overwhelm the reflections from moving targets, limiting their visibility in the RDI or Doppler spectrogram. Thus, an MTI filter is used to suppress the contribution of these stationary objects and of the leakage. Among the several available MTI filters, a simple 1D MTI filter subtracts the mean along fast-time to remove the Tx-Rx leakage that perturbs the first range bins, followed by mean subtraction along slow-time to remove the reflections due to static or zero-Doppler targets.
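A minimal sketch of the 1D MTI filtering described above, applied to one synthetic frame of shape (chirps, samples); the constant offset stands in for leakage and static clutter:

```python
import numpy as np

# One frame of shape (N_c chirps, N_s samples); the constant offset mimics
# Tx-Rx leakage and static clutter.
rng = np.random.default_rng(0)
frame = rng.standard_normal((64, 128)) + 5.0

frame = frame - frame.mean(axis=1, keepdims=True)  # fast-time mean removal (leakage)
frame = frame - frame.mean(axis=0, keepdims=True)  # slow-time mean removal (static targets)
```

After both steps, the zero-Doppler (DC) component along each axis is removed.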
The range information of the target is extracted by performing the first FFT after applying a 1D window along fast-time, i.e., the intra-chirp time. The Doppler information of the target is extracted by monitoring the change of the target peak along slow-time, i.e., the inter-chirp time. One common approach is to apply an FFT along the fast-time as well as the slow-time dimension. The outcome of this operation is a two-dimensional matrix representing the received power spectrum over range and velocity, also known as the range-Doppler image (RDI). The received and deramped IF data is stored in matrices of size $N_c \times N_s$, where $N_c$ is the number of chirps considered in a frame and $N_s$ is the number of samples per chirp.
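The fast-time/slow-time FFT processing can be sketched as follows for one synthetic frame (window choices and sizes are illustrative assumptions):

```python
import numpy as np

# One frame of deramped IF data: N_c chirps x N_s samples. A windowed FFT
# along fast-time yields range; a second FFT along slow-time yields Doppler.
N_c, N_s = 64, 128
rng = np.random.default_rng(1)
frame = rng.standard_normal((N_c, N_s))

win_ft = np.hanning(N_s)                                  # fast-time window
win_st = np.hanning(N_c)                                  # slow-time window
range_profiles = np.fft.rfft(frame * win_ft, axis=1)      # range FFT (real input)
rdi = np.fft.fftshift(np.fft.fft(range_profiles * win_st[:, None], axis=0), axes=0)
rdi_power = np.abs(rdi) ** 2          # power spectrum over (velocity, range)
```

The `fftshift` along slow-time centers zero Doppler, so positive and negative velocities appear on either side of the middle row.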
In a conventional processing pipeline, the above preprocessing is followed by feature-image generation, such as a Doppler spectrogram or a video of RDIs, which is then input to a deep convolutional neural network (DCNN) or a long short-term memory (LSTM) network for classification.

A. RDI video
The video of range-Doppler images computed at frame time $k$ can be obtained by applying a 2D STFT to the mean-removed ADC data and is expressed as
$$X(l,p,k) = \sum_{m=0}^{N_{st}-1}\sum_{n=0}^{N_{ft}-1} w(m,n)\,s(m,n,k)\,e^{-j2\pi\left(\frac{pm}{N_{st}} + \frac{ln}{N_{ft}}\right)},$$
where $w(m,n)$ is the 2D weighting function along fast-time and slow-time, and $s(m,n,k)$ is the mean-removed ADC data of the $k$-th frame. The indices $n$ and $m$ sweep along the fast-time and slow-time axes respectively, while $l$ and $p$ sweep along the range and Doppler axes respectively. $N_{st}$ and $N_{ft}$ are the FFT sizes along slow-time and fast-time respectively. Figure 3 presents the video of RDIs for the walking and working activities at time frames 0, 15 and 30.

B. Doppler Spectrogram

The Doppler spectrum at frame $k$ can be obtained by accumulating the RDI magnitude over the range axis,
$$D(p,k) = \sum_{l=0}^{N_{ft}-1} \left|X(l,p,k)\right|,$$
where $l$ sweeps along the range axis, $N_{ft}$ is the number of range bins and $p$, $l$, $k$ are the Doppler, range and frame indices respectively. The Doppler spectrum at frame $k$ contains both the macro-Doppler components and the micro-Doppler components due to hand and leg movements while performing an activity. The Doppler spectra stacked across consecutive frames are referred to as the Doppler spectrogram, which captures the instantaneous Doppler spectral content and the variation of that content over time. Figure 4 presents the Doppler spectrograms of the different activities, namely empty room, walking, standing idle, arm movement, waving and working on a laptop. Figure 5(a) presents the conventional pipeline that involves explicit preprocessing and feature generation followed by a neural network, such as a DCNN or LSTM, for classification. The proposed architecture instead performs the preprocessing and feature generation implicitly within the neural network itself. Thus, the input to the neural network is directly the raw radar ADC data, as depicted in Fig. 5(b). The initial layer of the proposed DCNN learns 2D sinc filter kernels or 2D wavelet filter kernels, which are representative of the preprocessing and feature extraction. The capability of the DCNN to operate directly on the raw ADC data reduces the computational complexity dramatically and, in a practical implementation, eliminates the need for a digital signal processor (DSP) for preprocessing.
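As an illustration of the feature images used by the conventional pipeline, a Doppler spectrogram can be assembled from a video of RDIs by reducing each RDI over the range axis and stacking frames over time; the range-sum reduction used here is one common choice and an assumption of this sketch:

```python
import numpy as np

# A video of RDIs with K frames, N_st Doppler bins and N_ft range bins
# (random magnitudes as stand-in data). Each Doppler spectrum is obtained
# by summing the RDI magnitude over range; stacking frames over time
# yields the spectrogram.
K, N_st, N_ft = 30, 64, 64
rng = np.random.default_rng(2)
rdi_video = np.abs(rng.standard_normal((K, N_st, N_ft)))

spectrogram = rdi_video.sum(axis=2).T      # shape (Doppler bins, frames)
```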

IV. PROPOSED APPROACH
Different activities can be distinguished by analyzing their unique range-velocity profiles. Some activities, such as walking and standing idle, have very different profiles, while others differ only slightly. For example, working on a laptop and sitting idle on a chair differ only in the slight hand movements used to control the laptop. Thus, a higher resolution in specific frequency bands is required in order to accurately distinguish such actions. However, when applying a 2D STFT, the whole observable range-velocity space is discretized into equal bins. Besides applying an STFT, time-domain bandpass filters can be used to analyze the frequency composition of a signal. Time-domain bandpass filters offer the ability to adjust the cutoff frequencies according to the needs of the application and can therefore be learned within a neural network.

A. 2D Sinc Filters
The benefit of integrating preprocessing into the neural network itself by learning the hyperparameters of a set of time-domain filters was already shown in [22] for 1D audio signal processing. In that work, the first convolutional layer is constrained to 1D bandpass filters defined as the difference of two low-pass sinc filters with different cutoff frequencies. During training, the lower and upper cutoff frequencies are optimized for the needs of the application. Replacing classical preprocessing by this layer showed improved results in speaker recognition.
In this paper, the extension from 1D audio signals to 2D radar signals is proposed. The 1D sinc filter is defined as
$$g[k] = 2\,\frac{f_l+b}{f_s}\,\mathrm{sinc}\!\left(2\pi\,\frac{(f_l+b)\,k}{f_s}\right) - 2\,\frac{f_l}{f_s}\,\mathrm{sinc}\!\left(2\pi\,\frac{f_l\,k}{f_s}\right), \quad k = -\tfrac{K-1}{2},\ldots,\tfrac{K-1}{2},$$
where $K$ is the filter length, $f_s$ the sampling frequency of the signal, $f_l$ the lower cutoff frequency, $b$ the bandwidth and $k$ the filter parameter index. The hyperparameters of this filter are the lower cutoff frequency $f_l$ and the bandwidth $b$, which implicitly defines the upper cutoff frequency. By defining a lower cutoff frequency and bandwidth in the slow-time as well as in the fast-time direction, a 2D bandpass filter that is able to extract joint range and velocity features can be created. The 2D sinc filter is defined as
$$g[n,m] = w(n,m)\,g^{st}[n]\,g^{ft}[m], \quad n = -\tfrac{N-1}{2},\ldots,\tfrac{N-1}{2},\; m = -\tfrac{M-1}{2},\ldots,\tfrac{M-1}{2},$$
where $N$
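A minimal sketch of such a separable 2D sinc bandpass kernel (the cutoff frequencies, sampling rates and filter lengths are illustrative assumptions):

```python
import numpy as np

def sinc_bandpass_1d(K, f_l, b, f_s):
    """1D sinc bandpass as the difference of two low-pass sinc filters with
    cutoffs f_l + b and f_l (all frequencies in Hz, f_s the sampling rate).
    Note np.sinc(x) = sin(pi*x)/(pi*x)."""
    k = np.arange(K) - (K - 1) / 2
    f_h = f_l + b
    return (2 * f_h / f_s) * np.sinc(2 * f_h * k / f_s) \
         - (2 * f_l / f_s) * np.sinc(2 * f_l * k / f_s)

# Separable 2D bandpass kernel: outer product of a slow-time and a fast-time
# sinc bandpass, tapered by a 2D cosine (Hann) window, as in the text.
g_st = sinc_bandpass_1d(65, f_l=50.0, b=100.0, f_s=1000.0)   # slow-time part
g_ft = sinc_bandpass_1d(33, f_l=50e3, b=100e3, f_s=1e6)      # fast-time part
kernel = np.outer(np.hanning(65) * g_st, np.hanning(33) * g_ft)
```

In the trained layer, only the four scalars per filter ($f_l$ and $b$ in each dimension) would be learnable; the tap values are fully determined by them.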

B. 2D Morlet Wavelets
The range-velocity profile is defined not only by its constituent frequencies but also by the change of frequencies over time. When transforming a signal to the frequency domain, the time information is lost. This can be overcome by windowing the time-domain signal; however, a smaller window size yields a higher time resolution at the cost of a worse frequency resolution, and vice versa. Especially for time-varying signals, wavelets have several advantages over Fourier transforms as they provide both time and frequency resolution [23]. Because radar signals are highly time-varying, the use of a 2D wavelet transformation based on Morlet wavelets is proposed. The 2D Morlet wavelet is defined as
$$\psi[n,m] = \cos\!\left(2\pi f^{st}_c\,\frac{n}{f^{st}_s}\right)\cos\!\left(2\pi f^{ft}_c\,\frac{m}{f^{ft}_s}\right)\exp\!\left(-\frac{(n/f^{st}_s)^2}{2\sigma_{st}^2}\right)\exp\!\left(-\frac{(m/f^{ft}_s)^2}{2\sigma_{ft}^2}\right),$$
with $n = -\tfrac{N-1}{2},\ldots,\tfrac{N-1}{2}$ and $m = -\tfrac{M-1}{2},\ldots,\tfrac{M-1}{2}$, where $N$ and $M$ are the filter lengths, $\sigma_{st}$ and $\sigma_{ft}$ the standard deviations, $f^{st}_c$ and $f^{ft}_c$ the center frequencies, and $f^{st}_s$ and $f^{ft}_s$ the sampling frequencies in the slow-time and fast-time direction respectively. The hyperparameters that can be optimized by the neural network are the center frequency and the standard deviation of the wavelet. Similar to the previously introduced 2D sinc filters, the frequency area of interest can be adjusted through the center frequency. Additionally, the time-frequency resolution can be optimized by changing the standard deviation of the Gaussian part of the wavelet. Because the defined wavelet is the product of a cosine and a Gaussian window function, the frequency response also has the shape of a Gaussian. This means it has no clear cutoff frequencies, as can be seen in Fig. 7, where an exemplary 2D Morlet wavelet is depicted in the time and frequency domain. The standard deviations of the Gaussian in the time domain and in the frequency domain are inversely proportional. Consequently, decreasing the width of the Gaussian in the time domain leads to an increased width of the frequency response, which reflects the time-frequency resolution trade-off.
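A minimal real-valued sketch of such a 2D Morlet-style kernel (the filter lengths, center frequencies, sampling rates and standard deviations are illustrative assumptions; the paper's exact normalization may differ):

```python
import numpy as np

def morlet_2d(N, M, f_c_st, f_c_ft, f_s_st, f_s_ft, sigma_st, sigma_ft):
    """Real-valued 2D Morlet-style kernel: a cosine at the centre
    frequencies multiplied by a separable Gaussian window."""
    n = (np.arange(N) - (N - 1) / 2) / f_s_st   # slow-time axis [s]
    m = (np.arange(M) - (M - 1) / 2) / f_s_ft   # fast-time axis [s]
    g_st = np.cos(2 * np.pi * f_c_st * n) * np.exp(-n**2 / (2 * sigma_st**2))
    g_ft = np.cos(2 * np.pi * f_c_ft * m) * np.exp(-m**2 / (2 * sigma_ft**2))
    return np.outer(g_st, g_ft)

# The doubled slow-time length (130 vs 65) lets the Gaussian envelope
# expand, mirroring the choice described later in the evaluation section.
psi = morlet_2d(130, 33, f_c_st=100.0, f_c_ft=100e3,
                f_s_st=1000.0, f_s_ft=1e6, sigma_st=0.02, sigma_ft=5e-6)
```

Shrinking `sigma_st` or `sigma_ft` narrows the envelope in time and, by the inverse proportionality noted above, widens the Gaussian-shaped frequency response.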

V. ARCHITECTURE AND LEARNING
In this paper, five networks are compared: two state-of-the-art DCNNs evaluating preprocessed data, a state-of-the-art DCNN evaluating raw input data, and two novel DCNN architectures using raw ADC data as input. All architectures have several characteristics in common. First, all networks finish with a softmax classifier layer of size 6, as six different actions are to be classified. Second, categorical cross-entropy is used as the loss function. Third, the RMSprop optimizer is used with a learning rate $lr = 0.0001$, $\rho = 0.9$, $\epsilon = 10^{-8}$ and batches of size 128. Fourth, all unconstrained convolutional and dense layers are initialized using the 'Glorot' initialization scheme. Fifth, the common convolutional and dense layers use a rectified linear unit (ReLU) activation. And sixth, after every common convolutional and dense layer, a dropout with a rate of 0.2 is applied in order to prevent overfitting.

A. 2D SincNet

The 2D SincNet uses 2D sinc filter convolutions in the first convolutional layer, as described in Section IV-A. As parameters, this layer takes the filter lengths, the number of filters, the sampling frequencies, the padding mode and the stride for the slow-time and fast-time direction respectively. Although there are no separate filters for slow- and fast-time, it is required to explicitly provide the number of filters in slow-time $N_{st}$ and the number of filters in fast-time $N_{ft}$. Accordingly, 2D sinc filters are generated so that they form an equal grid of size $N_{st} \times N_{ft}$ covering the complete observable range-Doppler domain. The trainable weights in this layer are the lower cutoff frequencies and the bandwidths in the slow-time as well as the fast-time direction. In order to guarantee equal training in both filter dimensions, the bandwidths and cutoff frequencies are normalized.
The 2D sinc filter layer is followed by a MaxPool layer with a pooling size of 8x2. Afterwards, a standard two-dimensional convolutional layer with 50 filters of size 3x3 is applied. As mentioned above, the convolutional layer is followed by a dropout layer with a rate of 0.2. After the dropout, a max pooling of size 4x2 is applied to decrease dimensionality. The tensor is then flattened and fed into a dense layer of size 32, followed by the softmax classifier layer. The proposed network is depicted in Fig. 8.

B. 2D WaveConvNet

The 2D WaveConvNet (WCN) is designed similarly to the 2D SincNet; only the first convolutional layer is initialized with 2D Morlet wavelets as described in Section IV-B instead of 2D sinc filters. The required parameters for this layer are the filter lengths, number of filters, sampling frequencies, padding mode and stride for the slow-time and fast-time direction respectively. As with the 2D sinc filters, the number of filters in the slow-time direction $N_{st}$ as well as the number of filters in the fast-time direction $N_{ft}$ have to be provided explicitly in order to distribute the frequency responses of the wavelets equally as a grid in the 2D frequency domain. Both time axes are normalized as discussed in the previous section. Accordingly, the standard deviation is chosen to be 0.06 in both filter dimensions. In total, $N_{st} \times N_{ft}$ 2D wavelets are created. The trainable weights of this layer are the center frequencies and standard deviations in the slow-time as well as the fast-time dimension. As in the 2D SincNet, the learnable weights are normalized.
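The equal-grid initialization of the filter center frequencies described above can be sketched as follows (the grid size 8 x 4 is an illustrative assumption):

```python
import numpy as np

# Equal-grid initialisation of wavelet centre frequencies over the
# normalised 2D frequency plane; N_st x N_ft = 8 x 4 is an assumed example.
N_st, N_ft = 8, 4
f_st = (np.arange(N_st) + 0.5) / N_st        # slow-time centres in (0, 1)
f_ft = (np.arange(N_ft) + 0.5) / N_ft        # fast-time centres in (0, 1)
centres = np.stack(np.meshgrid(f_st, f_ft, indexing="ij"), axis=-1)
sigma_init = np.full((N_st, N_ft, 2), 0.06)  # initial standard deviations
```

During training, each centre frequency and standard deviation is then free to drift away from this uniform tiling.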

C. State-of-the-Art Networks
DSNet: The DSNet is a typical state-of-the-art 2D DCNN architecture followed by a dense and a softmax layer. It uses already preprocessed 2D Doppler spectrograms as input. It contains three standard 2D convolutions with 4, 8 and 16 filters respectively, each with a kernel size of 3x3. After each convolution, a dropout layer with a dropout rate of 0.2 is used. Moreover, after the first two convolutional layers, a MaxPooling of size 2x2 is applied. Afterwards, the tensor is flattened and fed into a dense layer of size 64 followed by the softmax classifier. The DSNet architecture is shown in Fig. 9(a).
RDINet: The RDINet takes a three-dimensional stream of RDI images as input. Therefore, three 3D convolutional layers are used to extract information from the video of RDIs. They all have a kernel size of 3x3x3 and use 4, 8 and 16 channels respectively. Here too, a dropout layer with a rate of 0.2 is added after each convolutional layer to prevent overfitting. After the first two dropout layers, a max pooling of size 2x2x4 is performed. Afterwards, the tensor is flattened and further processed by a dense layer of size 64 before it is classified by the final softmax layer. The RDINet is sketched in Fig. 9(b).
2D ConvNet: The 2D ConvNet uses the same architecture as the 2D SincNet and 2D WCN. Only the first layer is substituted by an unconstrained 2D convolutional layer with 'Glorot' weight initialization. No predefined time-domain filters are used; therefore, each filter parameter can be learned individually.

VI. EXPERIMENTAL RESULTS

A. Dataset

To evaluate the approach presented in this paper, a dataset was recorded in a real-world environment. The radar was mounted on a tripod at a height of 1.20 m and placed in the corner of a room of about 20 m² containing a table and chairs. The experimental setup is shown in Fig. 10. The dataset contains five different human activities plus a recording of an empty room. To record the class "walking", a single person was allowed to walk around randomly. The class "idle" is split into two recordings: in the first, a person was standing in front of the radar, and in the second, the person was sitting at the table facing the radar. As the third activity, random arm movements while standing were recorded; this class is called "arm movements". To record the fourth class, called "waving", a person waved a hand at different positions in the room while facing the radar. As the last class, working at a laptop while sitting at the table was recorded. Each activity was performed by the same person and recorded for about 18 minutes in total. Samples containing 2048 chirps with an overlap of 512 chirps are cut out of the recordings. Given that the chirp repetition time is 1 ms, each sample captures 2.048 s. For each sample, a Doppler spectrogram and a video of RDIs as described in Sections III-B and III-A are created. Thus, a dataset with raw ADC data, Doppler spectrograms and videos of RDIs based on exactly the same chirps per sample is obtained. For each activity, about 700 samples are available. Due to slightly different recording times per activity, the number of samples per class varies; Table II states the exact number of samples per class.

B. Evaluation Details
To evaluate the proposed approach, a filter length of 65 in the slow-time and 33 in the fast-time dimension was chosen for the 2D sinc filters. The same filter size is used for the unconstrained convolutional layer that substitutes the 2D sinc filter layer in the 2D ConvNet. For the 2D wavelet convolution, the filter length in the slow-time direction was doubled in order to allow the Gaussian window function of the Morlet wavelet to expand. Moreover, "valid" padding is used in both dimensions to obtain the same output shape after the 2D sinc and 2D wavelet layers. This provides the possibility of substituting one layer for the other while keeping the remaining network the same. To reduce the computational cost, a stride of 4 and 8 is used in the slow-time and fast-time dimension respectively. With the filter sizes specified, the final size of the networks can be stated. The composition of parameters per layer is shown in Tab. III for the DSNet and the RDINet and in Tab. IV for the 2D SincNet, 2D WCN and 2D ConvNet. Since only velocity information has to be evaluated when working on Doppler spectrograms, the corresponding network can be designed accordingly smaller. The RDINet uses a video of RDIs as input; thus, 3D convolutions have to be used, which results in a higher number of parameters. Moreover, the first layers of the 2D SincNet and of the 2D WCN have significantly fewer parameters than the corresponding unconstrained convolutional layer of the 2D ConvNet. As a result, the network size is reduced by more than 50 %.
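The parameter savings of the constrained first layers can be illustrated with a back-of-the-envelope count, using the stated 65 x 33 filter size and an assumed grid of 8 x 4 filters (the actual filter counts are given in Tab. IV):

```python
# First-layer trainable-parameter comparison. The 65 x 33 tap size is from
# the text; the 8 x 4 filter grid (32 filters) is an assumed example.
n_filters = 8 * 4

conv2d_params = n_filters * (65 * 33)   # one free weight per tap: 68,640
sinc_params = n_filters * 4             # f_l and b per dimension:     128
wavelet_params = n_filters * 4          # f_c and sigma per dimension: 128
```

Regardless of the exact filter count, the constrained layers scale with four parameters per filter instead of one per tap, which drives the overall size reduction.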

C. Confusion Matrix Classification
For evaluation, a 5-fold cross-validation is performed. The 2D SincNet, the 2D WCN, the 2D ConvNet and the RDINet are trained for 20 epochs; the DSNet is trained for 100 epochs to allow convergence. Afterwards, the accuracy as well as the F1 score are evaluated on the test split. The obtained accuracies and F1 scores are averaged over all runs; for the accuracy, the standard deviation is additionally calculated. The results are shown in Tab. V. The state-of-the-art approaches achieve accuracies of about 91 %, 94 % and 95 % respectively, whereas the proposed architectures show improved accuracies of 98.9 % and 99.5 %. In order to analyze the classification results in more detail, the confusion matrix of the RDINet, representing the state-of-the-art approaches, and the confusion matrix of the 2D WCN, representing the novel architectures, are shown in Fig. 11. The limitation of the state-of-the-art approaches is unveiled by the individual accuracy per class: while most actions are classified similarly well by the RDINet and the 2D WCN, considerable confusion between the classes "idle" and "working" exists for the RDINet. The proposed filter-learning based approaches do not show this limitation and therefore achieve better accuracy scores.

D. Learned filters
Before training starts, the sinc filters and the wavelets are initialized as described in Sections V-A and V-B, and the first unconstrained convolutional layer of the 2D ConvNet is initialized using the 'Glorot' scheme. For each approach, accumulating the initial filters leads to an approximately uniform range and velocity gain over the whole space. During training, the filter parameters are iteratively optimized. As a result, the initial grid structure is dissolved by individual shifts and shape changes of the filters.
The cumulative gain of all filters after training is depicted in Fig. 12. The 2D sinc filters as well as the 2D wavelets have a bandpass characteristic. Therefore, the resulting cumulative filter gains look similar, except that the 2D wavelet gain is smoother due to its smooth filter shape in the frequency domain. The resulting weights of the unconstrained convolutional layer, however, are quite different and cannot be physically interpreted.

E. Discussion
The limitation of the state-of-the-art approaches has its origin in the preprocessing. The STFT used for generating the RDIs discretizes the range as well as the velocity domain into equal bins. However, the activities "idle" and "working" both contain only very slight movements; as a result, their features share similar range-Doppler bins, which complicates classification. Due to their ability to learn filters, the 2D WCN as well as the 2D SincNet are able to mitigate this limitation by adapting the hyperparameters of their filters so as to clearly separate the features of similar actions. Furthermore, the lack of range information becomes noticeable when predicting classes based on Doppler spectrograms; the missing range information is expected to have an even higher impact when analyzing the activities of multiple humans simultaneously. The 2D ConvNet is not limited by using preprocessed data. Therefore, in theory, the 2D ConvNet can potentially achieve the same accuracy as the SincNet and WCN, though this is not guaranteed, by increasing the training effort; the underlying reason is that the learned sinc filters as well as the wavelets lie within the search space of the unconstrained convolutional layer. However, it ends up with an accuracy of 96.6 % and a quite high standard deviation of 2.9 % after 20 epochs. As its number of parameters is significantly higher than that of the proposed networks, its learning speed is decreased.
Using a predefined set of filters, such as the proposed sinc filters or wavelets, makes the outcome of the convolutional layer physically interpretable, which helps in understanding the system. Moreover, due to filter learning, the network focuses on application-dependent, meaningful range-Doppler areas. As a result, not only static targets but also disturbances caused, e.g., by the rotation of a ventilator or possible interference from wireless signals are mitigated.

VII. CONCLUSION
Human activity classification has several applications in surveillance, human-computer interfaces and smart homes. We presented an activity classifier based on a parametric DCNN using 2D sinc filter kernels or 2D wavelet filter kernels that can seamlessly detect and classify human activities directly from raw ADC radar data. We compared the performance of the proposed DCNNs with conventional DCNNs that use a Doppler spectrogram or a video of RDIs as feature images and demonstrated that the proposed solution offers better classification accuracy. While the preprocessing steps in the conventional pipeline are fixed, the parametric DCNN is able to learn hyperparameters specific to the activity classification task. Additionally, we demonstrated that a conventional non-parametric DCNN is unable to mimic the preprocessing operations due to the large search space and is thus not suitable for a robust solution. As future work, we aim to extend the proposed DCNN to simultaneously classify multiple activities from multiple targets in the field of view.
Fig. 1: (a) Photo of the radar chipset (b) Typical FMCW block diagram

Fig. 2: Chirp sequence in the proposed system configuration

Fig. 3: Example video of RDIs for walking and working activities

Fig. 5: (a) Conventional processing pipeline, involving explicit preprocessing, feature generation and a neural network (b) Proposed processing pipeline, involving a 2D sinc filter DCNN or 2D wavelet filter DCNN for implicit preprocessing, feature generation and classification.
and $M$ are the filter lengths, $f^{st}_s$ and $f^{ft}_s$ the sampling frequencies, $f^{st}_l$ and $f^{ft}_l$ the lower cutoff frequencies, and $b^{st}$ and $b^{ft}$ the filter bandwidths in the slow-time and fast-time direction respectively. Furthermore, $w(n,m)$ is a 2D cosine weighting function; $n$ sweeps along slow-time and $m$ along fast-time. An exemplary 2D sinc filter is shown in Fig. 6 in the time as well as the frequency domain. In the frequency domain, the rectangular shape with clear cutoff frequencies can be seen. The first layer of the CNN is initialized according to the definition of the 2D sinc filters, and only the hyperparameters are learned during training.

Fig. 10: Experimental setup for data recording with test person performing activity "working"

TABLE I: Operating Parameters

TABLE II: Samples per class

TABLE III: Model sizes of DSNet and RDINet

TABLE V: Evaluation results