Human action classification using CNN by encoding time series
skeleton-based data as images
Abstract
A Microsoft Kinect camera can capture depth images of a subject during
surveillance for Human Activity Recognition (HAR), from which skeletal
data are subsequently obtained. Several studies have analysed human
actions using skeletal data combined with complex feature extraction
methods, and most authors rely on spatio-temporal information as a key
feature. This study therefore extracts spatio-temporal information
automatically from the skeletal data using an imaging time series (ITS)
method called the Recurrence Plot (RP), which transforms the skeleton
joint coordinates into 2D images. The raw data are preprocessed and
partitioned into
three-channel matrices (R, G, B) before applying principal component
analysis (PCA). The generated RP images are used as input to a
Convolutional Neural Network (CNN) that distinguishes between different
activities. Benchmarked on the UTD-MHAD dataset, the proposed approach
outperforms previous studies, reaching a maximum accuracy of 92.6%.
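
To make the encoding step concrete, the following is a minimal sketch of how a skeleton sequence might be turned into a three-channel recurrence-plot image. It is an illustration under stated assumptions, not the exact pipeline of this study: we assume that the x, y and z joint coordinates map to the R, G and B channels, that PCA reduces each channel to its first principal component, and that an unthresholded recurrence plot (the absolute difference between every pair of frames) is used. The function names and the random demo data are hypothetical.

```python
import numpy as np


def first_principal_component(matrix):
    """Project a (frames, joints) matrix onto its first principal direction."""
    centered = matrix - matrix.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]  # 1D series with one value per frame


def recurrence_plot(series):
    """Unthresholded recurrence plot: |s_i - s_j| for every pair of frames."""
    s = np.asarray(series, dtype=float)
    return np.abs(s[:, None] - s[None, :])


def skeleton_to_rp_image(sequence):
    """Encode a (frames, joints, 3) skeleton sequence as a (frames, frames, 3) image.

    Assumption: coordinate axes x, y, z are mapped to channels R, G, B.
    """
    seq = np.asarray(sequence, dtype=float)
    channels = []
    for axis in range(3):
        series = first_principal_component(seq[:, :, axis])
        rp = recurrence_plot(series)
        peak = rp.max()
        scaled = rp / peak if peak > 0 else rp  # rescale to [0, 1]
        channels.append(np.round(255 * scaled).astype(np.uint8))
    return np.stack(channels, axis=-1)


# Hypothetical demo data: 40 frames, 20 joints, 3 coordinates per joint.
rng = np.random.default_rng(0)
demo = rng.normal(size=(40, 20, 3))
image = skeleton_to_rp_image(demo)
```

The resulting array has one row and one column per frame, so sequences of different lengths would need to be resized to a common shape before being passed to a CNN. That resizing step is not shown here.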