Abstract
Pedestrian detection is considered one of the most challenging problems
in computer vision, as it involves the combination of classification and
localization within a scene. Recently, convolutional neural networks
(CNNs) have been demonstrated to achieve superior detection results
compared to traditional approaches. Although YOLOv3 (an improved You
Only Look Once model) is one of the state-of-the-art methods in
CNN-based object detection, it remains very challenging to apply it to
real-time pedestrian detection. In this paper, we propose SA YOLOv3, a
scale-aware You Only Look Once framework that improves the detection of
small-scale pedestrian instances by YOLOv3 while operating in real time.
Our network introduces two sub-networks that detect pedestrians at
different scales. The outputs of the two sub-networks are then combined
to generate robust detection results.
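The two-branch design above can be sketched as follows. Since the abstract does not detail how the sub-network outputs are combined, this sketch assumes a simple union of both output sets followed by non-maximum suppression; all function and parameter names are hypothetical illustrations, not the paper's actual method.

```python
# Hypothetical sketch: merge detections from two scale-specific
# sub-networks by taking their union and suppressing duplicates
# with non-maximum suppression (NMS).

def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def merge_detections(small_scale_dets, large_scale_dets, iou_thresh=0.5):
    # Each detection is a (box, score) pair. Sort the union of both
    # sub-network outputs by score, then greedily keep each box that
    # does not overlap an already-kept box above the IoU threshold.
    dets = sorted(small_scale_dets + large_scale_dets,
                  key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in dets:
        if all(iou(box, k[0]) < iou_thresh for k in kept):
            kept.append((box, score))
    return kept
```

For example, if both branches fire on the same pedestrian, the greedy pass keeps only the higher-scoring box, while detections unique to either branch survive unchanged.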
Experimental results show that the proposed SA YOLOv3 framework
outperforms YOLOv3 on public datasets while running at an average of
11 fps on a GPU.