Improving Small Objects Detection using Transformer

posted on 2021-11-29, 04:03 authored by Shikha Dubey, Farrukh Olimov, Muhammad Aasim Rafique, Moongu Jeon
General artificial intelligence is a trade-off between the inductive bias of an algorithm and its out-of-distribution generalization performance. The conspicuous impact of inductive bias is a continuing trend of improved predictions across computer vision problems such as object detection. Although DETR, a recently introduced transformer-based object detection technique, shows results competitive with conventional and modern object detection models, its accuracy deteriorates when detecting small-sized objects (in perspective). This study examines the inductive bias of DETR and proposes a normalized inductive bias for object detection using a transformer (SOF-DETR). It uses a lazy fusion of features to sustain deep contextual information about the objects present in the image. The features from multiple subsequent deep layers are fused by element-wise summation and input to a transformer network, whose object queries learn the long- and short-distance spatial associations in the image through the attention mechanism. SOF-DETR uses a global set-based prediction for object detection, which directly produces a set of bounding boxes. The experimental results on the MS COCO dataset show the effectiveness of the added normalized inductive bias and feature fusion techniques: SOF-DETR detects more small-sized objects than DETR.
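
The fusion step described in the abstract (element-wise summation of feature maps from several deep layers, then input to a transformer) can be illustrated with a small, dependency-free sketch. This is not the authors' implementation. The abstract does not say how maps of different resolution are aligned, so the nearest-neighbour upsampling here is an assumption, as are the [C][H][W] nested-list layout and the hypothetical helper names `upsample_nearest`, `fuse_by_sum`, and `flatten_to_tokens`.

```python
# Illustrative sketch only. The shapes, the nearest-neighbour upsampling step,
# and every function name below are assumptions, not details from SOF-DETR.

def upsample_nearest(fmap, factor):
    """Upsample a [C][H][W] feature map by an integer factor (nearest neighbour)."""
    return [
        [
            [row[x // factor] for x in range(len(row) * factor)]
            for row in channel
            for _ in range(factor)  # repeat each source row `factor` times
        ]
        for channel in fmap
    ]


def fuse_by_sum(fmaps):
    """Element-wise sum of feature maps that share the same [C][H][W] shape."""
    first = fmaps[0]
    C, H, W = len(first), len(first[0]), len(first[0][0])
    for f in fmaps:
        if (len(f), len(f[0]), len(f[0][0])) != (C, H, W):
            raise ValueError("feature maps must share one shape before summation")
    return [
        [[sum(f[c][y][x] for f in fmaps) for x in range(W)] for y in range(H)]
        for c in range(C)
    ]


def flatten_to_tokens(fmap):
    """Turn a [C][H][W] map into H*W tokens of length C, in row-major order."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][y][x] for c in range(C)] for y in range(H) for x in range(W)]


# Toy input: two channels, a 2x2 shallower map and a 1x1 deeper map.
shallow = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
deep = [[[10.0]], [[20.0]]]

fused = fuse_by_sum([shallow, upsample_nearest(deep, 2)])
tokens = flatten_to_tokens(fused)  # the sequence a transformer encoder would consume
```

Each entry of `tokens` corresponds to one spatial location of the fused map, which is the form in which a DETR-style encoder receives image features. A real model would typically also project the channels to a common width and add positional encodings; both are omitted here.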


Submitting Author's Country

  • South Korea