TechRxiv

Improving Small Objects Detection using Transformer

preprint
posted on 29.11.2021, 04:03 by Shikha Dubey, Farrukh Olimov, Muhammad Aasim Rafique, Moongu Jeon
General artificial intelligence involves a trade-off between the inductive bias of an algorithm and its out-of-distribution generalization performance. The impact of inductive bias is evident in the steady improvement of predictions across computer vision problems such as object detection. Although DETR, a recently introduced transformer-based object detection technique, achieves results competitive with conventional and modern object detection models, its accuracy deteriorates when detecting small-sized objects (in perspective). This study examines the inductive bias of DETR and proposes SOF-DETR, which adds a normalized inductive bias for object detection using a transformer. It uses a lazy fusion of features to sustain deep contextual information about the objects present in the image. The features from multiple subsequent deep layers are fused by element-wise summation and input to a transformer network, where object queries learn the long- and short-distance spatial associations in the image through the attention mechanism.
SOF-DETR uses a global set-based prediction for object detection, which directly produces a set of bounding boxes. Experimental results on the MS COCO dataset show the effectiveness of the added normalized inductive bias and feature fusion techniques: SOF-DETR detects more small-sized objects than DETR.

Email Address of Submitting Author

shikha.d@gm.gist.ac.kr

Submitting Author's Institution

GIST

Submitting Author's Country

South Korea