
Improving Small Objects Detection using Transformer
  • Shikha Dubey
  • Farrukh Olimov
  • Muhammad Aasim Rafique
  • Moongu Jeon
Corresponding author: Shikha Dubey


General artificial intelligence is a trade-off between an algorithm's inductive bias and its out-of-distribution generalization performance. The impact of inductive bias is conspicuous in the steady improvement of predictions on computer vision problems such as object detection. Although DETR, a recently introduced transformer-based object detector, achieves results competitive with conventional and modern object detection models, its accuracy deteriorates on objects that appear small in the image. This study examines the inductive bias of DETR and proposes SOF-DETR, a transformer-based object detector with a normalized inductive bias. SOF-DETR uses lazy fusion of features to preserve deep contextual information about the objects in the image. Features from multiple consecutive deep layers are fused by element-wise summation and fed to a transformer whose object queries learn long- and short-range spatial associations in the image through the attention mechanism.
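A minimal, dependency-free sketch of the fusion step described above may help make it concrete. The abstract states only that features from several consecutive deep layers are fused by element-wise summation before entering the transformer. Element-wise summation requires the maps to share a shape, so this sketch assumes a 1x1 channel projection and nearest-neighbour resizing to align them; those choices, the toy shapes, and all function names (`project_channels`, `resize_nearest`, `lazy_fuse`, `flatten_to_tokens`) are hypothetical and are not SOF-DETR's actual implementation.

```python
from typing import List

# A feature map is indexed as fmap[channel][row][col].
FeatureMap = List[List[List[float]]]


def project_channels(fmap: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """Hypothetical 1x1 projection: out[d][y][x] = sum_c weights[d][c] * fmap[c][y][x]."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [
            [sum(row_w[c] * fmap[c][y][x] for c in range(len(fmap))) for x in range(w)]
            for y in range(h)
        ]
        for row_w in weights
    ]


def resize_nearest(fmap: FeatureMap, out_h: int, out_w: int) -> FeatureMap:
    """Hypothetical alignment step: nearest-neighbour resize of every channel."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [[ch[y * h // out_h][x * w // out_w] for x in range(out_w)] for y in range(out_h)]
        for ch in fmap
    ]


def lazy_fuse(fmaps: List[FeatureMap], projections: List[List[List[float]]],
              out_h: int, out_w: int) -> FeatureMap:
    """Align every map to a shared (D, out_h, out_w) shape, then sum element-wise."""
    aligned = [resize_nearest(project_channels(f, p), out_h, out_w)
               for f, p in zip(fmaps, projections)]
    depth = len(aligned[0])
    return [
        [[sum(a[k][y][x] for a in aligned) for x in range(out_w)] for y in range(out_h)]
        for k in range(depth)
    ]


def flatten_to_tokens(fmap: FeatureMap) -> List[List[float]]:
    """(D, H, W) -> H*W tokens of length D, the sequence a transformer encoder reads."""
    depth, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[k][y][x] for k in range(depth)] for y in range(h) for x in range(w)]


# Toy example: a 2-channel 2x2 map from an earlier deep layer and a
# 3-channel 1x1 map from a later one, both projected to D = 1 and fused at 2x2.
earlier = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
later = [[[1.0]], [[2.0]], [[3.0]]]
fused = lazy_fuse([earlier, later], [[[1.0, 0.0]], [[1.0, 1.0, 1.0]]], 2, 2)
tokens = flatten_to_tokens(fused)
print(fused)   # [[[7.0, 8.0], [9.0, 10.0]]]
print(tokens)  # [[7.0], [8.0], [9.0], [10.0]]
```

Summation (rather than concatenation) keeps the token dimension fixed at D no matter how many layers are fused, which is one plausible reason to prefer it; a real implementation would use learned projections and a tensor library rather than nested lists.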
SOF-DETR uses global set-based prediction, which directly outputs a set of bounding boxes. Experimental results on the MS COCO dataset show that the added normalized inductive bias and feature fusion are effective: SOF-DETR detects more small objects than DETR.
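Global set-based prediction, as introduced by DETR, trains the detector by pairing each ground-truth object with exactly one prediction through a minimum-cost bipartite matching; unmatched predictions are supervised as "no object". The sketch below illustrates that idea under simplifying assumptions: the pairwise cost keeps only a class-probability term and an L1 box term (DETR also uses a generalized-IoU term), `box_weight=5.0` is chosen for illustration, and exhaustive search over permutations stands in for the Hungarian algorithm DETR uses, so it is only practical for tiny inputs. The function names `match_cost` and `set_match` are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalised to [0, 1]


def match_cost(probs: Sequence[float], box: Box, gt_label: int, gt_box: Box,
               box_weight: float = 5.0) -> float:
    """Simplified pairwise cost: -p(gt class) + box_weight * L1(box, gt_box).

    Assumption: DETR's real matching cost also includes a generalized-IoU term.
    """
    l1 = sum(abs(a - b) for a, b in zip(box, gt_box))
    return -probs[gt_label] + box_weight * l1


def set_match(pred_probs: List[Sequence[float]], pred_boxes: List[Box],
              gt_labels: List[int], gt_boxes: List[Box]) -> List[Tuple[int, int]]:
    """Return one-to-one (prediction, ground truth) pairs of minimum total cost.

    Brute-force search over injective assignments, standing in for the
    Hungarian algorithm. Predictions left unmatched would be trained to
    predict the "no object" class.
    """
    n, m = len(pred_boxes), len(gt_boxes)
    if n < m:
        raise ValueError("need at least as many predictions as ground-truth objects")
    best, best_cost = None, float("inf")
    for perm in permutations(range(n), m):  # perm[j] = prediction assigned to gt j
        cost = sum(match_cost(pred_probs[perm[j]], pred_boxes[perm[j]],
                              gt_labels[j], gt_boxes[j]) for j in range(m))
        if cost < best_cost:
            best, best_cost = perm, cost
    return sorted((best[j], j) for j in range(m))


# Three predictions over classes {0, 1}, with a final "no object" probability slot.
probs = [[0.1, 0.8, 0.1], [0.05, 0.05, 0.9], [0.7, 0.2, 0.1]]
boxes = [(0.5, 0.5, 0.2, 0.2), (0.1, 0.1, 0.1, 0.1), (0.3, 0.3, 0.1, 0.1)]
pairs = set_match(probs, boxes, [0, 1], [(0.3, 0.3, 0.1, 0.1), (0.5, 0.5, 0.2, 0.2)])
print(pairs)  # [(0, 1), (2, 0)]
```

Here prediction 1 is left unmatched and would be pushed toward "no object", which is why a set-based detector needs no non-maximum suppression to remove duplicate boxes.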
Published November 2022 in Journal of Visual Communication and Image Representation, volume 89, article 103620. DOI: 10.1016/j.jvcir.2022.103620