TechRxiv

What You See Is What You Transform: Foveated Spatial Transformers as a bio-inspired attention mechanism

preprint
posted on 07.09.2021, 21:28 by Ghassan Dabane, Laurent Perrinet, Emmanuel Daucé
Convolutional Neural Networks have been considered the go-to option for object recognition in computer vision in recent years. However, their invariance to object translations is still considered a weak point: it is limited to small translations and is obtained only through their max-pooling layers. One bio-inspired approach draws on the separation of the What and Where pathways in mammals to overcome this limitation. It acts as a nature-inspired attention mechanism; Spatial Transformers are another, classical example of such a mechanism. These allow different classes of spatial transformations to be learned adaptively and end-to-end during training. In this work, we review Spatial Transformers as an attention-only mechanism and compare them with the What/Where model. We show that attention-restricted, or “Foveated”, Spatial Transformer Networks, coupled with a curriculum learning training scheme and an efficient log-polar visual space entry, outperform the What/Where model without any extra supervision.

Email Address of Submitting Author

dabane.ghassan@gmail.com

ORCID of Submitting Author

0000-0001-9686-8047

Submitting Author's Institution

Institut de Neurosciences de la Timone

Submitting Author's Country

France