What You See Is What You Transform: Foveated Spatial Transformers as a
bio-inspired attention mechanism
Abstract
Convolutional Neural Networks have been considered the go-to option for
object recognition in computer vision for the last couple of years.
However, their invariance to object translations is still deemed a
weak point, as it is limited to small translations obtained through their
max-pooling layers. One bio-inspired approach considers the What/Where
pathway separation in mammals to overcome this limitation. This approach
works as a nature-inspired attention mechanism; another classical
attention mechanism is the Spatial Transformer, which allows adaptive
end-to-end learning of different classes of spatial transformations
throughout training. In this work, we review Spatial Transformers as
an attention-only mechanism and compare them with the What/Where model.
We show that attention-restricted, or “Foveated”, Spatial Transformer
Networks, coupled with a curriculum learning scheme and an efficient
log-polar visual space entry, outperform the What/Where model without
the need for any extra supervision.