Abstract
Transformers have been widely recognized as powerful tools for a broad
range of tasks, such as Natural Language Processing (NLP), Computer
Vision (CV), and Speech Recognition (SR), owing to their state-of-the-art
multi-head attention mechanism. Motivated by their rich architectural
variants and strong capacity for modeling input data, this survey begins
with the principal Transformer architectures, proceeds to an
investigation of their statistical mechanisms and inference, and then
reviews their applications to these dominant tasks. The underlying
statistical mechanisms merit study at a deeper level; accordingly, this
survey focuses on the mathematical foundations of Transformers and uses
these principles to analyze the reasons for their excellent performance
in many recognition scenarios.