Vit-GAN: Image-to-image Translation with Vision Transformers and Conditional GANs
In this paper, we develop Vit-GAN, a general-purpose architecture capable of performing
most image-to-image translation tasks, from semantic image segmentation to single-image depth
perception. This paper is a follow-up to, and an extension of, our generator-based model [1], whose
results were very promising and opened the possibility of further improvement with an adversarial architecture. We use a unique vision-transformer-based generator architecture and conditional
GANs (cGANs) with a Markovian discriminator (PatchGAN) (https://github.com/YigitGunduc/vit-gan).
In the present work, we use images as the conditioning input. We observe that the obtained results
are more realistic than those of commonly used architectures.
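
To illustrate what "Markovian" means for a PatchGAN discriminator, the short sketch below computes how large an input patch a single discriminator output unit sees. The layer list is an assumption: it is the 70x70 PatchGAN configuration from pix2pix (4x4 convolutions, three with stride 2 followed by two with stride 1), not a configuration stated in this abstract, and the function name `receptive_field` is a hypothetical helper.

```python
def receptive_field(layers):
    """Return the side length of the input patch seen by one output unit.

    `layers` is a list of (kernel_size, stride) pairs, ordered from the
    input of the network to its output. Padding does not change the
    receptive field size, so it is ignored here.
    """
    size = 1
    # Walk backwards from a single output unit towards the input image.
    for kernel, stride in reversed(layers):
        size = (size - 1) * stride + kernel
    return size


# Assumed configuration (70x70 PatchGAN from pix2pix), not taken from this paper.
PATCHGAN_LAYERS = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]

if __name__ == "__main__":
    print(receptive_field(PATCHGAN_LAYERS))
```

Under this assumed configuration, each discriminator output judges one local patch of the image rather than the whole image, which is why such a discriminator is described as Markovian.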