FreeV: Free Lunch in MultiModal Diffusion U-ViT
  • Zhou Qiangong
  • Youyu Zhou
  • Yahong Wang
Zhou Qiangong
School of Mathematics and Civil Engineering

Corresponding Author: [email protected]



This paper reveals the untapped potential of the U-ViT architecture in diffusion models. The study first examines how the U-ViT architecture contributes to visual generation in multimodal diffusion models and then proposes an improvement scheme, "FreeV", designed specifically for U-ViT. This marks the first application of the U-Net-based FreeU enhancement framework within a Transformer architecture. The FreeV framework significantly enhances generation quality without requiring additional training or fine-tuning. The key insight of this study lies in balancing the contributions of the backbone network, the skip connections, and the fused feature maps within U-ViT, so as to fully leverage the advantages of these components while circumventing the limitations of feature fusion in U-ViT.

Project page: https://github.com/GoldenFishes/FreeV
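The balancing idea in the abstract can be illustrated with a minimal sketch. It assumes the U-ViT long skip connection is a concatenation of backbone and skip features followed by a linear projection, and that re-weighting amounts to multiplying each branch, and the fused result, by a scalar. The function name `fuse_with_scaling` and the factors `b`, `s`, and `f` are hypothetical names chosen for illustration; they are not taken from the FreeV code, and the actual method may scale only some channels or tokens.

```python
def fuse_with_scaling(backbone, skip, weight, bias, b=1.0, s=1.0, f=1.0):
    """Re-weighted long-skip fusion for a single token (illustrative only).

    backbone, skip: feature vectors of length d (lists of numbers).
    weight: d rows of 2*d numbers, the linear projection applied to the
            concatenated [backbone, skip] vector.
    bias:   vector of length d.
    b, s, f: hypothetical scaling factors for the backbone features, the
             skip features, and the fused output. With b = s = f = 1 this
             reduces to the plain concat-then-project fusion.
    """
    # Scale each branch, then concatenate along the channel dimension.
    scaled = [b * x for x in backbone] + [s * x for x in skip]
    # Linear projection back to d channels.
    fused = [
        sum(w * x for w, x in zip(row, scaled)) + c
        for row, c in zip(weight, bias)
    ]
    # Scale the fused feature map.
    return [f * y for y in fused]


# Toy example: d = 2, projection that simply adds the two branches.
backbone = [1, 2]
skip = [3, 4]
weight = [[1, 0, 1, 0],
          [0, 1, 0, 1]]
bias = [0, 0]

plain = fuse_with_scaling(backbone, skip, weight, bias)
reweighted = fuse_with_scaling(backbone, skip, weight, bias, b=2.0, s=0.5)
print(plain, reweighted)
```

Because the factors are applied only at inference time, a sketch like this leaves the trained projection weights untouched, which is consistent with the abstract's claim that no additional training or fine-tuning is required.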