FreeV: Free Lunch in MultiModal Diffusion U-ViT
  • Zhou Qiangong,
  • Youyu Zhou,
  • Yahong Wang
Zhou Qiangong
School of Mathematics and Civil Engineering

Corresponding Author: [email protected]

Youyu Zhou
Yahong Wang

Abstract

This paper reveals the untapped potential of the U-ViT architecture in diffusion models. The study first examines the contribution of the U-ViT architecture to the visual generation task of multimodal diffusion models, then proposes an improvement scheme, "FreeV", designed specifically for U-ViT. This is the first application of the U-Net-based FreeU enhancement framework to a Transformer architecture. The FreeV framework significantly enhances generation quality without requiring additional training or fine-tuning. The key insight of this study lies in balancing the contributions of the backbone network, the skip connections, and the fused feature maps within the U-ViT, so as to exploit the strengths of each component while circumventing the limitations of feature fusion in U-ViT.

Project page: https://github.com/GoldenFishes/FreeV
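
The balancing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a skip connection is fused with the backbone features by concatenation followed by a linear projection, and the function name `fuse_with_scaling` and the scaling factors `b`, `s`, and `f` are hypothetical names introduced here for illustration only. The sketch shows how three scalars could reweight the backbone, skip, and fused features at inference time without any retraining.

```python
# Illustrative sketch only. The function name and the scaling factors
# b, s, f are assumptions, not taken from the FreeV code base.

def fuse_with_scaling(backbone, skip, weight, b=1.0, s=1.0, f=1.0):
    """Fuse backbone and skip tokens by concatenation plus projection,
    scaling each contribution separately.

    backbone, skip: lists of tokens, each token a list of D floats
    weight: projection matrix with 2*D rows and D columns
    b, s, f: scales for backbone, skip, and fused features
    """
    fused = []
    for h_tok, s_tok in zip(backbone, skip):
        # Scale each branch, then concatenate along the channel axis.
        cat = [b * x for x in h_tok] + [s * x for x in s_tok]
        # Linear projection from 2*D channels back to D channels.
        out = [sum(c * weight[i][j] for i, c in enumerate(cat))
               for j in range(len(weight[0]))]
        # Scale the fused feature map.
        fused.append([f * y for y in out])
    return fused


# One token with D = 2; this projection simply adds the two branches.
backbone = [[1.0, 2.0]]
skip = [[3.0, 4.0]]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

baseline = fuse_with_scaling(backbone, skip, weight)
rebalanced = fuse_with_scaling(backbone, skip, weight, b=2.0, s=0.5)
print(baseline)    # [[4.0, 6.0]]
print(rebalanced)  # [[3.5, 6.0]]
```

With all three factors set to 1.0 the function reduces to plain fusion, so in this sketch the rebalancing is purely an inference-time adjustment of existing features.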