Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos
- Published on May 22nd, 2024 8:15 am
- Editor: Yuvraj Singh
- Authors: Zhengze Xu, Mengting Chen, Zhao Wang, Linyu Xing, Zhonghua Zhai, Nong Sang, Jinsong Lan, Shuai Xiao, Changxin Gao
This paper tackles the challenge of video try-on, an area where previous research has yielded limited success. The core difficulty lies in simultaneously preserving intricate clothing details and generating realistic, coherent motions throughout the video.
To address these challenges, the authors propose “Tunnel Try-on,” a novel diffusion-based framework. The method centers on constructing a “focus tunnel” within the input video: a sequence of close-up crops around the clothing region. By zooming in on this area, the framework preserves the clothing’s fine details more effectively.
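The per-frame cropping idea can be sketched as a simple box-expansion routine. The helper below is a hypothetical illustration, assuming the tunnel crop is a clothing bounding box grown by a margin and clamped to the frame; the paper's exact crop geometry is not reproduced here.

```python
def focus_crop(bbox, frame_w, frame_h, margin=0.2):
    """Expand a clothing bounding box into a zoomed 'tunnel' crop.

    bbox = (x0, y0, x1, y1) in pixels. The margin value is an assumption,
    not taken from the paper.
    """
    x0, y0, x1, y1 = bbox
    w, h = x1 - x0, y1 - y0
    # Grow the box on each side so some context around the garment survives,
    # then clamp to the frame boundaries.
    dx, dy = w * margin, h * margin
    cx0 = max(0, int(x0 - dx))
    cy0 = max(0, int(y0 - dy))
    cx1 = min(frame_w, int(x1 + dx))
    cy1 = min(frame_h, int(y1 + dy))
    return cx0, cy0, cx1, cy1
```

Applying this per frame yields the close-up "tunnel" views that the diffusion model then edits at high effective resolution.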
To ensure coherent motions across the video, the approach employs a two-pronged strategy. First, a Kalman filter is used to generate smooth crops within the focus tunnel. Second, the tunnel’s position embedding is injected into the attention layers of the model. This step helps improve the continuity of the generated video sequence.
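The Kalman-filtering step can be illustrated with a minimal scalar filter applied independently to each crop-box coordinate across frames. This is a sketch of the smoothing idea only; the noise parameters `q` and `r` and the per-coordinate state model are assumptions, not the paper's settings.

```python
class ScalarKalman:
    """Minimal 1-D Kalman filter (random-walk state model) for one
    crop-box coordinate. q/r are assumed noise values."""

    def __init__(self, q=1e-2, r=1.0):
        self.q, self.r = q, r   # process / measurement noise variances
        self.x = None           # filtered estimate
        self.p = 1.0            # estimate variance

    def update(self, z):
        if self.x is None:      # initialize on the first measurement
            self.x = float(z)
            return self.x
        self.p += self.q                 # predict: uncertainty grows
        k = self.p / (self.p + self.r)   # Kalman gain
        self.x += k * (z - self.x)       # correct toward the measurement
        self.p *= (1.0 - k)
        return self.x


def smooth_boxes(boxes):
    """Smooth a per-frame sequence of (x0, y0, x1, y1) crop boxes."""
    filters = [ScalarKalman() for _ in range(4)]
    return [tuple(f.update(c) for f, c in zip(filters, box)) for box in boxes]
```

Smoothing the crop trajectory this way keeps the tunnel from jittering frame to frame, which would otherwise show up as flicker in the generated video.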
Furthermore, the framework incorporates an environment encoder. This encoder extracts contextual information from areas outside the tunnels, providing supplementary cues for the video generation process.
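As a toy illustration of the input the environment encoder consumes, the sketch below blanks out the tunnel region of a frame so that only the outside-tunnel context remains. In the actual framework these surrounding pixels are encoded into conditioning features; the function name and nested-list frame representation here are purely illustrative.

```python
def environment_view(frame, tunnel_box, fill=0):
    """Return a copy of `frame` (H x W nested lists) with the tunnel
    region blanked, leaving only the surrounding context.

    A toy stand-in for the environment encoder's input, not the
    paper's implementation.
    """
    x0, y0, x1, y1 = tunnel_box
    out = [row[:] for row in frame]   # copy so the original frame is untouched
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = fill
    return out
```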
Through this combination of techniques, Tunnel Try-on achieves the dual objective of preserving clothing details while synthesizing stable and smooth videos. The paper’s findings demonstrate significant advancements and position Tunnel Try-on as a potential first step towards commercially viable video try-on applications.