ProgDiffusion: Progressively Self-encoding Diffusion Models
Zhangkai Wu, Xuhui Fan, Longbing Cao. KDD 2025 Research Track.
Learning low-dimensional semantic representations in diffusion models (DMs) remains an open problem: in standard DMs, the intermediate latents have the same dimensionality as the observations and therefore cannot represent low-dimensional semantics. Existing methods address this either by encoding observations into semantics, which makes it difficult to generate samples without observations, or by synthesizing the U-Net layers of pre-trained DMs into low-dimensional semantics, an approach used mainly for downstream tasks rather than for facilitating the training process. Moreover, such static representations may not align with the dynamic, timestep-wise intermediate latents. This work introduces a Progressive self-encoded Diffusion model (ProgDiffusion), which simultaneously learns semantic representations and reconstructs observations, performs efficient unconditional generation, and produces progressively structured semantic representations. These benefits stem from a novel self-encoder mechanism that takes the U-Net's upsampling features, the intermediate latent, and the denoising timestep as conditions to generate time-specific semantic representations, in contrast to existing work that conditions on observations only. As a result, the learned intermediate latents are dynamic and are mapped to a series of semantic representations that capture their gradual changes. Notably, the proposed encoder operates independently of the observations, making unconditional generation feasible since no observations are required. To evaluate ProgDiffusion, we design tasks to visualise the learned progressive semantic representations, in addition to other common tasks, which validate the effectiveness of ProgDiffusion against the state-of-the-art.
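The self-encoder described above conditions on the U-Net's upsampling features, the intermediate latent, and the denoising timestep. The abstract does not give implementation details, so the following is only a minimal PyTorch-style sketch of that conditioning pattern; all module and parameter names (SelfEncoder, feat_channels, sem_dim, etc.) are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn

class SelfEncoder(nn.Module):
    """Hypothetical sketch: pool the U-Net's upsampling feature maps and the
    intermediate latent x_t, concatenate with a timestep embedding, and map
    the result to one low-dimensional, time-specific semantic vector."""

    def __init__(self, feat_channels, latent_channels, time_dim, sem_dim):
        super().__init__()
        in_dim = sum(feat_channels) + latent_channels + time_dim
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 4 * sem_dim),
            nn.SiLU(),
            nn.Linear(4 * sem_dim, sem_dim),
        )

    def forward(self, up_feats, x_t, t_emb):
        # Global-average-pool each spatial input to a per-channel vector,
        # so features at different resolutions can be concatenated.
        pooled = [f.mean(dim=(2, 3)) for f in up_feats + [x_t]]
        h = torch.cat(pooled + [t_emb], dim=-1)
        return self.mlp(h)  # semantic representation for timestep t

# Usage: two upsampling feature maps, a latent, and a timestep embedding
enc = SelfEncoder(feat_channels=[64, 32], latent_channels=3,
                  time_dim=128, sem_dim=16)
z_sem = enc([torch.randn(2, 64, 8, 8), torch.randn(2, 32, 16, 16)],
            torch.randn(2, 3, 32, 32), torch.randn(2, 128))
```

Because the encoder's inputs are all produced inside the denoising loop (features, latent, timestep) rather than taken from data, a representation like `z_sem` is available even during unconditional sampling, which is the property the abstract highlights.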
The code is available at https://anonymous.4open.science/r/ProgDiffusion-8842.