Constrained Diffusion Implicit Models

Vivek Jayaram   Ira Kemelmacher-Shlizerman   Steven M. Seitz   John Thickstun

University of Washington, Cornell University

ICLR 2025 (Submitted)

Gradio Demo

[Paper] [Code]

Abstract

This paper describes an efficient algorithm for solving noisy linear inverse problems using pretrained diffusion models. Extending the paradigm of denoising diffusion implicit models (DDIM), we propose constrained diffusion implicit models (CDIM) that modify the diffusion updates to enforce a constraint upon the final output. For noiseless inverse problems, CDIM exactly satisfies the constraints; in the noisy case, we generalize CDIM to satisfy an exact constraint on the residual distribution of the noise. Experiments across a variety of tasks and metrics show strong performance of CDIM, with analogous inference acceleration to unconstrained DDIM: 10 to 50 times faster than previous conditional diffusion methods. We demonstrate the versatility of our approach on many problems including super-resolution, denoising, inpainting, deblurring, and 3D point cloud reconstruction.

We plot reconstruction quality against runtime for various methods. Our methods (circled in the top left) achieve strong performance while requiring a fraction of the runtime of existing approaches.

Method

Our method optimizes the KL divergence between the observed residuals and the known noise distribution at each diffusion timestep (Algorithm 1). We also show that this KL optimization is equivalent to early-stopped L2 optimization, presented in Algorithm 2.
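To make the early-stopped L2 view concrete, here is a minimal toy sketch of a CDIM-style loop on a noiseless linear inverse problem. All names (`denoiser`, the alpha schedule, the step sizes) are our own illustrative choices, not the paper's implementation: a real system would use a pretrained diffusion model, whereas here a placeholder eps-predictor lets the loop run end to end.

```python
import numpy as np

# Toy CDIM-style sketch (illustrative only; not the authors' code).
# Noiseless linear inverse problem: recover x from measurements y = A @ x.
rng = np.random.default_rng(0)
d, m = 8, 4
A = rng.normal(size=(m, d))           # known linear measurement operator
x_true = rng.normal(size=d)
y = A @ x_true                         # measurements to enforce as a constraint

T = 50
alphas = np.linspace(0.999, 0.01, T)   # toy alpha_bar schedule (decreasing in t)

def denoiser(x_t, t):
    """Placeholder for a pretrained eps-predictor; returns zeros here."""
    return np.zeros_like(x_t)

x_t = rng.normal(size=d)
for t in range(T - 1, 0, -1):
    a_t, a_prev = alphas[t], alphas[t - 1]
    eps = denoiser(x_t, t)
    # Standard DDIM estimate of the clean image x0 from x_t.
    x0_hat = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)

    # Constraint step: a few gradient descent iterations on ||A x0_hat - y||^2,
    # stopped early rather than run to convergence.
    step = 1e-2
    for _ in range(10):
        grad_x0 = 2.0 * A.T @ (A @ x0_hat - y)
        x0_hat = x0_hat - step * grad_x0

    # Deterministic DDIM (eta = 0) transition using the projected x0 estimate.
    x_t = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps

residual = np.linalg.norm(A @ x_t - y)
```

Because the constraint is applied to the x0 estimate inside the standard DDIM transition, the loop keeps DDIM's fast deterministic sampling while steadily driving the measurement residual toward zero.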

Additional Results

We use CDIM to inpaint a sparse point cloud projection. We take 10 images from the film The Grand Budapest Hotel and use COLMAP to create a point cloud. When projected from a novel angle, this point cloud is very sparse (left image); CDIM fills in the missing pixels (right image).

Extended results on random inpainting (92% missing pixels) on the FFHQ dataset.

Extended inpainting results on the ImageNet dataset.

Contact and Info

UW GRAIL, UW Reality Lab, University of Washington

{vjayaram, seitz, kemelmi}@cs.washington.edu

Acknowledgements

The authors thank their labmates in the UW GRAIL Lab. This work was supported by the UW Reality Lab, Facebook, Google, Lenovo, and Amazon.