

Train a model $\epsilon_\theta(x_t, t)$ to predict the noise that was added to a clean image, given the noised image $x_t$ and an embedding of the timestep $t$.
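Conceptually, $-\epsilon_\theta(x_t, t)$ gives the direction to move the noised image $x_t$ to make it more likely to be a real image: up to a positive scale factor, it estimates the score $\nabla_{x_t} \log q(x_t)$.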
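A minimal pure-Python sketch of the pieces of one training example, assuming standard DDPM notation ($\bar\alpha_t$ is the cumulative product of $1-\beta_t$) and flat lists of floats as images. The helper names `timestep_embedding`, `noisy_sample`, `eps_mse`, and `score_from_eps` are hypothetical. The embedding is the Transformer-style sinusoidal one that DDPM borrows, and the last helper shows the "direction to move" reading: the score estimate is $-\epsilon_\theta / \sqrt{1-\bar\alpha_t}$.

```python
import math


def timestep_embedding(t, dim):
    """Sinusoidal (Transformer-style) embedding of timestep t; dim must be even."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def noisy_sample(x0, eps, alpha_bar_t):
    """Noise x0 to timestep t: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * e for x, e in zip(x0, eps)]


def eps_mse(eps_pred, eps):
    """Training loss: mean squared error between predicted and true noise."""
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)


def score_from_eps(eps_pred, alpha_bar_t):
    """Score estimate: grad log q(x_t) ~= -eps_theta / sqrt(1 - alpha_bar_t).

    Moving x_t along this vector (against the predicted noise) makes it
    more likely under the data distribution.
    """
    scale = math.sqrt(1.0 - alpha_bar_t)
    return [-e / scale for e in eps_pred]
```

In a training loop one would sample $t$ uniformly and $\epsilon \sim \mathcal{N}(0, I)$, build $x_t$ with `noisy_sample`, feed $x_t$ and `timestep_embedding(t, dim)` to the network, and minimize `eps_mse` between its output and $\epsilon$.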

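Sampling: start from pure noise $x_T \sim \mathcal{N}(0, I)$; at each timestep $t = T, \dots, 1$, use $\epsilon_\theta(x_t, t)$ to remove part of the predicted noise, producing the slightly cleaner $x_{t-1}$.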
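A minimal pure-Python sketch of the standard DDPM ancestral sampling update, assuming the linear $\beta$ schedule from $10^{-4}$ to $0.02$ and the variance choice $\sigma_t^2 = \beta_t$ (both taken from the DDPM paper, not from these notes). `eps_model` is a hypothetical stand-in for the trained $\epsilon_\theta$. Each step computes

$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x_t, t)\right) + \sigma_t z, \qquad z \sim \mathcal{N}(0, I),$$

with $\alpha_t = 1-\beta_t$, $\bar\alpha_t = \prod_{s \le t} \alpha_s$, and no noise added on the final step.

```python
import math
import random


def make_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (assumed); requires T >= 2."""
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def ddpm_step(xt, t, eps_pred, betas, alphas, alpha_bars, rng):
    """One reverse step. t is a 0-based schedule index, so t == 0 is the last step."""
    coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
    mean = [(x - coef * e) / math.sqrt(alphas[t]) for x, e in zip(xt, eps_pred)]
    if t == 0:
        return mean  # final step: return the mean, no fresh noise
    sigma = math.sqrt(betas[t])  # assumed choice sigma_t^2 = beta_t
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


def sample(eps_model, dim, T, rng):
    """Run the full reverse chain from x_T ~ N(0, I) down to x_0."""
    betas, alphas, alpha_bars = make_schedule(T)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(T)):
        x = ddpm_step(x, t, eps_model(x, t), betas, alphas, alpha_bars, rng)
    return x
```

Usage with a dummy model that always predicts zero noise: `sample(lambda x, t: [0.0] * len(x), 3, 10, random.Random(0))`. With a real network, `eps_model` would also consume the timestep embedding.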

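Forward noising can be done in one step (the “nice property” of Gaussians, via the “reparameterization trick”): $x_t$ can be sampled directly from the clean image $x_0$ for any $t$, without applying the noise steps one by one.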
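Assuming standard DDPM notation ($\alpha_t = 1-\beta_t$, $\bar\alpha_t = \prod_{s=1}^{t}\alpha_s$), the one-step formula is $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$. It holds because a sum of independent Gaussians is Gaussian with the variances added. This small sketch (hypothetical helper names) tracks the coefficient on $x_0$ and the total noise variance while applying $x_t = \sqrt{\alpha_t}\,x_{t-1} + \sqrt{1-\alpha_t}\,\epsilon_t$ step by step, so they can be compared with the closed form.

```python
import math


def stepwise_coefficients(alphas):
    """Apply x_t = sqrt(a_t) x_{t-1} + sqrt(1 - a_t) eps_t one step at a time.

    Returns (signal, noise_var) such that x_T = signal * x0 + N(0, noise_var).
    """
    signal, noise_var = 1.0, 0.0
    for a in alphas:
        signal *= math.sqrt(a)
        # Scaling the old noise by sqrt(a) scales its variance by a; the fresh
        # noise is independent, so its variance (1 - a) simply adds.
        noise_var = a * noise_var + (1.0 - a)
    return signal, noise_var


def one_step_coefficients(alphas):
    """Closed form: signal = sqrt(alpha_bar), noise_var = 1 - alpha_bar."""
    alpha_bar = math.prod(alphas)
    return math.sqrt(alpha_bar), 1.0 - alpha_bar


alphas = [1.0 - b for b in (1e-4, 0.01, 0.02, 0.05)]
print(stepwise_coefficients(alphas))
print(one_step_coefficients(alphas))
```

The two printed pairs agree up to floating-point rounding; by induction, $1 - v_t = \alpha_t (1 - v_{t-1})$, so the noise variance is always $1 - \bar\alpha_t$.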


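First, we need to understand what we’re modeling: the distribution $q$. The forward process $q(x_t \mid x_{t-1})$ is easy, since it just adds a small amount of Gaussian noise. The reverse process $q(x_{t-1} \mid x_t)$ is intractable, because computing it requires the marginal $q(x_{t-1})$ over the entire (unknown) data distribution.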
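The relevant equations, written in standard DDPM notation as an assumption ($\beta_t$ is the noise variance at step $t$, $\alpha_t = 1-\beta_t$, $\bar\alpha_t = \prod_{s \le t}\alpha_s$). Bayes' rule shows where the data marginal enters the reverse process, and conditioning on $x_0$ turns the reverse step into a Gaussian with a closed form (the posterior used in the DDPM paper):

$$
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right)
$$

$$
q(x_{t-1} \mid x_t) = \frac{q(x_t \mid x_{t-1})\,q(x_{t-1})}{q(x_t)}, \qquad q(x_{t-1}) = \int q(x_{t-1} \mid x_0)\,q(x_0)\,dx_0
$$

$$
q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\!\left(x_{t-1};\ \tilde\mu_t(x_t, x_0),\ \tilde\beta_t I\right), \quad
\tilde\mu_t = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,x_0 + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t, \quad
\tilde\beta_t = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\,\beta_t
$$

The integral over $q(x_0)$, the true image distribution, is what makes $q(x_{t-1} \mid x_t)$ intractable; once $x_0$ is fixed, every factor is a known Gaussian.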
