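Train a model $\epsilon_\theta(x_t, t)$ to predict the noise $\epsilon$ that was added, given the noised image $x_t$ and an embedding of the timestep $t$. The loss is the mean squared error $\lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2$.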
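Conceptually, this model finds the direction in which to move $x_t$ to make it more likely to be an image. Its noise prediction is a scaled negative score, $\epsilon_\theta(x_t, t) \approx -\sqrt{1-\bar\alpha_t}\,\nabla_{x_t} \log q(x_t)$ (with $\bar\alpha_t$ the cumulative signal coefficient of the forward process), so stepping against the predicted noise increases the log-density of the noised data at step $t$.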
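The training objective can be sketched in NumPy as follows. This is a minimal sketch, assuming the linear $\beta_t$ schedule used in the original DDPM paper ($10^{-4}$ to $0.02$ over $T = 1000$ steps) and 0-based timestep indices. `eps_model` is a hypothetical stand-in for $\epsilon_\theta$; the timestep embedding is left to the model.

```python
import numpy as np

# Assumption: linear beta schedule with T = 1000 steps (not specified in these notes).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # alpha_bar_t = product of alpha_s for s <= t

def ddpm_loss(eps_model, x0, rng):
    """Simplified DDPM objective on one batch.

    eps_model(x_t, t) is a hypothetical stand-in for epsilon_theta;
    t is an array of 0-based timestep indices, one per example.
    """
    batch = x0.shape[0]
    t = rng.integers(0, T, size=batch)               # one random timestep per example
    eps = rng.standard_normal(x0.shape)              # the noise the model must predict
    # Reshape alpha_bar_t so it broadcasts over the non-batch dimensions.
    ab = alpha_bars[t].reshape(-1, *([1] * (x0.ndim - 1)))
    x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps  # noise x_0 to x_t in one step
    eps_pred = eps_model(x_t, t)
    return np.mean((eps_pred - eps) ** 2)            # MSE between predicted and true noise
```

A predictor that always outputs zero scores a loss of about 1 (the variance of $\epsilon$); only a model that actually recovers the added noise drives the loss toward 0.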
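Sampling: start from pure noise $x_T \sim \mathcal{N}(0, I)$. At each timestep $t = T, \dots, 1$, subtract the scaled noise predicted by $\epsilon_\theta(x_t, t)$ from $x_t$ and add fresh Gaussian noise (except at the last step) to obtain $x_{t-1}$.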
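One reverse step of DDPM ancestral sampling, sketched under assumptions not stated in these notes: the same linear schedule with $T = 1000$, $\alpha_t = 1 - \beta_t$, 0-based indices, a hypothetical `eps_model`, and the variance choice $\sigma_t^2 = \beta_t$. The update is $x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\big(x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x_t, t)\big) + \sigma_t z$ with $z \sim \mathcal{N}(0, I)$.

```python
import numpy as np

# Assumption: linear beta schedule with T = 1000 steps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def p_sample_step(eps_model, x_t, t, rng):
    """One reverse step x_t -> x_{t-1}, using sigma_t^2 = beta_t. t is a 0-based index."""
    eps_pred = eps_model(x_t, t)  # hypothetical stand-in for epsilon_theta
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_pred) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # no fresh noise on the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)

def sample(eps_model, shape, rng):
    x = rng.standard_normal(shape)  # x_T: pure Gaussian noise
    for t in reversed(range(T)):
        x = p_sample_step(eps_model, x, t, rng)
    return x
```

As a sanity check, if the data were a single point $c$, the exact noise predictor is $(x_t - \sqrt{\bar\alpha_t}\,c)/\sqrt{1-\bar\alpha_t}$, and the final noise-free step then lands on $c$ up to floating-point error.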
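Jumping from $x_0$ to $x_t$ in one step (the “nice property” of Gaussians, used with the reparameterization trick): $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, where $\alpha_t = 1 - \beta_t$ and $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s$.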
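A quick numerical check of the one-step jump, as a sketch assuming the same linear schedule, with $\alpha_t = 1 - \beta_t$ and $\bar\alpha_t = \prod_{s \le t} \alpha_s$. `q_sample` jumps straight to $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, while `stepwise_moments` (a helper named here for illustration) tracks the mean and variance of applying $x_s = \sqrt{\alpha_s}\,x_{s-1} + \sqrt{\beta_s}\,\epsilon_s$ one step at a time. They agree because a sum of independent Gaussians is Gaussian, with the variances adding up.

```python
import numpy as np

# Assumption: linear beta schedule with T = 1000 steps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def q_sample(x0, t, rng):
    """Jump straight from x_0 to x_t (t is a 0-based index) using q(x_t | x_0)."""
    eps = rng.standard_normal(np.shape(x0))
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def stepwise_moments(x0, t):
    """Mean and variance after applying q(x_s | x_{s-1}) for s = 0..t, one step at a time."""
    mean, var = x0, 0.0
    for s in range(t + 1):
        # x_s = sqrt(alpha_s) * x_{s-1} + sqrt(beta_s) * eps_s, with eps_s independent
        mean = np.sqrt(alphas[s]) * mean
        var = alphas[s] * var + betas[s]
    return mean, var
```

For example, `stepwise_moments(1.5, 499)` matches `np.sqrt(alpha_bars[499]) * 1.5` and `1 - alpha_bars[499]` up to floating-point error.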
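First we need to understand what we’re modeling: the distribution $q$. The forward (noising) direction, $q(x_t \mid x_{t-1})$, is easy. The reverse direction, $q(x_{t-1} \mid x_t)$, is intractable because it depends on the marginal distribution over all images.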
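Written out (with $\beta_t$ the per-step noise variance), the forward step is a fixed Gaussian, while inverting it with Bayes' rule requires the marginals $q(x_{t-1})$ and $q(x_t)$, each an integral over the unknown data distribution $q(x_0)$:

$$
\begin{aligned}
q(x_t \mid x_{t-1}) &= \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big) \\
q(x_{t-1} \mid x_t) &= \frac{q(x_t \mid x_{t-1})\, q(x_{t-1})}{q(x_t)}, \qquad q(x_t) = \int q(x_t \mid x_0)\, q(x_0)\, dx_0
\end{aligned}
$$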