The Energy-Based View of Diffusion Models

Diffusion models can be interpreted as learning an energy function at each noise level. Here's how:

Score Matching Connection

Diffusion models are trained to estimate the score function of the noised data distribution: $\nabla_x \log p_t(x)$

This is directly related to an energy function $E_t$: $\nabla_x \log p_t(x) = -\nabla_x E_t(x)$

So the denoising network is actually learning the negative gradient of an implicit energy function!
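
To see why, write $p_t$ in Boltzmann form, $p_t(x) = e^{-E_t(x)} / Z_t$ (the energy is only defined up to an additive constant). The normalizer $Z_t$ does not depend on $x$, so it drops out under the gradient:

$$
\nabla_x \log p_t(x) = \nabla_x \left( -E_t(x) - \log Z_t \right) = -\nabla_x E_t(x)
$$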

Energy Function Interpretation

For a diffusion model at timestep $t$:


```python
# What the model learns
score = denoising_network(x_noisy, t)  # approximates ∇_x log p_t(x)

# Implicit energy interpretation
energy_gradient = -score
# Energy could be recovered by integration (though we don't need to)
```
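
As a sanity check of this correspondence, here is a minimal, self-contained PyTorch sketch on a toy distribution where the energy is known in closed form, a standard Gaussian with $E(x) = \|x\|^2 / 2$ (the function name `toy_energy` is illustrative, not from any diffusion library):

```python
import torch

def toy_energy(x):
    # Standard-Gaussian energy (up to the constant log Z): E(x) = ||x||^2 / 2
    return 0.5 * (x ** 2).sum()

x = torch.randn(3, requires_grad=True)

# Score via autograd: ∇_x log p(x) = -∇_x E(x)
(grad_E,) = torch.autograd.grad(toy_energy(x), x)
score = -grad_E

# For a standard Gaussian the score is known in closed form: -x
assert torch.allclose(score, -x)
```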

Sampling as Energy Minimization

The denoising process can be viewed as noisy gradient descent on the energy:


```python
# Standard diffusion sampling (one schematic step; the network outputs the score, as above)
x_t = x_t_plus_1 + learning_rate * denoising_network(x_t_plus_1, t) + noise
# (with an ε-prediction network the sign flips, since ε ∝ -score)

# Energy-based interpretation: substituting score = -∇E_t,
#   x_t = x_{t+1} - learning_rate * ∇E_t(x_{t+1}) + noise
# i.e. noisy gradient descent on the energy
```

This is Langevin dynamics, a standard way to sample from an EBM!
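
To make that concrete, here is a small, runnable Langevin sampler on the same toy Gaussian energy as above (unadjusted Langevin dynamics; `langevin_sample` and its step size and step count are illustrative choices, not a reference implementation):

```python
import torch

def toy_energy(x):
    # Standard-Gaussian energy: E(x) = ||x||^2 / 2, so the true score is -x
    return 0.5 * (x ** 2).sum(dim=-1)

def langevin_sample(energy_fn, x0, step_size=1e-2, n_steps=1000):
    # Unadjusted Langevin dynamics:
    #   x <- x - step_size * ∇E(x) + sqrt(2 * step_size) * noise
    x = x0.clone()
    for _ in range(n_steps):
        x.requires_grad_(True)
        (grad_E,) = torch.autograd.grad(energy_fn(x).sum(), x)
        x = (x - step_size * grad_E
             + (2 * step_size) ** 0.5 * torch.randn_like(x)).detach()
    return x

# Chains started at the origin should end up distributed ≈ N(0, I)
samples = langevin_sample(toy_energy, torch.zeros(10_000, 2))
print(samples.mean(dim=0), samples.std(dim=0))  # ≈ [0, 0] and ≈ [1, 1]
```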

Why This Matters

1. Theoretical Understanding