Background

Bigger picture

Score matching could also be applied to model types besides energy-based models, such as autoregressive models and normalizing flows, since their log-likelihood is tractable and you can compute the score function (the gradient of the log-likelihood with respect to the data). But there's not much point, since you can already do MLE on those (minimizing KL divergence rather than Fisher divergence).

image.png
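To be concrete, "score" here means the gradient of the log-density with respect to the data $x$, not the parameters $\theta$:

$$
s_\theta(x) := \nabla_x \log p_\theta(x)
$$

For autoregressive models and normalizing flows, $\log p_\theta(x)$ is tractable, so this gradient comes directly from autodiff.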

So, what’s the most general family you can train with score matching? You don’t even need to define/model energy—you can directly model the score function.

image.png
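A minimal sketch of that idea (assuming PyTorch; the architecture, sizes, and names here are illustrative, not from the lecture): a network that maps a point $x \in \mathbb{R}^d$ directly to an estimated score vector in $\mathbb{R}^d$, with no energy function or normalizing constant anywhere.

```python
import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    """Directly models the score s_theta(x) ~ grad_x log p(x).

    Input and output share the same dimension d; no energy function
    or normalizing constant is ever defined.
    """
    def __init__(self, d: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, d),  # output is a vector field, not a scalar
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Usage: estimated scores for a batch of 2-D points.
score_model = ScoreNet(d=2)
x = torch.randn(16, 2)
print(score_model(x).shape)  # torch.Size([16, 2])
```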

Recap of Fisher divergence. The visual intuition: compute the average Euclidean distance between the data score and the model score over the whole space:

image.png

image.png

(Note the answer to the last question is “No”)
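Written out, the Fisher divergence is the expected squared Euclidean distance between the two score fields under the data distribution:

$$
D_F(p_{\text{data}} \,\|\, p_\theta) = \frac{1}{2}\, \mathbb{E}_{x \sim p_{\text{data}}} \Big[ \big\| \nabla_x \log p_{\text{data}}(x) - \nabla_x \log p_\theta(x) \big\|_2^2 \Big]
$$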

Could train a NN to output scores directly. But the score-matching objective requires the trace of the Jacobian of the score network, which is expensive in high dimensions (one backward pass per dimension).

image.png
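For reference, the score-matching objective after integration by parts (Hyvärinen's score matching) is:

$$
J(\theta) = \mathbb{E}_{x \sim p_{\text{data}}} \Big[ \operatorname{tr}\big( \nabla_x s_\theta(x) \big) + \frac{1}{2} \big\| s_\theta(x) \big\|_2^2 \Big]
$$

The trace term is the problem: computing it exactly costs $O(d)$ backward passes for $d$-dimensional data.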

(Note: in the EBM formulation, it's even more expensive, since you'd need one backprop through the energy to get the score, and another for its derivatives. That means second-order derivatives of the energy, i.e., the Hessian.)

image.png
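A sketch of why the trace term is the bottleneck (assuming PyTorch and reusing the hypothetical `ScoreNet`/`score_model` from the sketch above): computing tr(∇_x s_θ(x)) exactly takes one backward pass per data dimension.

```python
import torch

def exact_score_matching_loss(score_model, x: torch.Tensor) -> torch.Tensor:
    """E[ tr(grad_x s(x)) + 0.5 * ||s(x)||^2 ], with the trace computed exactly.

    The loop over dimensions is the expensive part: one backward pass
    per dimension, so cost scales linearly with data dimension d.
    """
    x = x.requires_grad_(True)
    s = score_model(x)                      # (batch, d) estimated scores
    norm_term = 0.5 * (s ** 2).sum(dim=1)   # 0.5 * ||s(x)||^2

    trace = torch.zeros(x.shape[0])
    for i in range(x.shape[1]):             # d backward passes
        grad_i = torch.autograd.grad(
            s[:, i].sum(), x, create_graph=True
        )[0][:, i]                          # diagonal entry ds_i/dx_i
        trace = trace + grad_i

    return (trace + norm_term).mean()

# Usage: fine for d = 2, painful for d = 3*256*256.
loss = exact_score_matching_loss(score_model, torch.randn(16, 2))
loss.backward()
```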

Denoising score matching