Background

Bigger picture

Score matching could also be applied to model types besides energy-based models, such as autoregressive models and normalizing flows, since their log-likelihood is tractable and you can compute the score function (the gradient of the log-likelihood with respect to the data). But there's not much point, since you can already do MLE on those (minimizing KL divergence rather than Fisher divergence).

image.png
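To be concrete, "score" here means the gradient of the log-density with respect to the data $x$, not the parameters $\theta$:

$$
s_\theta(x) := \nabla_x \log p_\theta(x)
$$

For autoregressive models and normalizing flows, $\log p_\theta(x)$ is tractable, so this gradient comes directly from autodiff.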

So, what’s the most general family you can train with score matching? You don’t even need to define/model energy—you can directly model the score function.

image.png
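A minimal sketch of that idea (assuming PyTorch; the architecture, sizes, and names here are illustrative, not from the lecture): a network that maps a point $x \in \mathbb{R}^d$ directly to an estimated score vector in $\mathbb{R}^d$, with no energy function or normalizing constant anywhere.

```python
import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    """Directly models the score s_theta(x) ~ grad_x log p(x).

    Input and output share the same dimension d; no energy function
    or normalizing constant is ever defined.
    """
    def __init__(self, d: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, d),  # output is a vector field, not a scalar
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Usage: estimated scores for a batch of 2-D points.
score_model = ScoreNet(d=2)
x = torch.randn(16, 2)
print(score_model(x).shape)  # torch.Size([16, 2])
```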

Recap of Fisher divergence. The visual intuition: compute the average Euclidean distance between the data score and the model score over the whole space:

image.png

image.png

(Note the answer to the last question is “No”)
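Written out, the Fisher divergence is the expected squared Euclidean distance between the two score fields under the data distribution:

$$
D_F(p_{\text{data}} \,\|\, p_\theta) = \frac{1}{2}\, \mathbb{E}_{x \sim p_{\text{data}}} \Big[ \big\| \nabla_x \log p_{\text{data}}(x) - \nabla_x \log p_\theta(x) \big\|_2^2 \Big]
$$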

Could train a NN to output scores directly. But the score-matching objective requires the trace of the Jacobian of the score network, which is expensive in high dimensions (one backward pass per dimension).

image.png
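For reference, the score-matching objective after integration by parts (Hyvärinen's score matching) is:

$$
J(\theta) = \mathbb{E}_{x \sim p_{\text{data}}} \Big[ \operatorname{tr}\big( \nabla_x s_\theta(x) \big) + \frac{1}{2} \big\| s_\theta(x) \big\|_2^2 \Big]
$$

The trace term is the problem: computing it exactly costs $O(d)$ backward passes for $d$-dimensional data.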

(Note: in the EBM formulation, it's even more expensive, since you'd need one backprop through the energy to get the score, and another for its derivatives. That means second-order derivatives of the energy, i.e., the Hessian.)

image.png
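A sketch of why the trace term is the bottleneck (assuming PyTorch and reusing the hypothetical `ScoreNet`/`score_model` from the sketch above): computing tr(∇_x s_θ(x)) exactly takes one backward pass per data dimension.

```python
import torch

def exact_score_matching_loss(score_model, x: torch.Tensor) -> torch.Tensor:
    """E[ tr(grad_x s(x)) + 0.5 * ||s(x)||^2 ], with the trace computed exactly.

    The loop over dimensions is the expensive part: one backward pass
    per dimension, so cost scales linearly with data dimension d.
    """
    x = x.requires_grad_(True)
    s = score_model(x)                      # (batch, d) estimated scores
    norm_term = 0.5 * (s ** 2).sum(dim=1)   # 0.5 * ||s(x)||^2

    trace = torch.zeros(x.shape[0])
    for i in range(x.shape[1]):             # d backward passes
        grad_i = torch.autograd.grad(
            s[:, i].sum(), x, create_graph=True
        )[0][:, i]                          # diagonal entry ds_i/dx_i
        trace = trace + grad_i

    return (trace + norm_term).mean()

# Usage: fine for d = 2, painful for d = 3*256*256.
loss = exact_score_matching_loss(score_model, torch.randn(16, 2))
loss.backward()
```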

Denoising score matching