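$\mathrm{KL}(p \Vert q) = \mathbb{E}_{x \sim p}\left[\log\frac{p(x)}{q(x)}\right]$: $p$ is the target (the truth), and $q$ is our approximating distribution (the "channel"). It measures how far $q$ is from $p$. KL is not symmetric, so the direction matters: in general $\mathrm{KL}(p \Vert q) \ne \mathrm{KL}(q \Vert p)$.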

When KL is used as an objective function (to be minimized), $q_\theta$ is always the distribution being changed: we update $\theta$ while $p$ stays fixed.
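For discrete distributions the definition can be evaluated exactly, which makes the asymmetry concrete. What follows is a minimal NumPy sketch. The `kl` helper and the two probability vectors are illustrative assumptions and do not come from these notes:

```python
import numpy as np

def kl(p, q):
    """Exact KL(p || q) = sum_x p(x) * log(p(x) / q(x)) for discrete distributions.

    Assumes q(x) > 0 wherever p(x) > 0; terms with p(x) = 0 contribute 0.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.4, 0.1]  # target / "truth" (illustrative numbers)
q = [0.3, 0.3, 0.4]  # the distribution q_theta we would adjust

print(kl(p, q))  # KL(p || q)
print(kl(q, p))  # KL(q || p): generally a different number
```

Both values are positive, but they differ. `kl(p, p)` is exactly 0.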

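Estimators of the reverse KL, $\mathrm{KL}(q \Vert p) = \mathbb{E}_{x \sim q}\left[\log\frac{q(x)}{p(x)}\right]$, that use only samples $x \sim q(x)$ and the ratio $r = \frac{p(x)}{q(x)}$:

| Estimator | Bias | Variance | Comments |
|---|---|---|---|
| $k_1 = \log\frac{q(x)}{p(x)} = -\log r$ | unbiased | high | naive; individual samples can be negative |
| $k_2 = \frac{1}{2}\left(\log\frac{p(x)}{q(x)}\right)^2 = \frac{1}{2}(\log r)^2$ | biased (small when $q \approx p$) | low | always $\ge 0$ |
| $k_3 = (r-1) - \log r$ | unbiased | low | always $\ge 0$ |

Idea for $k_3$: start from $k_1$ and add a control variate. This is a term that (1) has known zero expectation under $q$, so it adds no bias, and (2) is negatively correlated with $k_1$, so it cancels part of $k_1$'s variance. The term $r - 1$ satisfies both conditions. First, $\mathbb{E}_{x \sim q}[r - 1] = \int p(x)\,dx - 1 = 0$, assuming $q(x) > 0$ wherever $p(x) > 0$. Second, $r - 1$ is increasing in $r$ while $k_1 = -\log r$ is decreasing, so the two are negatively correlated. Adding the control variate with weight $\lambda = 1$ gives $k_1 + (r - 1) = (r - 1) - \log r = k_3$. This weight is a convenient choice rather than the variance-minimizing one in general: it makes $k_3 \ge 0$, because $\log r \le r - 1$.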
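The following is a quick numerical check of the table and of the control-variate argument, written as a sketch under stated assumptions. It picks $q = \mathcal{N}(0, 1)$ and $p = \mathcal{N}(0.5, 1)$. This choice is illustrative and does not come from the notes; it is used because $\mathrm{KL}(q \Vert p)$ then has the closed form $(\mu_q - \mu_p)^2 / (2\sigma^2) = 0.125$ to compare against. `log_normal_pdf` and `kl_estimators` are hypothetical helpers written for this example. Only $q$ is sampled; $p$ and $q$ are merely evaluated at those samples.

```python
import numpy as np

def log_normal_pdf(x, mean, std):
    """Log-density of N(mean, std^2) evaluated at x."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2 * np.pi)

def kl_estimators(logp, logq):
    """Per-sample estimators of KL(q || p), given log p(x) and log q(x) at x ~ q."""
    logr = logp - logq      # log r, where r = p(x) / q(x)
    r = np.exp(logr)
    k1 = -logr              # unbiased, high variance
    k2 = 0.5 * logr ** 2    # biased, low variance
    k3 = (r - 1) - logr     # k1 plus the control variate (r - 1), weight 1
    return k1, k2, k3

rng = np.random.default_rng(0)
mu_q, mu_p, sigma = 0.0, 0.5, 1.0                # illustrative: q = N(0, 1), p = N(0.5, 1)
true_kl = (mu_q - mu_p) ** 2 / (2 * sigma ** 2)  # closed form for equal variances

x = rng.normal(mu_q, sigma, size=1_000_000)      # samples from q only
k1, k2, k3 = kl_estimators(log_normal_pdf(x, mu_p, sigma),
                           log_normal_pdf(x, mu_q, sigma))

print(f"true KL(q||p) = {true_kl:.4f}")
for name, k in [("k1", k1), ("k2", k2), ("k3", k3)]:
    print(f"{name}: mean={k.mean():.4f}  std={k.std():.4f}  min={k.min():.4g}")

# Control-variate check: r - 1 should have ~zero mean and negative correlation with k1.
cv = k3 - k1  # equals r - 1
print(f"mean(r - 1) = {cv.mean():.4f}, corr(k1, r - 1) = {np.corrcoef(k1, cv)[0, 1]:.3f}")
```

With this setup, the means of `k1` and `k3` should land near 0.125. The mean of `k2` should sit slightly above that value, which reflects its bias. The standard deviation of `k3` should come out at roughly 0.18, compared with roughly 0.5 for `k1`. `k1` takes negative values, and `k3` stays non-negative up to floating-point rounding. Exact printed values depend on the seed.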


