Overview

tldr:

There are two main approaches for approximating the posterior p(Θ|X): (i) simulation (MCMC) and (ii) optimization (variational inference, VI). Sampling-based methods have several important shortcomings.

Idea: We look for a distribution q(Θ) that acts as a stand-in (a surrogate) for p(Θ|X). We then make q(Θ|Φ) resemble p(Θ|X) as closely as possible by adjusting the values of Φ (Fig. 2). This is done by maximising the evidence lower bound (ELBO):

ELBO(Φ) = E[ln p(X, Θ) − ln q(Θ|Φ)],

where the expectation E[·] is taken over q(Θ|Φ). (Note that Φ implicitly depends on the dataset X, but for notational convenience we'll drop the explicit dependence.)

We turn Bayesian inference into an optimization problem.
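
To make this concrete, here is a minimal sketch (not from the text) that maximises a Monte Carlo estimate of the ELBO for a toy conjugate-Gaussian model, using a Gaussian variational family q(Θ|Φ) = N(μ, σ²) with Φ = (μ, log σ). All names and the specific model are illustrative assumptions; SciPy's generic optimizer stands in for whatever optimizer one would actually use.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy data: X ~ N(theta, 1) with prior theta ~ N(0, 1); true theta = 2.
X = rng.normal(2.0, 1.0, size=50)

# Fixed standard-normal draws -> a deterministic Monte Carlo ELBO estimate.
eps = rng.standard_normal(200)

def neg_elbo(phi):
    mu, log_sigma = phi
    sigma = np.exp(log_sigma)
    theta = mu + sigma * eps                       # reparameterized samples from q(theta | Phi)
    log_prior = norm.logpdf(theta, 0.0, 1.0)
    log_lik = norm.logpdf(X[:, None], theta, 1.0).sum(axis=0)
    log_q = norm.logpdf(theta, mu, sigma)
    # ELBO(Phi) = E_q[ln p(X, theta) - ln q(theta | Phi)], estimated by Monte Carlo
    return -np.mean(log_prior + log_lik - log_q)

phi_hat = minimize(neg_elbo, x0=np.array([0.0, 0.0])).x
mu_hat, sigma_hat = phi_hat[0], np.exp(phi_hat[1])

# Exact conjugate posterior for comparison: N(sum(X)/(n+1), 1/(n+1))
n = len(X)
print(f"VI:    mu={mu_hat:.3f}, sigma={sigma_hat:.3f}")
print(f"Exact: mu={X.sum()/(n+1):.3f}, sigma={np.sqrt(1/(n+1)):.3f}")
```

Because the model is conjugate, the fitted q(Θ|Φ) should land essentially on top of the exact posterior, which makes the toy example a useful sanity check on the optimization recipe.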

The main differences between sampling-based and variational techniques are: