In much of the following, I use Θ and $z$ interchangeably.
tldr:
In Bayesian statistics, we're often interested in the posterior distribution p(Θ|X) - the distribution of our model parameters Θ given the observed data X.
The problem is that calculating this posterior exactly is often analytically intractable or computationally infeasible for complex models, because it requires computing a difficult integral over the parameters (the evidence, or marginal likelihood).
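Concretely, Bayes' rule gives

p(Θ|X) = p(X|Θ) p(Θ) / p(X),   where   p(X) = ∫ p(X, Θ) dΘ,

and it is this normalising integral p(X) over all possible parameter values that usually has no closed form.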
Two broad approaches are used to approximate p(Θ|X): (i) simulation (MCMC) or (ii) optimization (variational inference, VI). Sampling-based methods can be slow to converge and scale poorly to large datasets.
Idea: We look for a distribution q(Θ) that acts as a stand-in (a surrogate) for p(Θ|X). We then make q[Θ|Φ(X)] as similar as possible to p(Θ|X) by adjusting the variational parameters Φ (Fig. 2). This is done by maximising the evidence lower bound (ELBO):
ℒ(Φ) = E[ln p(X, Θ) − ln q(Θ|Φ)],
where the expectation E[·] is taken over q(Θ|Φ). (Note that Φ implicitly depends on the dataset X, but for notational convenience we'll drop the explicit dependence.)
We turn Bayesian inference into an optimization problem.
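To make this concrete, here's a minimal sketch in Python (NumPy/SciPy), assuming a toy conjugate model - Θ ~ N(0, 1), x_i | Θ ~ N(Θ, 1) - and a Gaussian variational family q(Θ|Φ) = N(μ, σ²) with Φ = (μ, log σ). The ELBO is estimated by Monte Carlo using fixed reparameterized samples Θ_s = μ + σ·ε_s, so the objective is deterministic and can be handed to a standard optimizer. All names and numbers below are illustrative choices, not from the original post.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy data from an assumed model: Θ ~ N(0, 1) prior, x_i | Θ ~ N(Θ, 1).
X = rng.normal(loc=1.5, scale=1.0, size=50)

# Fixed base samples ε_s for the reparameterization Θ_s = μ + σ·ε_s.
# Reusing the same ε_s makes the Monte Carlo ELBO a deterministic function of Φ.
eps = rng.standard_normal(200)

def neg_elbo(phi):
    mu, log_sigma = phi
    sigma = np.exp(log_sigma)
    theta = mu + sigma * eps                              # samples from q(Θ|Φ)
    log_prior = norm.logpdf(theta, 0.0, 1.0)              # ln p(Θ)
    log_lik = norm.logpdf(X[:, None], theta, 1.0).sum(0)  # ln p(X|Θ)
    log_q = norm.logpdf(theta, mu, sigma)                 # ln q(Θ|Φ)
    elbo = np.mean(log_prior + log_lik - log_q)           # E_q[ln p(X,Θ) − ln q(Θ|Φ)]
    return -elbo

# Maximise the ELBO by minimising its negative over Φ = (μ, log σ).
phi_opt = minimize(neg_elbo, x0=np.array([0.0, 0.0]), method="Nelder-Mead").x
mu_hat, sigma_hat = phi_opt[0], np.exp(phi_opt[1])

# For this conjugate model the exact posterior is N(Σx/(n+1), 1/(n+1)),
# so we can check how close the optimized q(Θ|Φ) gets.
n = len(X)
print(f"VI:    mu={mu_hat:.3f}, sigma={sigma_hat:.3f}")
print(f"Exact: mu={X.sum()/(n+1):.3f}, sigma={np.sqrt(1/(n+1)):.3f}")
```

In this toy case the optimized q(Θ|Φ) essentially recovers the exact Gaussian posterior; for genuinely intractable models the same recipe applies, just with a q that can only approximate p(Θ|X).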