In much of the following, I’m interchangeably using Θ and $z$.
We want to model $p(X;\theta)$, but we also explicitly model latent variables/features $Z$.
Example: GMMs—for clustering, unsupervised learning
Another motivation:
Can also use more concrete examples if they’re easier to think about:
tldr: