Latent variable models:
Decompose into simpler prior and conditional distributions:
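A standard way to write this decomposition, assuming $z$ denotes the latent variable, $p(z)$ the prior, and $p(x \mid z)$ the conditional (symbols chosen to match the $p(x_i \mid z)$ used later in these notes):

$$
p(x) = \int p(x \mid z)\, p(z)\, dz
$$

The idea is that $p(z)$ and $p(x \mid z)$ can each be simple (for example, Gaussian) even when the marginal $p(x)$ is complicated.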
Take the following as accepted for now:
Apply Jensen’s inequality:
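For reference, Jensen's inequality for the concave logarithm, for a positive random variable $y$:

$$
\log \mathbb{E}[y] \ge \mathbb{E}[\log y]
$$

A sketch of how it is commonly applied in this setting. The importance-weight rewrite and the choice of $q_i(z)$ as an arbitrary distribution over $z$ (positive wherever $p(x_i \mid z)\, p(z)$ is positive) are assumptions here, chosen to match the $q_i$ that appears later in the notes:

$$
\begin{aligned}
\log p(x_i)
&= \log \int p(x_i \mid z)\, p(z)\, dz
 = \log \mathbb{E}_{z \sim q_i(z)}\!\left[\frac{p(x_i \mid z)\, p(z)}{q_i(z)}\right] \\
&\ge \mathbb{E}_{z \sim q_i(z)}\!\left[\log \frac{p(x_i \mid z)\, p(z)}{q_i(z)}\right] \\
&= \mathbb{E}_{z \sim q_i(z)}\!\left[\log p(x_i \mid z) + \log p(z)\right] - \mathbb{E}_{z \sim q_i(z)}\!\left[\log q_i(z)\right]
\end{aligned}
$$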
Entropy review:
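The usual definition, written for a generic distribution $p$ over $x$:

$$
\mathcal{H}(p) = -\mathbb{E}_{x \sim p(x)}\left[\log p(x)\right] = -\int p(x) \log p(x)\, dx
$$

Entropy is large when $p$ is spread out and small when $p$ concentrates on a few values. Applied to $q_i$, this gives $\mathcal{H}(q_i) = -\mathbb{E}_{z \sim q_i(z)}\left[\log q_i(z)\right]$.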
Intuition: going back to the expectation, the first part is maximized by a $q_i$ whose samples land where $p(x_i \mid z)$ is large, but we also want $q_i$ to have as much entropy as possible, so the best $q_i$ trades off the two.
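This trade-off can be checked numerically on a toy problem. The sketch below is a minimal illustration, not part of the original notes: it assumes a discrete latent $z$ with three values, made-up numbers for $p(z)$ and $p(x_i \mid z)$, and takes the "first part" to be $\mathbb{E}_{z \sim q_i}[\log p(x_i \mid z) + \log p(z)]$. All function names are hypothetical.

```python
import math

# Hypothetical toy model: a latent z with three values (all numbers are made up).
prior = [0.5, 0.3, 0.2]          # p(z)
likelihood = [0.10, 0.60, 0.30]  # p(x_i | z) for one fixed observation x_i


def log_marginal(prior, likelihood):
    """log p(x_i) = log sum_z p(x_i | z) p(z)."""
    return math.log(sum(l * p for l, p in zip(likelihood, prior)))


def entropy(q):
    """H(q) = -sum_z q(z) log q(z), treating 0 log 0 as 0."""
    return -sum(qz * math.log(qz) for qz in q if qz > 0)


def expected_log_joint(q, prior, likelihood):
    """First term of the bound: E_{z ~ q}[log p(x_i | z) + log p(z)]."""
    return sum(
        qz * (math.log(l) + math.log(p))
        for qz, l, p in zip(q, likelihood, prior)
        if qz > 0
    )


def elbo(q, prior, likelihood):
    """Lower bound on log p(x_i): first term plus the entropy of q."""
    return expected_log_joint(q, prior, likelihood) + entropy(q)


def posterior(prior, likelihood):
    """Exact p(z | x_i) by Bayes' rule; feasible here because z is tiny."""
    joint = [l * p for l, p in zip(likelihood, prior)]
    total = sum(joint)
    return [j / total for j in joint]


candidates = {
    "point mass on best z": [0.0, 1.0, 0.0],     # maximizes the first term alone
    "uniform": [1 / 3, 1 / 3, 1 / 3],            # maximizes the entropy alone
    "posterior": posterior(prior, likelihood),   # balances the two terms
}

print(f"log p(x_i) = {log_marginal(prior, likelihood):.4f}")
for name, q in candidates.items():
    print(
        f"{name:>22}: first term = {expected_log_joint(q, prior, likelihood):.4f}, "
        f"entropy = {entropy(q):.4f}, bound = {elbo(q, prior, likelihood):.4f}"
    )
```

In this toy setup the point mass wins on the first term and the uniform distribution wins on entropy, but neither makes the bound tight; the bound equals $\log p(x_i)$ only when $q_i$ is the exact posterior.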