Graphical models
Numerical sampling algorithms
Inverse transform sampling
Intuition: given the ability to generate random numbers in $[0,1]$, i.e. sample the uniform $U(0,1)$, and given the target distribution’s CDF $F$, go backward: map a point in the range of $F$ (which, since $F$ is a probability CDF, is always in $[0,1]$) back to the input domain via $F^{-1}$.
Derivation
Let $X = T(U)$ for a monotone increasing transform $T$ applied to $U \sim U(0,1)$. Then
$$ F_X(x)=P[X\le x]=P[T(U)\le x]=P[U\le T^{-1}(x)]=T^{-1}(x) $$
where the last step uses $P[U\le u]=u$ for a uniform. So choosing $T^{-1}=F$, i.e. $T=F^{-1}$, gives $X$ the target CDF.
But you need the CDF, which many real-world problems don’t give you in closed form. And then you need to invert it too.
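When the inverse CDF *is* available, the method is a couple of lines. A minimal sketch, assuming an Exponential($\lambda$) target so that $F(x)=1-e^{-\lambda x}$ inverts to $F^{-1}(u) = -\ln(1-u)/\lambda$ (the distribution choice is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_exponential(lam: float, n: int) -> np.ndarray:
    u = rng.uniform(0.0, 1.0, size=n)  # U(0,1) draws
    return -np.log(1.0 - u) / lam      # map through the inverse CDF F^{-1}

samples = sample_exponential(lam=2.0, n=100_000)
print(samples.mean())  # should be close to 1/lam = 0.5
```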
Monte Carlo sampling methods
Accept-reject sampling, aka rejection sampling: simple; sample the same domain from an easy-to-sample surrogate distribution, and accept each sample $x$ with probability $p_\text{target}(x) / (M \cdot p_\text{surrogate}(x))$, where the constant $M$ satisfies $p_\text{target}(x) \le M \cdot p_\text{surrogate}(x)$ everywhere, so the acceptance probability is at most 1.
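A minimal sketch, assuming an illustrative Beta(2,2) target, $p(x)=6x(1-x)$ on $[0,1]$, with a Uniform(0,1) surrogate; $M=1.5$ bounds the density ratio since the target peaks at $x=0.5$:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 1.5  # must satisfy p_target(x) <= M * p_surrogate(x) everywhere

def p_target(x):
    return 6.0 * x * (1.0 - x)  # Beta(2,2) density

def rejection_sample(n: int) -> np.ndarray:
    out = []
    while len(out) < n:
        x = rng.uniform()                    # draw from the surrogate q(x) = 1
        if rng.uniform() < p_target(x) / M:  # accept w.p. p(x) / (M * q(x))
            out.append(x)
    return np.array(out)

samples = rejection_sample(10_000)
print(samples.mean())  # Beta(2,2) mean is 0.5
```

The acceptance rate is $1/M$ on average, which is why a surrogate that hugs the target (small $M$) matters.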
Markov chain Monte Carlo (MCMC): general family of algorithms
Gibbs sampling: MCMC; sample from a multidimensional distribution by sampling from the conditionals along each dimension
Special case of Metropolis-Hastings
E.g. 2D normal: pick some initial $x,y$, then sample $p(x\mid y)$, then vice versa. Assumes sampling $p(x \mid y)$ is easier than sampling $p(x,y)$ directly (see the sketch after this list).
Doesn’t work when the chain can’t move along any single dimension to reach other density islands
Inefficient when the density is mostly low: it’s hard to reach the high-density bubbles, and once inside, the chain won’t leave
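A minimal sketch of the 2D normal example above, assuming zero means, unit variances, and correlation $\rho$, so the conditionals are available in closed form: $x \mid y \sim N(\rho y,\, 1-\rho^2)$ and symmetrically for $y \mid x$:

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_bivariate_normal(rho: float, n_steps: int) -> np.ndarray:
    x, y = 0.0, 0.0                  # arbitrary initial point
    sd = np.sqrt(1.0 - rho**2)       # conditional std dev
    chain = np.empty((n_steps, 2))
    for t in range(n_steps):
        x = rng.normal(rho * y, sd)  # sample p(x | y)
        y = rng.normal(rho * x, sd)  # sample p(y | x)
        chain[t] = (x, y)
    return chain

chain = gibbs_bivariate_normal(rho=0.8, n_steps=50_000)
print(np.corrcoef(chain[:, 0], chain[:, 1])[0, 1])  # ~0.8
```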
Metropolis-Hastings Monte Carlo (MHMC): the most popular MCMC algorithm. Propose a move $x'$ from a proposal distribution $q(x' \mid x)$, then accept with probability $\min\left(1, \frac{p(x')\,q(x \mid x')}{p(x)\,q(x' \mid x)}\right)$; otherwise stay at $x$.
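A minimal random-walk MH sketch; the bimodal target here is an illustrative assumption, and the symmetric Gaussian proposal makes the $q$ terms cancel in the acceptance ratio (note only an unnormalized density is needed):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_unnorm(x):
    # Unnormalized two-component Gaussian mixture (illustrative target)
    return np.exp(-0.5 * (x - 2.0) ** 2) + np.exp(-0.5 * (x + 2.0) ** 2)

def metropolis_hastings(n_steps: int, step_size: float = 1.0) -> np.ndarray:
    x = 0.0
    chain = np.empty(n_steps)
    for t in range(n_steps):
        x_prop = x + rng.normal(0.0, step_size)             # symmetric proposal
        if rng.uniform() < p_unnorm(x_prop) / p_unnorm(x):  # min(1, ratio)
            x = x_prop                                      # accept; else stay put
        chain[t] = x
    return chain

chain = metropolis_hastings(100_000)
print(chain.mean())  # symmetric mixture, so ~0
```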
Hamiltonian Monte Carlo (HMC): MCMC; intuition is to flip the pdf upside down, simulate a frictionless object rolling down the slopes, and occasionally give it a random kick so it escapes local minima
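A minimal HMC sketch for a 1D standard normal target, so the potential energy is $U(x) = -\log p(x) = x^2/2$ up to a constant; the step size and leapfrog count are illustrative, not tuned:

```python
import numpy as np

rng = np.random.default_rng(0)

def U(x):
    return 0.5 * x**2   # potential energy: -log p(x) for standard normal

def grad_U(x):
    return x

def hmc(n_steps: int, eps: float = 0.1, n_leapfrog: int = 20) -> np.ndarray:
    x = 0.0
    chain = np.empty(n_steps)
    for t in range(n_steps):
        p = rng.normal()             # random momentum "kick"
        x_new, p_new = x, p
        for _ in range(n_leapfrog):  # leapfrog integration of the dynamics
            p_new -= 0.5 * eps * grad_U(x_new)
            x_new += eps * p_new
            p_new -= 0.5 * eps * grad_U(x_new)
        # Accept with prob min(1, exp(H_old - H_new)), H = potential + kinetic
        dH = (U(x_new) + 0.5 * p_new**2) - (U(x) + 0.5 * p**2)
        if rng.uniform() < np.exp(-dH):
            x = x_new
        chain[t] = x
    return chain

chain = hmc(20_000)
print(chain.mean(), chain.std())  # ~0, ~1
```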
No-U-Turn Sampler (NUTS): an HMC variant that adaptively chooses trajectory lengths, stopping when the path starts to double back; used in Stan, Pyro
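For flavor, a sketch of invoking NUTS through Pyro’s MCMC API; the model (inferring the mean of a normal) and all numbers here are placeholder assumptions:

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import MCMC, NUTS

# Placeholder model: unknown mean mu with a wide prior, unit-variance likelihood
def model(data):
    mu = pyro.sample("mu", dist.Normal(0.0, 10.0))
    with pyro.plate("data", len(data)):
        pyro.sample("obs", dist.Normal(mu, 1.0), obs=data)

data = torch.tensor([0.8, 1.2, 1.1, 0.9])
mcmc = MCMC(NUTS(model), num_samples=500, warmup_steps=200)
mcmc.run(data)
print(mcmc.get_samples()["mu"].mean())  # posterior mean of mu
```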
Sampling from a Bayes net
Control variates
Suppose you want to estimate E[X] where X is some random variable. The naive way would be to sample X many times and take the average. But if you can find another random variable Y where:
- You know E[Y] exactly
- Y is correlated with X
Then you can use the estimator: X - α(Y - E[Y])
This is guaranteed to be unbiased for any α because: E[X - α(Y - E[Y])] = E[X] - α(E[Y] - E[Y]) = E[X]
Example: estimate E[sin(X)] where X follows a standard normal distribution. The clever part is:
- We know that E[X] = 0 for a standard normal
- sin(X) is correlated with X (when X is small, sin(X) ≈ X)
- So X makes a good control variate!
The key steps in applying control variates are:
- Find a correlated quantity Y with known expectation
- Find the optimal scaling factor α; the variance-minimizing choice is α* = Cov(X,Y)/Var(Y), usually estimated from the samples
- Subtract α(Y - E[Y]) from your original estimator (see the sketch below)
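A sketch of the E[sin(X)] example: in the notation above, sin(X) plays the role of X and X itself is the control variate Y, with known E[Y] = 0:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)  # X ~ N(0,1)
fx = np.sin(x)                # quantity of interest

alpha = np.cov(fx, x)[0, 1] / np.var(x, ddof=1)  # alpha* = Cov / Var
cv = fx - alpha * (x - 0.0)                      # subtract alpha * (Y - E[Y])

n = len(x)
print("naive:  ", fx.mean(), "+/-", fx.std(ddof=1) / np.sqrt(n))
print("control:", cv.mean(), "+/-", cv.std(ddof=1) / np.sqrt(n))
```

Both estimators target the same expectation (which is 0 here by symmetry), but the control-variate version has a visibly smaller standard error because sin(X) ≈ X near 0.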
Latent Dirichlet allocation (LDA)