Three model families, juxtaposed:
- Generative adversarial networks: GANs offer a clever way to frame data generation, an unsupervised learning problem, as a supervised one. The discriminator model learns to distinguish real data from the fake samples produced by the generator model, and the two models are trained as if playing a minimax game.
- Variational autoencoders: VAEs optimize the log-likelihood of the data indirectly, by maximizing the evidence lower bound (ELBO).
- Flow-based generative models: A flow-based generative model is constructed from a sequence of invertible transformations. Unlike the other two, it learns the data distribution explicitly, so the loss function is simply the negative log-likelihood -log p(x).
Summary: GANs are known for potentially unstable training and reduced diversity in generation, owing to their adversarial training setup. VAEs rely on a surrogate loss rather than the true likelihood. Flow models must use specialized architectures to keep the transformations invertible.
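For reference, the standard training objectives of the three families side by side (G and D denote the generator and discriminator, q_phi the VAE's approximate posterior, and f the invertible flow transform):

```latex
% GAN: minimax game between generator G and discriminator D
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)]
  + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]

% VAE: maximize the evidence lower bound (ELBO) on log p(x)
\log p(x) \ge \mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)]
  - D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)

% Flow: exact log-likelihood via the change-of-variables formula
\log p(x) = \log p_Z\big(f(x)\big)
  + \log \left| \det \frac{\partial f(x)}{\partial x} \right|
```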
TODO
- normalizing flows, continuous normalizing flows
- flow matching
- rectified flow
Flow matching
Key differences:
- Model output:
  - Diffusion models predict the noise component that was added to the data.
  - Flow Matching directly predicts a vector field (velocity) that transports samples between distributions.
- Sampling process:
  - Diffusion uses an iterative noise-prediction and denoising process.
  - Flow Matching uses ODE-based sampling that follows the learned vector field.
- Training objective:
  - Diffusion trains the model to predict the added noise.
  - Flow Matching trains the model to match a target vector field (see the sketch after this list).
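A minimal sketch of the training objective, assuming a linear (OT-style) interpolation path between a noise sample x0 and a data sample x1; `velocity_model` is a hypothetical network taking (x_t, t), not from the paper:

```python
import torch

def flow_matching_loss(velocity_model, x1):
    """Conditional flow matching loss with a linear interpolation path.

    velocity_model: hypothetical network v_theta(x_t, t) -> velocity
    x1: a batch of real data samples, shape (batch, dim)
    """
    x0 = torch.randn_like(x1)                 # noise sample from the prior
    t = torch.rand(x1.shape[0], 1)            # random time in [0, 1], one per sample
    x_t = (1 - t) * x0 + t * x1               # point on the straight-line path
    target_v = x1 - x0                        # linear path has constant velocity
    pred_v = velocity_model(x_t, t)           # model's predicted vector field
    return ((pred_v - target_v) ** 2).mean()  # regress onto the target velocity
```

The appeal of this formulation is that the regression target is available in closed form per sample pair, so there is no iterative noising schedule to simulate during training.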
An interesting advantage of the Flow Matching approach is that you can use simpler paths between distributions, such as the optimal transport (OT) path. This can lead to more efficient sampling, since these paths are more direct than diffusion paths, which tend to take more meandering routes.
Also note that Flow Matching can work with any continuous path between distributions, with diffusion paths being just one special case. The OT paths described in the paper tend to be more efficient because they are the most direct paths between distributions in Wasserstein space.
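To make the ODE-based sampling concrete, here is a sketch using plain Euler integration of the learned vector field; the step count and the Euler scheme are illustrative choices, and `velocity_model` is the same hypothetical network as above:

```python
import torch

@torch.no_grad()
def sample(velocity_model, shape, num_steps=50):
    """Generate samples by integrating dx/dt = v_theta(x, t) from t=0 to t=1."""
    x = torch.randn(shape)                     # start from the noise distribution
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((shape[0], 1), i * dt)  # current time, broadcast per sample
        x = x + velocity_model(x, t) * dt      # Euler step along the vector field
    return x                                   # approximate sample from the data distribution
```

With straighter paths, a small number of integration steps can already land close to the data distribution, which is where the sampling-efficiency claim comes from.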