https://arxiv.org/abs/2407.01082

Core Mechanism

At each token generation step, min-p:

  1. Identifies the maximum probability token (p_max)
  2. Sets a dynamic threshold by multiplying p_max by a base parameter (p_base, typically 0.05-0.1)
  3. Filters out tokens with probabilities below this threshold
  4. Samples from the remaining pool of valid tokens

The key formula is: threshold = p_base × p_max
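Below is a minimal sketch of this procedure in Python/NumPy, assuming the model's logits have already been converted to a probability vector via softmax; the function name `min_p_sample` and the defaults are illustrative, not taken from the paper's reference implementation:

```python
import numpy as np

def min_p_sample(probs: np.ndarray, p_base: float = 0.1, rng=None) -> int:
    """Sample a token index using min-p filtering.

    probs: 1-D array of token probabilities (already softmaxed, sums to 1).
    p_base: base parameter scaled by the top token's probability.
    """
    rng = rng or np.random.default_rng()
    p_max = probs.max()                      # step 1: most probable token
    threshold = p_base * p_max               # step 2: dynamic threshold
    mask = probs >= threshold                # step 3: drop tokens below it
    filtered = np.where(mask, probs, 0.0)
    filtered /= filtered.sum()               # renormalize surviving tokens
    return int(rng.choice(len(probs), p=filtered))  # step 4: sample
```

Renormalizing the surviving probabilities keeps sampling proportional to the model's original probabilities within the filtered pool.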

Key Advantages

Dynamic Adaptation: Unlike top-p (nucleus sampling), which uses a fixed cumulative probability threshold, min-p adapts its threshold to the model's confidence: when the model is confident (high p_max), the cutoff is strict and only strong candidates pass; when the model is uncertain (low p_max), the cutoff relaxes and more plausible tokens remain eligible.
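As a worked example of the formula above, take p_base = 0.1: a confident step with p_max = 0.9 gives threshold = 0.1 × 0.9 = 0.09, so only strong candidates survive, while an uncertain step with p_max = 0.2 gives threshold = 0.1 × 0.2 = 0.02, so many plausible candidates stay in the pool.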

Better at High Temperatures: The paper shows min-p maintains coherence even at high temperatures (2.0-3.0), where traditional methods like top-p produce incoherent text. This makes it particularly valuable for creative writing and diverse text generation.
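Building on the `min_p_sample` sketch above, here is one way temperature could be combined with min-p; the ordering shown (temperature scaling first, then min-p filtering) is an assumption for this sketch and matches many sampler implementations, but it is not prescribed here as the paper's method:

```python
def sample_with_temperature(logits: np.ndarray, temperature: float = 2.0,
                            p_base: float = 0.1) -> int:
    """Apply temperature scaling, then min-p filtering, then sample.

    Assumption: temperature is applied before min-p filtering.
    Reuses numpy and min_p_sample from the earlier sketch.
    """
    scaled = logits / temperature
    scaled -= scaled.max()                         # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()  # softmax
    return min_p_sample(probs, p_base=p_base)
```

Because the threshold tracks p_max, the filter still removes low-probability noise even after a high temperature has flattened the distribution.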

Practical Impact

According to the paper, min-p has been widely adopted in open-source LLM frameworks and community inference libraries.