https://arxiv.org/abs/2407.01082
Core Mechanism
At each token generation step, min-p:
- Identifies the probability of the most likely token (p_max)
- Sets a dynamic threshold by multiplying p_max by a base parameter (p_base, typically 0.05-0.1)
- Filters out tokens with probabilities below this threshold
- Renormalizes the surviving probabilities and samples from the remaining pool of tokens
The key formula is: threshold = p_base × p_max
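A minimal NumPy sketch of this procedure (the function name is mine, and the softmax-over-logits step is an assumption; the paper describes the filter in terms of the model's output probabilities):

```python
import numpy as np

def min_p_sample(logits, p_base=0.1, rng=None):
    """Sample one token id from raw logits using min-p filtering."""
    rng = rng or np.random.default_rng()
    # Softmax (shifted by the max logit for numerical stability)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Dynamic threshold: scale p_base by the top token's probability
    threshold = p_base * probs.max()
    # Zero out tokens below the threshold and renormalize the survivors
    probs = np.where(probs >= threshold, probs, 0.0)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```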
Key Advantages
Dynamic Adaptation: Unlike top-p (nucleus sampling), which uses a fixed cumulative probability threshold, min-p adapts based on the model's confidence:
- When the model is highly confident (high p_max), the threshold rises, restricting sampling to the few most probable tokens
- When the model is uncertain (low p_max), the threshold falls, admitting more diverse options (see the numeric demo below)
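A quick numeric illustration of this adaptation (the two distributions are made up; p_base = 0.1 is within the paper's typical range):

```python
import numpy as np

p_base = 0.1  # within the paper's typical 0.05-0.1 range

# Hypothetical next-token distributions over a 6-token vocabulary
confident = np.array([0.70, 0.10, 0.08, 0.06, 0.04, 0.02])  # high p_max
uncertain = np.array([0.18, 0.17, 0.17, 0.16, 0.16, 0.16])  # low p_max

for name, probs in [("confident", confident), ("uncertain", uncertain)]:
    threshold = p_base * probs.max()
    kept = int((probs >= threshold).sum())
    print(f"{name}: threshold={threshold:.3f}, tokens kept={kept}/{len(probs)}")
# confident: threshold=0.070, tokens kept=3/6
# uncertain: threshold=0.018, tokens kept=6/6
```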
Better at High Temperatures: The paper shows min-p maintains coherence even at high temperatures (2.0-3.0), where traditional methods like top-p produce incoherent text. This makes it particularly valuable for creative writing and diverse text generation.
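A toy experiment suggesting why: with synthetic Gaussian logits (the vocabulary size, the +8 logit bump, and the temperature-before-filtering ordering are all my assumptions, not the paper's setup), min-p's candidate pool stays anchored to the strongest token at temperature 3.0, while top-p's fixed cumulative target sweeps in most of the flattened tail:

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=50_000)  # synthetic logits for a 50k-token vocabulary
logits[0] += 8.0                  # one clearly preferred token

scaled = logits / 3.0             # temperature 3.0 flattens the distribution
probs = np.exp(scaled - scaled.max())
probs /= probs.sum()

# top-p: smallest set of tokens whose cumulative probability reaches 0.9
cum = np.cumsum(np.sort(probs)[::-1])
top_p_kept = int(np.searchsorted(cum, 0.9)) + 1

# min-p: tokens with probability >= p_base * p_max
min_p_kept = int((probs >= 0.1 * probs.max()).sum())

# min-p keeps a far smaller pool than top-p here (min_p_kept << top_p_kept)
print(f"top-p (0.9) keeps {top_p_kept} tokens; min-p (0.1) keeps {min_p_kept}")
```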
Practical Impact
According to the paper, min-p has been widely adopted:
- Integrated into major frameworks like Hugging Face Transformers, vLLM, and llama.cpp
- Shows consistent improvements on benchmarks (GPQA, GSM8K, creative writing)
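A usage sketch, assuming a transformers version recent enough to expose the `min_p` argument to `generate` (the model choice here is just an example):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,
    min_p=0.05,       # p_base; the paper's typical range is 0.05-0.1
    temperature=2.0,  # min-p is designed to stay coherent even this high
    max_new_tokens=50,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

vLLM exposes the same knob via `SamplingParams(min_p=...)`, and llama.cpp via its `--min-p` CLI flag.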