Papers

Scalable-Softmax Is Superior for Attention, 2025

Helps attention stay focused over long contexts: replaces softmax with SSMax, which scales the attention logits by the log of the context length so the distribution doesn't flatten out as context grows.
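A minimal sketch of the idea as I understand it: SSMax is equivalent to applying ordinary softmax to `s * log(n) * z`, where `n` is the number of positions attended over and `s` is a learned per-head scalar (fixed here for illustration; the value is an assumption, not from the paper).

```python
import numpy as np

def softmax(z):
    # numerically stable softmax
    e = np.exp(z - z.max())
    return e / e.sum()

def ssmax(z, s=1.0):
    # Scalable-Softmax sketch: scale logits by s * log(n) before softmax,
    # where n is the context length. Equivalent to n**(s*z_i) / sum_j n**(s*z_j).
    n = len(z)
    return softmax(s * np.log(n) * z)

z = np.array([1.0, 2.0, 3.0])
p = ssmax(z)
```

The scaling grows with `n`, so the largest logit keeps a meaningful share of the attention mass even at long context lengths, instead of the distribution washing out.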


ReLU2 wins: sparse LLMs, 2024

Paper

I know about transformers and LLMs, but for this paper, what do they mean by sparse/sparsity?
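My reading: the paper means activation sparsity in the FFN layers, i.e. the fraction of hidden units whose activation is exactly zero for a given token, so those neurons can be skipped at inference. ReLU² (square of ReLU) keeps that exact-zero property. A small sketch (the sparsity numbers depend on the input distribution; here it's just standard normal noise for illustration):

```python
import numpy as np

def relu2(x):
    # ReLU^2: square of ReLU. Negative pre-activations map to exactly 0,
    # which is what makes the hidden state sparse (skippable neurons).
    return np.maximum(x, 0.0) ** 2

def activation_sparsity(h):
    # fraction of exactly-zero entries in a hidden activation vector
    return float(np.mean(h == 0.0))

x = np.random.default_rng(0).standard_normal(1000)
h = relu2(x)
# for zero-mean input, roughly half the activations are exactly zero
```

Contrast with GELU/SiLU, which output small nonzero values for negative inputs, so their activations are almost never exactly zero and there's nothing to skip.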