Meta-learning, i.e. learning to learn; e.g., neural architecture search.
Helps with attention over long sequences.
I know about transformers and LLMs, but for this paper, what are they talking about when they say sparse/sparsity?
In this paper, "sparse" or "sparsity" refers to activation sparsity: for any given input, only a fraction of a network's neurons produce activations that meaningfully contribute to the output, while the rest are zero or negligible and could be skipped during inference.
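To make that concrete, here is a minimal NumPy sketch, not taken from the paper: it assumes a ReLU feed-forward block with illustrative names (`W_up`, `W_down`, `d_model`, `d_hidden`) and shows that once the post-ReLU activations are known, the zero entries let you drop the matching rows of the down-projection without changing the output.

```python
# Minimal sketch (illustrative, not the paper's code): activation sparsity
# in a ReLU MLP. After ReLU, many hidden units are exactly zero for a given
# input, so the rows of the down-projection corresponding to those units
# can be skipped at inference time.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden = 64, 256

W_up = rng.standard_normal((d_model, d_hidden)) / np.sqrt(d_model)
W_down = rng.standard_normal((d_hidden, d_model)) / np.sqrt(d_hidden)
x = rng.standard_normal(d_model)

# Dense forward pass.
h = np.maximum(x @ W_up, 0.0)          # ReLU zeroes out many hidden units
y_dense = h @ W_down

# Sparse forward pass: compute with only the active (nonzero) units.
active = np.flatnonzero(h)              # indices of contributing neurons
y_sparse = h[active] @ W_down[active]   # skip rows for inactive neurons

print(f"active neurons: {active.size}/{d_hidden} "
      f"({active.size / d_hidden:.0%})")
print("outputs match:", np.allclose(y_dense, y_sparse))
```

With random weights roughly half the units fire here; in trained ReLU networks the measured activation sparsity is often reported to be much higher, which is what makes skipping the inactive neurons worthwhile in practice.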