Meta-learning, i.e. learning to learn, e.g. neural architecture search.
Helps with attention over long contexts.
Paper
I know about transformers and LLMs, but for this paper, what do they mean when they say sparse/sparsity?
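Guess, not from the paper itself: in transformer papers "sparsity" most often means sparse attention, i.e. each query attends to only a small subset of keys instead of all of them, so most of the attention matrix is forced to zero (other papers use it for sparse activations or pruned weights, so check which one this paper means). Below is a minimal NumPy sketch of the attention sense, assuming a simple local-window pattern; the function and parameter names (sparse_attention, window) are just illustrative, not from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(q, k, v, window=2):
    """Toy sparse attention: each position only attends to positions
    within +/- `window` of itself, so most attention weights are zero.
    Dense attention would use the full T x T score matrix instead."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)                      # (T, T) scores
    idx = np.arange(T)
    mask = np.abs(idx[:, None] - idx[None, :]) <= window  # sparse pattern
    scores = np.where(mask, scores, -np.inf)           # masked entries -> weight 0
    weights = softmax(scores, axis=-1)                  # mostly-zero rows
    return weights @ v

# Toy example: 8 tokens, 4-dim vectors.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
print(sparse_attention(q, k, v, window=2).shape)  # (8, 4)
```

The point of this kind of sparsity is cost: dense attention is O(T^2) in sequence length, while a fixed-window pattern is roughly O(T * window), which is why it comes up for long contexts.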