Quantization: store weights (and sometimes activations) in lower precision to cut memory and bandwidth
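A minimal sketch of the idea, using symmetric per-tensor int8 quantization (the function names and the round-to-nearest scheme here are illustrative, not a specific library's API):

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor int8 quantization: the scale maps the
    # largest-magnitude weight onto the int8 range [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

w = np.array([0.1, -0.5, 0.25, 1.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Reconstruction error per weight is at most half a quantization step.
```

Real deployments typically use finer granularity (per-channel or per-group scales) to keep accuracy, but the memory arithmetic is the same: int8 is 4x smaller than fp32.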
Pruning: zeroing weights only yields real speedups with HW/kernel support for sparse operations
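A minimal sketch of magnitude pruning (one common criterion; `magnitude_prune` is an illustrative name, not a library function), which also shows why the hardware caveat matters: the zeros still occupy dense storage and FLOPs unless the kernels exploit them.

```python
import numpy as np

def magnitude_prune(w, sparsity):
    # Zero out the smallest-magnitude fraction of weights.
    # Note the result is still a dense array: without sparse
    # kernels, the zeroed entries cost the same memory and compute.
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w).astype(w.dtype)

w = np.array([[0.9, -0.05], [0.02, -0.7]], dtype=np.float32)
w_pruned = magnitude_prune(w, 0.5)
# The two smallest-magnitude entries (-0.05 and 0.02) become 0.
```

Structured variants (e.g. pruning whole rows/heads, or fixed patterns like 2:4 sparsity) exist precisely because they are easier for hardware to accelerate than arbitrary unstructured zeros.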
Distillation: training on the teacher's logits conveys more information than training on hard labels
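A minimal sketch of the standard soft-label distillation loss, assuming the common KL-divergence formulation with a temperature (the names `distill_loss` and the example logits are illustrative): the teacher's full distribution encodes relative probabilities across all classes, which a one-hot label discards.

```python
import numpy as np

def softmax(x, T=1.0):
    # Temperature-softened softmax; higher T flattens the distribution.
    z = x / T
    z = z - z.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across T.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

teacher = np.array([[4.0, 1.0, 0.0]])
student = np.array([[2.0, 2.0, 2.0]])
loss = distill_loss(student, teacher)  # > 0: student mismatches teacher
```

In practice this soft loss is usually mixed with the ordinary hard-label cross-entropy via a weighting coefficient.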
Misc engineering: fused kernels, etc.
Examples
Inference systems
I wouldn't recommend using it in a batch serving setting today. One crucial optimization for batched serving (which you need once request volume is high) is continuous batching, which this implementation doesn't have.
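To make the gap concrete, here is a minimal sketch of continuous (iteration-level) batching, where finished sequences leave the batch and queued requests join after every decode step, rather than waiting for the whole batch to drain. The `Request` class and `serve` loop are illustrative, not any real serving framework's API:

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    max_new_tokens: int
    generated: list = field(default_factory=list)

def serve(requests, batch_size, decode_step):
    # Continuous batching: slots are refilled every iteration,
    # so a short request never holds up a long one (and vice versa).
    queue, active, done = deque(requests), [], []
    while queue or active:
        # Admit queued requests into any free batch slots.
        while queue and len(active) < batch_size:
            active.append(queue.popleft())
        # One decode step for every active sequence.
        for r in active:
            r.generated.append(decode_step(r))
        # Retire finished sequences immediately, freeing their slots.
        done.extend(r for r in active if len(r.generated) >= r.max_new_tokens)
        active = [r for r in active if len(r.generated) < r.max_new_tokens]
    return done

reqs = [Request("a", 2), Request("b", 5), Request("c", 1)]
out = serve(reqs, batch_size=2, decode_step=lambda r: "tok")
```

With static batching, the same workload would keep a slot idle until the longest sequence in each batch finishes; the per-iteration admit/retire loop above is the optimization this implementation lacks.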