Paper
- Fine-grained / many experts
- Each MoE layer is split into two groups of experts:
  - Shared experts (a small, fixed set)
  - Routed experts (the large MoE pool)
- For every token:
  - All shared experts are executed unconditionally (no router decision).
  - Only a subset of the routed experts is selected by the router (top-K over the routed experts only).
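
The per-token flow above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: the names `moe_layer`, `router_weights`, and `make_scaler` are hypothetical, the experts are toy scaling functions rather than feed-forward networks, and the gating choice (softmax over all routed experts, then top-K without renormalization) is an assumption, since the notes do not specify it. Residual connections and load balancing are left out.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_indices(scores: List[float], k: int) -> List[int]:
    """Indices of the k largest scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def moe_layer(
    token: Vector,
    shared_experts: List[Expert],
    routed_experts: List[Expert],
    router_weights: List[Vector],  # hypothetical: one weight row per routed expert
    k: int,
) -> Tuple[Vector, List[int]]:
    out = [0.0] * len(token)

    # 1) Shared experts: always executed, no router decision involved.
    for expert in shared_experts:
        y = expert(token)
        out = [o + yi for o, yi in zip(out, y)]

    # 2) Router scores the routed experts only (assumed: linear router + softmax).
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    gates = softmax(logits)

    # 3) Only the top-K routed experts run; their outputs are gate-weighted.
    chosen = top_k_indices(gates, k)
    for i in chosen:
        y = routed_experts[i](token)
        out = [o + gates[i] * yi for o, yi in zip(out, y)]

    return out, chosen


def make_scaler(c: float) -> Expert:
    """Toy expert that multiplies its input by a constant."""
    return lambda v: [c * x for x in v]


# Toy setup: 1 shared expert, 4 routed experts, top-2 routing.
shared = [make_scaler(1.0)]
routed = [make_scaler(float(i)) for i in range(1, 5)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

out, chosen = moe_layer([2.0, 1.0], shared, routed, router_weights, k=2)
print(chosen)  # [0, 1]
```

In this sketch the shared experts contribute to every token regardless of the router, while the cost of the routed pool depends on K rather than on the pool size, which is the point of the split.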