Why TPUs (systolic arrays) are very fast at matmuls:
animation, paper
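
A toy sketch of the systolic idea, in plain Python. Assumptions: this is an output-stationary layout picked because it is simple to follow, and it is not claimed to be the dataflow real TPU hardware uses. The name `systolic_matmul` is made up for illustration. Each cell only ever talks to its left and top neighbours, and every cell does one multiply-accumulate per cycle, which is where the speed on matmuls comes from.

```python
def systolic_matmul(a, b):
    """Simulate an n x m grid of multiply-accumulate cells computing a @ b.

    a is n x k, b is k x m (lists of lists). Returns (result, cycles).
    """
    n, k_dim, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]    # one accumulator per cell
    a_reg = [[0] * m for _ in range(n)]  # A value currently held in each cell
    b_reg = [[0] * m for _ in range(n)]  # B value currently held in each cell
    cycles = n + m + k_dim - 2           # time for the last operands to reach the far corner

    for t in range(cycles):
        # A values move one cell to the right; row i is fed with a delay of i cycles.
        for i in range(n):
            for j in range(m - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i
            a_reg[i][0] = a[i][k] if 0 <= k < k_dim else 0

        # B values move one cell down; column j is fed with a delay of j cycles.
        for j in range(m):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j
            b_reg[0][j] = b[k][j] if 0 <= k < k_dim else 0

        # Every cell does one multiply-accumulate per cycle, all in parallel in hardware.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]

    return acc, cycles


result, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result, cycles)
```

The skewed feeding (delay of i for row i, j for column j) is what makes matching A and B values meet in the right cell at the right cycle. No cell needs to fetch anything from memory on its own.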
TPUs
I THINK:
a TPU pod has 4 chips (2x2)
so 8x4 = 32 chips has 8 pods
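
The arithmetic above as a quick check. This is only a sketch: `count_groups` is a made-up helper, and the 2x2 group size is the guess from these notes, not a confirmed spec.

```python
def count_groups(topology, group=(2, 2)):
    """Return (total chips, number of groups) for a topology string like "8x4".

    The default 2x2 group size is an assumption taken from the notes.
    """
    rows, cols = (int(x) for x in topology.split("x"))
    chips = rows * cols
    per_group = group[0] * group[1]
    return chips, chips // per_group


print(count_groups("8x4"))
```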
softslicing is for using fewer chips
certain compute, like sorting, is bad on TPUs