https://stats.stackexchange.com/questions/391837/intuition-behind-gradient-of-expected-value-and-logarithm-of-probabilities
Basics
Cholesky decomposition
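A minimal pure-Python sketch of the Cholesky–Banachiewicz algorithm (the matrix `A` here is a made-up 2x2 example; in PyTorch the same factorization is available as `torch.linalg.cholesky`): it factors a symmetric positive-definite A into L @ L.T with L lower-triangular.

```python
import math

def cholesky(A):
    # Cholesky-Banachiewicz: fill L row by row, left to right.
    # Assumes A is symmetric positive-definite.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                # diagonal entry: sqrt of the remaining mass
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                # off-diagonal entry: solve against the already-known L[j][j]
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

A = [[4.0, 2.0],
     [2.0, 3.0]]
L = cholesky(A)
# L = [[2.0, 0.0], [1.0, sqrt(2)]], and L @ L.T reconstructs A
```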
Convolutions
1x1 conv = a linear transformation over channels: a kernel of size 1 with C_in input and C_out output channels is exactly a (C_out x C_in) matrix applied independently at every spatial position
# example in 1d: weight w has shape (C_out=2, C_in=3, kernel=1),
# input x has shape (C_in=3, L=4)
>>> import torch
>>> import torch.nn.functional as F
>>> w = torch.tensor([[[1], [2], [3]], [[1], [1], [1]]]).float()
>>> x = torch.tensor([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]).float()
>>> w.squeeze(2) @ x   # plain (2,3) @ (3,4) matrix product
tensor([[32., 38., 44., 50.],
        [12., 15., 18., 21.]])
>>> F.conv1d(x, w)     # the 1x1 conv gives the same result
tensor([[32., 38., 44., 50.],
        [12., 15., 18., 21.]])