https://stats.stackexchange.com/questions/391837/intuition-behind-gradient-of-expected-value-and-logarithm-of-probabilities
Basics
Cholesky decomposition
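A minimal pure-Python sketch of the Cholesky–Banachiewicz algorithm (the matrix `A` here is a made-up 2x2 example; in PyTorch the same factorization is available as `torch.linalg.cholesky`): it factors a symmetric positive-definite A into L @ L.T with L lower-triangular.

```python
import math

def cholesky(A):
    # Cholesky-Banachiewicz: fill L row by row, left to right.
    # Assumes A is symmetric positive-definite.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                # diagonal entry: sqrt of the remaining mass
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                # off-diagonal entry: solve against the already-known L[j][j]
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

A = [[4.0, 2.0],
     [2.0, 3.0]]
L = cholesky(A)
# L = [[2.0, 0.0], [1.0, sqrt(2)]], and L @ L.T reconstructs A
```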
Convolutions
1x1 conv = a linear transformation over channels: a kernel of size 1 with C_in input and C_out output channels is exactly a (C_out x C_in) matrix applied independently at every spatial position
# example in 1d: weight w has shape (C_out=2, C_in=3, kernel=1),
# input x has shape (C_in=3, L=4)
>>> import torch
>>> import torch.nn.functional as F
>>> w = torch.tensor([[[1], [2], [3]], [[1], [1], [1]]]).float()
>>> x = torch.tensor([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]).float()
>>> w.squeeze(2) @ x   # plain (2,3) @ (3,4) matrix product
tensor([[32., 38., 44., 50.],
        [12., 15., 18., 21.]])
>>> F.conv1d(x, w)     # the 1x1 conv gives the same result
tensor([[32., 38., 44., 50.],
        [12., 15., 18., 21.]])