Getting intuition for sigmoid.
$\psi = \nabla_\theta \log \pi$
$\log\sigma(\theta) = \log\frac{1}{1+e^{-\theta}} = -\log(1+e^{-\theta})$, so
$\frac{d}{d\theta}\log\sigma(\theta) = \frac{e^{-\theta}}{1+e^{-\theta}} = 1 - \sigma(\theta)$
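
A quick numerical sanity check of the derivative above, as a minimal sketch. The function names (`sigmoid`, `log_sigmoid`, `score`, `finite_diff`) are made up for illustration; the check compares the closed form $1 - \sigma(\theta)$ against a central finite difference of $\log\sigma(\theta)$.

```python
import math


def sigmoid(theta: float) -> float:
    """Logistic function sigma(theta) = 1 / (1 + exp(-theta))."""
    return 1.0 / (1.0 + math.exp(-theta))


def log_sigmoid(theta: float) -> float:
    """log sigma(theta) = -log(1 + exp(-theta))."""
    return -math.log1p(math.exp(-theta))


def score(theta: float) -> float:
    """Closed-form derivative of log sigma(theta): 1 - sigma(theta)."""
    return 1.0 - sigmoid(theta)


def finite_diff(theta: float, h: float = 1e-6) -> float:
    """Central finite-difference estimate of d/dtheta log sigma(theta)."""
    return (log_sigmoid(theta + h) - log_sigmoid(theta - h)) / (2.0 * h)


if __name__ == "__main__":
    for theta in (-3.0, 0.0, 3.0):
        print(theta, score(theta), finite_diff(theta))
```

The two columns should agree to several decimal places. Note that the score is largest for very negative $\theta$ (where $\sigma$ is near 0) and shrinks toward 0 as $\theta$ grows.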

Fisher divergence
Non-matrix version: think 1D. Fix the current policy $\pi_\theta$ and define $f(\delta) = \text{KL}(\pi_\theta \,\|\, \pi_{\theta+\delta})$.
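To second order, $f(\delta) \approx \tfrac{1}{2} F(\theta)\,\delta^2$, where $F(\theta) = \mathbb{E}[\psi^2]$ is the Fisher information (here $f(0) = 0$ and $f'(0) = 0$). For a two-action policy with $\pi_\theta(a=1) = \sigma(\theta)$, $F(\theta) = \sigma(\theta)(1-\sigma(\theta))$, which peaks at $\theta = 0$.
Hence, for the same change in KL, we need a SMALLER step at $\theta=0$, where $F$ is large, than farther out, where $F$ is small.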
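
The step-size claim can be checked numerically. The sketch below assumes a two-action (Bernoulli) policy with $\pi_\theta(a=1) = \sigma(\theta)$, which these notes imply but never state. For that policy the Fisher information is $F(\theta) = \sigma(\theta)(1-\sigma(\theta))$. The helper names (`fisher`, `kl_bernoulli`, `step_for_kl_budget`) are hypothetical. The code compares the exact KL with the quadratic approximation $\tfrac{1}{2}F(\theta)\delta^2$, then solves that approximation for the step that spends a fixed KL budget `eps`.

```python
import math


def sigmoid(theta: float) -> float:
    """Logistic function sigma(theta) = 1 / (1 + exp(-theta))."""
    return 1.0 / (1.0 + math.exp(-theta))


def fisher(theta: float) -> float:
    """Fisher information of the assumed Bernoulli(sigma(theta)) policy."""
    p = sigmoid(theta)
    return p * (1.0 - p)


def kl_bernoulli(theta: float, delta: float) -> float:
    """Exact KL(pi_theta || pi_{theta+delta}) for the two-action policy."""
    p = sigmoid(theta)
    q = sigmoid(theta + delta)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def step_for_kl_budget(theta: float, eps: float) -> float:
    """Step size delta solving 0.5 * F(theta) * delta**2 = eps."""
    return math.sqrt(2.0 * eps / fisher(theta))


if __name__ == "__main__":
    delta, eps = 0.1, 1e-3
    for theta in (0.0, 2.0, 4.0):
        exact = kl_bernoulli(theta, delta)
        quad = 0.5 * fisher(theta) * delta ** 2
        print(theta, fisher(theta), exact, quad, step_for_kl_budget(theta, eps))
```

Under this assumption the allowed step grows as $\theta$ moves away from 0, because $F$ shrinks there: the same $\delta$ changes the policy less where the sigmoid is saturated.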