Distribution | Notation | GLM Type | Link Function | MLE Loss (mean negative log-likelihood, constants dropped) |
---|---|---|---|---|
Gaussian | $N(\mu, \sigma^2)$ | Linear Regression | $g(\mu) = \mu$ | $L = \frac{1}{2n}\sum_{i=1}^n (y_i - \hat{y}_i)^2$ |
Binomial | $B(n, p)$ | Logistic Regression | $g(p) = \log(\frac{p}{1-p})$ | $L = -\frac{1}{n}\sum_{i=1}^n [y_i \log(\hat{p}_i) + (1-y_i) \log(1-\hat{p}_i)]$ |
Poisson | $Pois(\lambda)$ | Poisson Regression | $g(\lambda) = \log(\lambda)$ | $L = \frac{1}{n}\sum_{i=1}^n [\hat{\lambda}_i - y_i \log(\hat{\lambda}_i)]$ |
Multinomial | $Mult(n, p_1, ..., p_k)$ | Multinomial Logistic Regression | $g(p_j) = \log(\frac{p_j}{p_k})$ for j = 1, ..., k-1 | $L = -\frac{1}{n}\sum_{i=1}^n \sum_{j=1}^k y_{ij} \log(\hat{p}_{ij})$ |
Gamma | $Gamma(k, \theta)$ | Gamma Regression | $g(\mu) = \frac{1}{\mu}$ | $L = \frac{1}{n}\sum_{i=1}^n [\frac{y_i}{\hat{\mu}_i} + \log(\hat{\mu}_i)]$ |
Inverse Gaussian | $IG(\mu, \lambda)$ | Inverse Gaussian Regression | $g(\mu) = \frac{1}{\mu^2}$ | $L = \frac{1}{n}\sum_{i=1}^n [\frac{(y_i - \hat{\mu}_i)^2}{y_i \hat{\mu}_i^2}]$ |
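As a concrete check on two rows of the table, a minimal numpy sketch of the binomial and Poisson losses, assuming the fitted values come from applying the inverse of the canonical link to a linear predictor $X\beta$; the function and variable names are illustrative, not part of the notes:

```python
import numpy as np

def binomial_loss(y, X, beta):
    # logistic regression: inverse of the logit link is the sigmoid
    p_hat = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

def poisson_loss(y, X, beta):
    # Poisson regression: inverse of the log link is exp; the log(y!) constant is dropped
    lam_hat = np.exp(X @ beta)
    return np.mean(lam_hat - y * np.log(lam_hat))
```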
various similarity measures can be used to compare items or users (cosine similarity and Pearson correlation are common)
item-item similarity (first published by Linden et al.)
predict a user's rating of an item as the weighted average of the user's ratings of other items, where the weights are item-item similarities:
score[usr,itm] = sum(sim(itm,itm2) * score[usr,itm2] for itm2 in itms) / sum(abs(sim(itm,itm2)) for itm2 in itms)
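A minimal sketch of this item-item prediction, assuming a dense user x item rating matrix R with 0 for unrated entries and cosine similarity between item columns (the similarity choice, the 0-means-unrated convention, and the function name are assumptions):

```python
import numpy as np

def item_item_predict(R, user, item):
    # cosine similarity between `item`'s column and every other item's column
    norms = np.linalg.norm(R, axis=0) + 1e-12
    sims = (R[:, item] @ R) / (norms[item] * norms)
    # weight the user's ratings of the items they have actually rated
    rated = (R[user] > 0) & (np.arange(R.shape[1]) != item)
    weights = sims[rated]
    return weights @ R[user, rated] / (np.abs(weights).sum() + 1e-12)
```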
user-user similarity
predict a user's rating of an item as the weighted average of other users' ratings of that item, where the weights are user-user similarities:
score[usr,itm] = sum(sim(usr,usr2) * score[usr2,itm] for usr2 in usrs) / sum(abs(sim(usr,usr2)) for usr2 in usrs)
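The user-user analogue, under the same assumptions (dense R, cosine similarity, 0 = unrated):

```python
import numpy as np

def user_user_predict(R, user, item):
    # cosine similarity between `user`'s row and every other user's row
    norms = np.linalg.norm(R, axis=1) + 1e-12
    sims = (R @ R[user]) / (norms * norms[user])
    # weight the ratings of users who actually rated `item`
    raters = (R[:, item] > 0) & (np.arange(R.shape[0]) != user)
    weights = sims[raters]
    return weights @ R[raters, item] / (np.abs(weights).sum() + 1e-12)
```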
evaluation metrics: RMSE is typical
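i.e., over a held-out test set $T$ of (user, item) pairs: $\text{RMSE} = \sqrt{\frac{1}{|T|}\sum_{(u,i) \in T} (\hat{r}_{ui} - r_{ui})^2}$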
forward selection: add features one (or a few) at a time, keeping the additions that improve accuracy (sketch below)
backward elimination, aka recursive feature elimination, is the opposite of forward selection; e.g., this faster/sloppier variant of naive backward elimination (sketch below):
start with all features in the model; candidates (for removal) = all features
each iteration:
remove (from the model and from candidates) any candidate whose exclusion yields no acc drop
remove from candidates (but keep in the model) the candidate whose exclusion yields the biggest acc drop, since it is clearly needed
stop when no candidates remain
combination: add and remove features, e.g., forward selection while optionally removing a feature at each step
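A minimal sketch of greedy forward selection, referenced above; the estimator, cross-validated accuracy as the score, and the stopping rule are assumptions, not prescribed by the notes:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def cv_score(X, y, feats):
    # cross-validated accuracy of a model restricted to the chosen feature indices
    return cross_val_score(LogisticRegression(max_iter=1000), X[:, feats], y, cv=5).mean()

def forward_select(X, y):
    selected, remaining, best = [], list(range(X.shape[1])), -np.inf
    while remaining:
        scores = {j: cv_score(X, y, selected + [j]) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best:          # stop when no candidate improves accuracy
            break
        best = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected
```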
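A sketch of the faster/sloppier backward-elimination variant described above: each pass drops, in one batch, every candidate whose exclusion costs no accuracy, and permanently stops re-testing the candidate whose exclusion hurts most. The estimator, scorer, and tolerance are assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def cv_score(X, y, feats):
    return cross_val_score(LogisticRegression(max_iter=1000),
                           X[:, sorted(feats)], y, cv=5).mean()

def sloppy_backward_eliminate(X, y, tol=0.0):
    model = set(range(X.shape[1]))    # features currently in the model
    candidates = set(model)           # features still eligible for removal
    while candidates and len(model) > 1:
        base = cv_score(X, y, model)
        drops = {j: base - cv_score(X, y, model - {j}) for j in candidates}
        free = {j for j, d in drops.items() if d <= tol}   # exclusion costs nothing: drop them all
        model -= free
        candidates -= free
        if candidates:
            worst = max(candidates, key=lambda j: drops[j])  # exclusion hurts most:
            candidates.discard(worst)                        # keep it and stop re-testing it
    return sorted(model)
```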
increase the weight of the most-correlated variable by a small step $\epsilon$ at each iteration:
let vector r = y
let vector beta = 0
iterate:
find x[j] most correlated with r
let delta = epsilon * sign(corr(r, x[j]))
set beta[j] += delta
set r -= delta * x[j]
with small $\epsilon$, the resulting coefficient path is identical to the LASSO path for orthogonal predictors, and similar in the general case
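A minimal numpy sketch of the iteration above (incremental forward stagewise); the standardized-columns assumption, the value of epsilon, and the fixed step count are assumptions, not from the notes:

```python
import numpy as np

def forward_stagewise(X, y, epsilon=0.01, n_steps=1000):
    # assumes the columns of X are standardized, so X.T @ r is proportional to correlation
    r = y.astype(float)                 # residual starts as y
    beta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        corr = X.T @ r                  # correlation (up to scale) of each predictor with r
        j = np.argmax(np.abs(corr))     # most-correlated predictor
        delta = epsilon * np.sign(corr[j])
        beta[j] += delta
        r -= delta * X[:, j]
    return beta
```

In practice the number of steps (or the size of the residual) acts as the regularization knob, playing the role that the penalty weight plays in LASSO.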