Statistics

Linear algebra

determinant
Eigenvalues and Eigenvectors
Summary: to solve the eigenvalue problem for an n by n matrix A, follow these steps:

  1. Compute the determinant of A - λI. With λ subtracted along the diagonal, this determinant starts with λ^n or -λ^n. It is a polynomial in λ of degree n.
  2. Find the roots of this polynomial, by solving det(A - λI) = 0. The n roots are
    the n eigenvalues of A. They make A - λI singular.
  3. For each eigenvalue λ, solve (A - λI)x = 0 to find an eigenvector x.
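The three steps can be sketched with NumPy, whose `numpy.linalg.eig` performs them internally (the matrix is chosen so the roots are easy to check by hand):

```python
import numpy as np

# A symmetric 2x2 example: det(A - lambda*I) = lambda^2 - 4*lambda + 3,
# so the roots (eigenvalues) are lambda = 1 and lambda = 3.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Steps 1-2: eigenvalues are the roots of the characteristic polynomial.
eigenvalues, eigenvectors = np.linalg.eig(A)

# Step 3: each column of `eigenvectors` solves (A - lambda*I) x = 0,
# i.e. A - lambda*I maps it to the zero vector.
for lam, x in zip(eigenvalues, eigenvectors.T):
    assert np.allclose((A - lam * np.eye(2)) @ x, 0)
```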

Likelihood and probability

e = y - ax - b
Maximum likelihood assumes:
  1. the residuals e are independent
  2. the residuals e are identically distributed
  3. the residuals e share the same standard deviation

If the standard deviations of e are not equal, weighted least squares can be used.
References: CSDN; 马同学图解数学
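When the residual standard deviations differ (assumption 3 fails), each point can be weighted by 1/variance. A minimal NumPy sketch of weighted least squares on simulated heteroscedastic data (variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
sigma = 0.5 + 0.5 * x                 # residual spread grows with x (heteroscedastic)
y = 3.0 * x + 2.0 + rng.normal(0, sigma)

X = np.column_stack([x, np.ones(n)])  # columns: slope a, intercept b
w = 1.0 / sigma**2                    # weight = 1 / variance of each residual

# Weighted least squares: minimize sum_i w_i * (y_i - a*x_i - b)^2,
# which gives the normal equations (X'WX) [a, b] = X'Wy.
a_wls, b_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
```

Noisy points (large sigma) get small weights, so they barely influence the fit.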

Regularized regression (boosting, bagging: random forest)

Derivation of the slope: link

Ridge L2

bias: difference between the fitted model and the training points
variance: difference between the fitted model and the testing points
The slope is chosen to minimize the sum of squared residuals + lambda * slope^2 (the ridge-regression penalty).
(Relation between lambda and slope: as lambda increases, the slope shrinks.)
lambda is determined by cross-validation (pick the lambda, y-intercept, and slope giving the smallest variance).
Geometric illustration: https://blog.51cto.com/u_9205406/5606772
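A sketch of the lambda–slope relation and cross-validated lambda, using scikit-learn on simulated data; `alpha` is sklearn's name for lambda, mirroring what cv.glmnet does in R:

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV

rng = np.random.default_rng(1)
n = 20                                  # few training points -> high-variance fit
x = rng.uniform(0, 5, (n, 1))
y = 2.0 * x[:, 0] + rng.normal(0, 2.0, n)

# Larger lambda (alpha) -> smaller slope: the lambda * slope^2 penalty
# pulls the fitted slope toward 0.
slopes = [Ridge(alpha=a).fit(x, y).coef_[0] for a in (1.0, 10.0, 100.0)]

# Cross-validation picks the alpha with the lowest validation error.
cv_model = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(x, y)
```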

Lasso L1 (drops small-effect QTLs)

the sum of squared residuals + lambda * |slope|
unlike ridge, the slope can shrink to exactly 0
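The |slope| penalty can set coefficients exactly to 0, which is how lasso drops small-effect predictors (e.g. small-effect QTLs) while ridge only shrinks them. A minimal scikit-learn sketch on simulated data:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(2)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:2] = [3.0, -2.0]                  # only 2 of 10 predictors matter
y = X @ beta + rng.normal(0, 1.0, n)

lasso = Lasso(alpha=0.2).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

n_zero_lasso = int(np.sum(lasso.coef_ == 0))  # lasso zeroes out useless predictors
n_zero_ridge = int(np.sum(ridge.coef_ == 0))  # ridge only shrinks, never exactly 0
```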

Elastic-net

useful when parameters are correlated
(solving for 300 y's requires 300 equations, which is why a large sample is needed)

glmnet package

Generalized Linear Models (linear regression + logistic regression)
lambda.1se results in a model with fewer parameters than lambda.min

# Try alpha = 0, 0.1, ..., 1.0 (0 = ridge, 1 = lasso) and fit each by cross-validation.
list.of.fits <- list()
for (i in 0:10) {
  fit.name <- paste0("alpha", i/10)
  list.of.fits[[fit.name]] <-
    cv.glmnet(x.train, y.train, type.measure="mse", alpha=i/10,
              family="gaussian")
}
# For each alpha, predict on the test set at lambda.1se and record the MSE.
results <- data.frame()
for (i in 0:10) {
  fit.name <- paste0("alpha", i/10)
  predicted <-
    predict(list.of.fits[[fit.name]],
            s=list.of.fits[[fit.name]]$lambda.1se, newx=x.test)
  mse <- mean((y.test - predicted)^2)

  temp <- data.frame(alpha=i/10, mse=mse, fit.name=fit.name)
  results <- rbind(results, temp)
}

Bayesian

MLE: logistic regression
MAP: regularized logistic regression
Bayesian: Bayesian logistic regression
MAP can be viewed as MLE plus a regularization term: a Gaussian prior is equivalent to adding an L2 penalty, and a Laplace prior is equivalent to an L1 penalty.
(Genetics assumption: most loci in the genome do not contribute to the trait, so beta follows a normal distribution.)

MAP (maximize the posterior probability of the weights)
multivariate Gaussian distribution
diagonal covariance
likelihood: parameters are given, the output is a random variable
posterior: data is given, the parameters are random (a distribution)
ridge = regularized MLE, i.e. MAP with a Gaussian prior
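The Gaussian-prior ⇔ L2 equivalence can be checked numerically: the closed-form ridge solution matches the MAP estimate found by maximizing the log-posterior under a Gaussian prior. A sketch assuming NumPy and SciPy (notation is mine):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(0, 1.0, n)

lam = 2.0   # ridge penalty = noise variance / prior variance (here noise var = 1)

# Ridge in closed form: minimize ||y - X b||^2 + lam * ||b||^2.
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# MAP: maximize log p(y|b) + log p(b) with y|b ~ N(Xb, I) and b ~ N(0, (1/lam) I).
# The negative log-posterior is (up to constants) the same ridge objective.
neg_log_post = lambda b: 0.5 * np.sum((y - X @ b) ** 2) + 0.5 * lam * np.sum(b ** 2)
b_map = minimize(neg_log_post, np.zeros(p)).x
```

With a Laplace prior instead, the penalty term becomes lam * sum(|b|), giving lasso.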

probabilistic models
supervised learning