Statistics
Linear algebra
Determinant
Eigenvalues and Eigenvectors
Summary: To solve the eigenvalue problem for an n by n matrix A, follow these steps:
- Compute the determinant of A − λI. With λ subtracted along the diagonal, this determinant starts with λⁿ or −λⁿ. It is a polynomial in λ of degree n.
- Find the roots of this polynomial, by solving det(A − λI) = 0. The n roots are the n eigenvalues of A. They make A − λI singular.
- For each eigenvalue λ, solve (A − λI)x = 0 to find an eigenvector x.
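A minimal R sketch of these steps, using a hypothetical 2 by 2 matrix A (eigen() does the det(A − λI) = 0 root-finding numerically):

    A <- matrix(c(4, 1,
                  2, 3), nrow = 2, byrow = TRUE)
    e <- eigen(A)    # solves det(A - lambda*I) = 0 numerically
    e$values         # eigenvalues: 5 and 2
    e$vectors        # columns are eigenvectors x, satisfying (A - lambda*I) x = 0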
Likelihood and probability
e = y − ax − b
Maximum likelihood assumptions for least squares:
1. the residuals e are independent
2. the residuals e follow the same distribution
3. the residuals e have the same standard deviation
If the standard deviations of e are not equal, weighted least squares can be used, as sketched below.
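A small R sketch with simulated (made-up) data: when the residual standard deviation grows with x, lm() with weights = 1/variance gives weighted least squares:

    set.seed(1)
    x <- 1:100
    e <- rnorm(100, sd = 0.5 * x)        # residual sd grows with x (heteroscedastic)
    y <- 2 * x + 1 + e
    ols <- lm(y ~ x)                     # ordinary least squares
    wls <- lm(y ~ x, weights = 1 / x^2)  # weight each point by 1/variance
    coef(ols); coef(wls)                 # WLS estimates are typically closer to (1, 2)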
Sources: CSDN; 马同学图解数学 (Matongxue's illustrated math)
Regularized Regression (cf. boosting, bagging: random forest)
Slope derivation: link
Ridge L2
bias: the difference between the fitted model and the training points
variance: the difference between the fitted model and the testing points
the slope is chosen to minimize: the sum of squared residuals + lambda * slope² (the Ridge Regression Penalty)
(relation between lambda and slope: as lambda increases, the slope decreases)
lambda is determined by cross validation (choose the lambda whose fitted y-intercept and slope give the smallest variance)
Geometric illustration: https://blog.51cto.com/u_9205406/5606772
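A hedged glmnet sketch of ridge (the data and dimensions are made up for illustration): alpha = 0 selects the pure L2 penalty, and cv.glmnet picks lambda by cross validation:

    library(glmnet)
    set.seed(1)
    X <- matrix(rnorm(100 * 10), nrow = 100)   # 100 samples, 10 predictors
    y <- X %*% c(3, -2, rep(0, 8)) + rnorm(100)
    cv_ridge <- cv.glmnet(X, y, alpha = 0)     # alpha = 0 -> ridge (L2)
    coef(cv_ridge, s = "lambda.min")           # all coefficients shrunk, none exactly 0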
Lasso L1 (can remove small-effect QTLs)
the sum of squared residuals + lambda * |slope|
the slope can shrink to exactly 0
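The same kind of sketch for lasso (again with made-up data): alpha = 1 selects the L1 penalty, and small-effect coefficients are driven to exactly 0:

    library(glmnet)
    set.seed(1)
    X <- matrix(rnorm(100 * 10), nrow = 100)
    y <- X %*% c(3, -2, rep(0, 8)) + rnorm(100)
    cv_lasso <- cv.glmnet(X, y, alpha = 1)     # alpha = 1 -> lasso (L1)
    coef(cv_lasso, s = "lambda.1se")           # small effects set to exactly 0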
Elastic-net
useful when parameters are correlated
[solving for 300 unknowns requires 300 equations, which is why a large sample is needed]
glmnet package
Generalized Linear Models (Linear Regression + Logistic Regression)
lambda.1se results in a model with fewer parameters than lambda.min
for (i in 0:10) {  # `0:10` is the integer sequence 0..10; `0.10` would be a single number (a common typo)
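A fuller sketch of what that loop is presumably doing (variable names are mine, not from the notes): sweep alpha from 0 (ridge) to 1 (lasso) in steps of 0.1, cross-validate each fit, and keep the alpha with the lowest CV error:

    library(glmnet)
    set.seed(1)
    X <- matrix(rnorm(100 * 20), nrow = 100)
    y <- X %*% c(5, 5, rep(0, 18)) + rnorm(100)
    fits <- list()
    for (i in 0:10) {
      fits[[i + 1]] <- cv.glmnet(X, y, alpha = i / 10)  # alpha = i/10 mixes L1 and L2
    }
    cv_err <- sapply(fits, function(f) min(f$cvm))      # best CV error for each alpha
    best_alpha <- (which.min(cv_err) - 1) / 10
    best_alpha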
Bayesian
MLE: Logistic Regression
MAP: Regularized Logistic Regression
Bayesian: Bayesian Logistic Regression
MAP can be seen as MLE plus a regularization term: setting the prior to a Gaussian distribution is equivalent to adding L2 regularization; a Laplace prior gives L1 regularization.
(Genetics assumption: most loci in the genome contribute nothing to the trait, so beta follows a normal distribution.)
MAP (maximize the posterior probability of the weights)
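A standard one-line derivation (not from the notes) of why the prior becomes a penalty:

    \hat{\beta}_{\mathrm{MAP}} = \arg\max_{\beta} \left[ \log p(y \mid \beta) + \log p(\beta) \right]

    \text{Gaussian prior: } \log p(\beta) = -\frac{\|\beta\|_2^2}{2\sigma^2} + \text{const} \;\Rightarrow\; \text{L2 penalty}
    \text{Laplace prior: } \log p(\beta) = -\frac{\|\beta\|_1}{b} + \text{const} \;\Rightarrow\; \text{L1 penalty}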
multivariate Gaussian distribution
diagonal covariance
likelihood: the parameters are given, the output is the random variable
posterior: the data is given, the parameters are random (a distribution)
ridge: MAP with a Gaussian prior, i.e. MLE plus an L2 penalty
probabilistic models
supervised learning