Statistics
Linear algebra
Determinant
Eigenvalues and Eigenvectors
Summary: To solve the eigenvalue problem for an n by n matrix A, follow these steps:
- Compute the determinant of A − λI. With λ subtracted along the diagonal, this determinant starts with λⁿ or −λⁿ. It is a polynomial in λ of degree n.
- Find the roots of this polynomial, by solving det(A − λI) = 0. The n roots are the n eigenvalues of A. They make A − λI singular.
- For each eigenvalue λ, solve (A − λI)x = 0 to find an eigenvector x.
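A minimal R sketch of these steps, using a hypothetical 2 by 2 matrix A (eigen() does the det(A − λI) = 0 root-finding numerically):

    A <- matrix(c(4, 1,
                  2, 3), nrow = 2, byrow = TRUE)
    e <- eigen(A)    # solves det(A - lambda*I) = 0 numerically
    e$values         # eigenvalues: 5 and 2
    e$vectors        # columns are eigenvectors x, satisfying (A - lambda*I) x = 0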
Likelihood and probability
e = y − ax − b
Maximum likelihood assumptions for least squares:
1. the residuals e are independent
2. the residuals e follow the same distribution
3. the residuals e have the same standard deviation
If the standard deviations of e are not equal, weighted least squares can be used, as sketched below.
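A small R sketch with simulated (made-up) data: when the residual standard deviation grows with x, lm() with weights = 1/variance gives weighted least squares:

    set.seed(1)
    x <- 1:100
    e <- rnorm(100, sd = 0.5 * x)        # residual sd grows with x (heteroscedastic)
    y <- 2 * x + 1 + e
    ols <- lm(y ~ x)                     # ordinary least squares
    wls <- lm(y ~ x, weights = 1 / x^2)  # weight each point by 1/variance
    coef(ols); coef(wls)                 # WLS estimates are typically closer to (1, 2)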
Sources: CSDN; 马同学图解数学 (Matongxue's illustrated math)
Regularized Regression (cf. boosting, bagging: random forest)
Slope derivation: link
Ridge L2
bias: the difference between the fitted model and the training points
variance: the difference between the fitted model and the testing points
the slope is chosen to minimize: the sum of squared residuals + lambda * slope² (the Ridge Regression Penalty)
(relation between lambda and slope: as lambda increases, the slope decreases)
lambda is determined by cross validation (choose the lambda whose fitted y-intercept and slope give the smallest variance)
Geometric illustration: https://blog.51cto.com/u_9205406/5606772
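A hedged glmnet sketch of ridge (the data and dimensions are made up for illustration): alpha = 0 selects the pure L2 penalty, and cv.glmnet picks lambda by cross validation:

    library(glmnet)
    set.seed(1)
    X <- matrix(rnorm(100 * 10), nrow = 100)   # 100 samples, 10 predictors
    y <- X %*% c(3, -2, rep(0, 8)) + rnorm(100)
    cv_ridge <- cv.glmnet(X, y, alpha = 0)     # alpha = 0 -> ridge (L2)
    coef(cv_ridge, s = "lambda.min")           # all coefficients shrunk, none exactly 0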
Lasso L1 (can remove small-effect QTLs)
the sum of squared residuals + lambda * |slope|
the slope can shrink to exactly 0
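The same kind of sketch for lasso (again with made-up data): alpha = 1 selects the L1 penalty, and small-effect coefficients are driven to exactly 0:

    library(glmnet)
    set.seed(1)
    X <- matrix(rnorm(100 * 10), nrow = 100)
    y <- X %*% c(3, -2, rep(0, 8)) + rnorm(100)
    cv_lasso <- cv.glmnet(X, y, alpha = 1)     # alpha = 1 -> lasso (L1)
    coef(cv_lasso, s = "lambda.1se")           # small effects set to exactly 0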
Elastic-net
useful when parameters are correlated
[solving for 300 unknowns requires 300 equations, which is why a large sample is needed]
glmnet package
Generalized Linear Models (Linear Regression + Logistic Regression)
lambda.1se results in a model with fewer parameters than lambda.min
for (i in 0:10) {  # `0:10` is the integer sequence 0..10; `0.10` would be a single number (a common typo)
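A fuller sketch of what that loop is presumably doing (variable names are mine, not from the notes): sweep alpha from 0 (ridge) to 1 (lasso) in steps of 0.1, cross-validate each fit, and keep the alpha with the lowest CV error:

    library(glmnet)
    set.seed(1)
    X <- matrix(rnorm(100 * 20), nrow = 100)
    y <- X %*% c(5, 5, rep(0, 18)) + rnorm(100)
    fits <- list()
    for (i in 0:10) {
      fits[[i + 1]] <- cv.glmnet(X, y, alpha = i / 10)  # alpha = i/10 mixes L1 and L2
    }
    cv_err <- sapply(fits, function(f) min(f$cvm))      # best CV error for each alpha
    best_alpha <- (which.min(cv_err) - 1) / 10
    best_alpha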
Bayesian
MLE: Logistic Regression
MAP: Regularized Logistic Regression
Bayesian: Bayesian Logistic Regression
MAP can be seen as MLE plus a regularization term: setting the prior to a Gaussian distribution is equivalent to adding L2 regularization; a Laplace prior gives L1 regularization.
(Genetics assumption: most loci in the genome contribute nothing to the trait, so beta follows a normal distribution.)
MAP (maximize the posterior probability of the weights)
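A standard one-line derivation (not from the notes) of why the prior becomes a penalty:

    \hat{\beta}_{\mathrm{MAP}} = \arg\max_{\beta} \left[ \log p(y \mid \beta) + \log p(\beta) \right]

    \text{Gaussian prior: } \log p(\beta) = -\frac{\|\beta\|_2^2}{2\sigma^2} + \text{const} \;\Rightarrow\; \text{L2 penalty}
    \text{Laplace prior: } \log p(\beta) = -\frac{\|\beta\|_1}{b} + \text{const} \;\Rightarrow\; \text{L1 penalty}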
multivariate Gaussian distribution
diagonal covariance
likelihood: the parameters are given, the output is the random variable
posterior: the data is given, the parameters are random (a distribution)
ridge: MAP with a Gaussian prior, i.e. MLE plus an L2 penalty
probabilistic models
supervised learning