Hi there! I'm LeiMengyu.

In 2021, I started studying quantitative genetics. This blog was created to record and summarize what I have learned. It is still at an early stage of construction, and the summaries of the theory are still being improved. My code is also posted here. Here is an outline of my blog. My first article is in preparation.

Bulked segregant analysis (BSA): study notes

Bulked segregant analysis
Review: https://onlinelibrary.wiley.com/doi/10.1111/tpj.15646

All methods should keep only variants that were already present in the parents.

Method 1

Materials: a line of genotype X, and a mutant derived from genotype X by mutagen treatment (or any other parent)
Genome reference: genotype X

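  1. Cross the two materials to obtain F1, self-pollinate F1 to obtain F2, and from F2 select individuals showing one of the extreme phenotypes to build a bulk (pool).
  2. Align the bulk's reads to the reference genome and call SNPs to obtain vcf1.
  3. At this point, apart from the mutation sites, vcf1 should be identical to vcf2, the VCF generated from genotype X.
    Reference: https://academic.oup.com/g3journal/article/7/12/3947/6027434#235903430
    Example analysis code: https://github.com/qslin/Bulk-Segregation-Analysis/tree/master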
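Step 3 above amounts to a set difference between the two VCF files. The sketch below is my own illustration, not code from the linked repository: `read_vcf` and `candidate_sites` are hypothetical helpers, and it assumes uncompressed VCF text with one ALT allele per record and an AD (allelic depth, "ref,alt") field in the first sample column. The optional `min_alt_frac` filter is also an assumption: for a recessive mutation in a bulk of mutant-phenotype F2 plants, the causal site is expected to have an ALT read fraction close to 1.

```python
# Hypothetical helpers: keep variants found in the bulk VCF (vcf1) but absent
# from the genotype-X VCF (vcf2). Assumes uncompressed VCF text, one ALT allele
# per record, and an AD ("ref,alt") field in the first sample column.


def read_vcf(lines):
    """Map (chrom, pos, ref, alt) -> ALT read fraction (None if AD is missing)."""
    records = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        alt_frac = None
        if len(fields) >= 10:
            fmt = dict(zip(fields[8].split(":"), fields[9].split(":")))
            depths = fmt.get("AD", "").split(",")
            if len(depths) == 2 and all(d.isdigit() for d in depths):
                ref_dp, alt_dp = int(depths[0]), int(depths[1])
                if ref_dp + alt_dp > 0:
                    alt_frac = alt_dp / (ref_dp + alt_dp)
        records[(chrom, pos, ref, alt)] = alt_frac
    return records


def candidate_sites(bulk_lines, background_lines, min_alt_frac=0.0):
    """Variants in the bulk that are absent from the background genotype X."""
    bulk = read_vcf(bulk_lines)
    background = read_vcf(background_lines)
    candidates = []
    for key, frac in sorted(bulk.items()):
        if key in background:
            continue  # shared with genotype X, so not a new mutation
        if frac is not None and frac < min_alt_frac:
            continue  # ALT allele too rare in the bulk
        candidates.append(key)
    return candidates


# Made-up toy records
vcf1 = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tbulk",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD\t1/1:0,20",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:AD\t0/1:10,10",
    "chr2\t300\t.\tG\tA\t50\tPASS\t.\tGT:AD\t1/1:1,19",
    "chr3\t400\t.\tT\tC\t50\tPASS\t.\tGT:AD\t0/1:12,8",
]
vcf2 = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tX",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:AD\t0/1:9,11",
]
print(candidate_sites(vcf1, vcf2))                    # chr1:100, chr2:300, chr3:400
print(candidate_sites(vcf1, vcf2, min_alt_frac=0.8))  # chr1:100, chr2:300
```

Real data would normally go through a dedicated VCF parser and extra quality filters (read depth, QUAL), which this sketch leaves out.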

Summary of Machine Learning (StatQuest)

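Cross validation
Sensitivity: the proportion of actual positives that the model correctly identifies as true positives (TP), TP/(TP+FN).
Specificity: the proportion of actual negatives that the model correctly identifies as true negatives (TN), TN/(TN+FP).
Bias & variance: overfitting means low bias but high variance.
3 commonly used methods for finding a balance between simple and complex models: regularization, boosting, and bagging (random forest).
ROC (x: FPR, y: TPR): used to choose the optimal threshold; prefer points with higher y and lower x.
AUC (area under the ROC curve): used to choose between models; it is the area between the ROC curve and the x-axis, and the larger, the better.
Precision = TP/(TP+FP) [used instead of FPR when studying rare diseases, because TN is then very large and precision does not depend on TN]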
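These metrics can be computed directly from true labels and model scores. A minimal sketch in plain Python: the function names and the toy labels and scores are made up, a sample is predicted positive when its score is at or above the threshold, AUC is computed with the trapezoidal rule over the ROC points, and both classes are assumed to be present.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def roc_points(y_true, scores):
    """(FPR, TPR) pairs, sweeping the threshold from the highest score down."""
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp, fp, tn, fn = confusion_counts(y_true, scores, t)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
tp, fp, tn, fn = confusion_counts(y_true, scores, threshold=0.6)
sensitivity = tp / (tp + fn)  # 3/3
specificity = tn / (tn + fp)  # 2/3
precision = tp / (tp + fp)    # 3/4
print(sensitivity, specificity, precision)
print(auc(roc_points(y_true, scores)))  # 8/9, about 0.889
```

Sweeping every distinct score as a threshold traces the whole ROC curve, so the same points serve both purposes from the notes: picking a threshold and comparing models by AUC.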

LDAK

Re-evaluation of SNP heritability in complex human traits

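Derives an approximate relationship between a SNP's expected heritability and its minor allele frequency (MAF), its level of linkage disequilibrium (LD) with other SNPs, and its genotype certainty.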
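As I understand the LDAK model in this paper, the relationship has the form $E[h^2_j] \propto w_j \, [f_j(1-f_j)]^{1+\alpha} \, r_j$, where $f_j$ is the MAF, $w_j$ is an LD weight (smaller for SNPs in high-LD regions), $r_j$ is an information score for genotype certainty, and $\alpha = -0.25$ was the recommended value. Setting $\alpha = -1$ with $w_j = r_j = 1$ gives equal expected heritability for every SNP, the GCTA-style assumption. The sketch below only evaluates this formula; the LD weights and information scores are taken as inputs (LDAK computes them from the data), and the helper names are hypothetical.

```python
def ldak_relative_h2(maf, ld_weight=1.0, info=1.0, alpha=-0.25):
    """Unnormalised expected heritability of one SNP: w * [f(1-f)]^(1+alpha) * r."""
    return ld_weight * (maf * (1.0 - maf)) ** (1.0 + alpha) * info


def expected_h2(mafs, total_h2, ld_weights=None, infos=None, alpha=-0.25):
    """Scale the per-SNP values so that they sum to the total SNP heritability."""
    n = len(mafs)
    ld_weights = ld_weights or [1.0] * n  # assumed: no LD weighting by default
    infos = infos or [1.0] * n            # assumed: perfectly certain genotypes
    raw = [ldak_relative_h2(f, w, r, alpha) for f, w, r in zip(mafs, ld_weights, infos)]
    total = sum(raw)
    return [total_h2 * v / total for v in raw]


mafs = [0.05, 0.2, 0.5]
print(expected_h2(mafs, total_h2=0.3, alpha=-1.0))   # GCTA-like: about 0.1 each
print(expected_h2(mafs, total_h2=0.3, alpha=-0.25))  # rarer SNPs get less
```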

SumHer better estimates the SNP heritability of complex traits from summary statistics

Improved genetic prediction of complex traits from individual-level data or summary statistics

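Difference between GCTA and the method proposed here: GCTA assumes every SNP has the same (constant) expected h2 (effect size), whereas here each SNP's expected h2 varies because it is affected by MAF, LD, and functional annotation.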
PRS
Prediction accuracy R2 (formula?): its upper bound is the heritability.
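One common way to measure PRS accuracy is R2 as the squared correlation between the score and the phenotype (which formula a given paper uses is left open here). The simulation below, with made-up parameters, illustrates why heritability is the upper bound: even a perfect score equal to the true genetic value g explains only Var(g)/Var(y) = h2 of the phenotypic variance, and a noisy score explains less.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between predictor x and phenotype y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def simulate(n=20000, h2=0.5, seed=1):
    """Simulated phenotype y = g + e with Var(g) = h2 and Var(y) = 1."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, h2 ** 0.5) for _ in range(n)]         # true genetic value
    y = [gi + rng.gauss(0.0, (1.0 - h2) ** 0.5) for gi in g]  # phenotype
    prs = [gi + rng.gauss(0.0, 1.0) for gi in g]              # noisy estimate of g
    return g, y, prs


g, y, prs = simulate()
print(r_squared(g, y))    # close to h2 = 0.5: even a perfect predictor is capped at h2
print(r_squared(prs, y))  # lower, because this PRS estimates g with error
```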

Data Science

Reference: https://zhuanlan.zhihu.com/p/68307444
OUTLINE: RCT [experiment] -> Potential Outcome Framework (or Structural Equation Modeling, suited to the social sciences) [analysis framework] -> confidence interval theory / Fisher exact test [analysis methods]
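Within the potential outcome framework, the "Fisher exact test" in this outline is read here as Fisher's randomization test of the sharp null hypothesis that treatment has no effect on any unit (this reading is my assumption). Under that null every unit's outcome is fixed whatever the assignment, so the exact p-value is the fraction of all possible assignments whose test statistic is at least as extreme as the observed one. A minimal sketch with a difference-in-means statistic, assuming a completely randomized design with a fixed number of treated units; the function name and data are hypothetical.

```python
from itertools import combinations


def fisher_randomization_p(outcomes, treated):
    """Exact two-sided p-value for the difference in means under the sharp null."""
    n, k = len(outcomes), len(treated)
    total = sum(outcomes)

    def diff_in_means(group):
        s = sum(outcomes[i] for i in group)
        return s / k - (total - s) / (n - k)

    observed = abs(diff_in_means(treated))
    assignments = list(combinations(range(n), k))  # every way to pick k treated units
    extreme = sum(1 for g in assignments if abs(diff_in_means(g)) >= observed - 1e-12)
    return extreme / len(assignments)


print(fisher_randomization_p([1, 2, 3, 10, 11, 12], treated=(3, 4, 5)))  # 2/20 = 0.1
```

Full enumeration grows combinatorially with the number of units; for larger samples the same p-value is usually approximated by sampling random assignments.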

Definition of causality and the potential outcome framework (Rubin causal model)
The model has three elements: potential outcomes, the Stable Unit Treatment Value Assumption (SUTVA), and the assignment mechanism, whose importance is emphasized.

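Observed difference (between treated and control groups) = treatment effect + selection bias
Randomized controlled trials (RCT) make the selection bias vanish (in expectation), because random assignment makes treatment independent of the potential outcomes.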
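The decomposition can be checked with simulated potential outcomes. In this sketch (all numbers made up), every unit has both Y(0) and Y(1) with a constant effect of 2, but only one of them is observed. Here "treatment effect" is taken as the average effect on the treated, one standard way to write this decomposition. With self-selection (units with high Y(0) take the treatment), the observed difference equals that effect plus a large selection bias; with coin-flip assignment the selection bias is close to zero.

```python
import random


def mean(values):
    return sum(values) / len(values)


def decompose(n=10000, effect=2.0, randomized=True, seed=0):
    """Simulate potential outcomes and split the observed difference."""
    rng = random.Random(seed)
    y0 = [rng.gauss(10.0, 2.0) for _ in range(n)]  # outcome without treatment
    y1 = [v + effect for v in y0]                  # outcome with treatment
    if randomized:
        treated = [rng.random() < 0.5 for _ in range(n)]  # coin-flip assignment
    else:
        treated = [v > 10.0 for v in y0]                  # self-selection on Y(0)
    t_idx = [i for i in range(n) if treated[i]]
    c_idx = [i for i in range(n) if not treated[i]]
    # Only Y(1) is seen for treated units and only Y(0) for controls
    observed = mean([y1[i] for i in t_idx]) - mean([y0[i] for i in c_idx])
    effect_on_treated = mean([y1[i] - y0[i] for i in t_idx])
    selection_bias = mean([y0[i] for i in t_idx]) - mean([y0[i] for i in c_idx])
    return observed, effect_on_treated, selection_bias


for randomized in (False, True):
    obs, eff, bias = decompose(randomized=randomized)
    print(randomized, round(obs, 3), round(eff, 3), round(bias, 3))
```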

PRS

PGS are biomarkers, i.e., predictors of risk of diseases
PGS can be measured at any time during life (e.g., birth)
• Remain the same through life because the DNA sequence does not change
• Update if GWAS discoveries or PGS methods change
PGS are not diagnostic genetic tests*
• Genetic risk accounts for only part of the total risk
• PGS account for only part of the genetic risk
• PGS will improve in accuracy as GWAS discovery sample sizes increase
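A PGS is commonly computed as a weighted sum of effect-allele counts, $PGS_i = \sum_j \beta_j x_{ij}$, where $x_{ij} \in \{0, 1, 2\}$ and $\beta_j$ comes from a GWAS. The sketch below uses hypothetical effect sizes to show two of the bullets: the genotype input stays fixed for life, while the score changes when the GWAS effect sizes are updated. Real PGS tools also handle allele alignment, missing genotypes and SNP selection, which are omitted here.

```python
def polygenic_score(dosages, betas):
    """Weighted sum of effect-allele counts (0, 1, 2) times GWAS effect sizes."""
    if len(dosages) != len(betas):
        raise ValueError("need one effect size per SNP")
    return sum(beta * x for beta, x in zip(betas, dosages))


genotype = [2, 1, 0]            # same person, same DNA throughout life
betas_v1 = [0.10, -0.05, 0.20]  # hypothetical effect sizes from an earlier GWAS
betas_v2 = [0.12, -0.04, 0.18]  # hypothetical updated effect sizes
print(polygenic_score(genotype, betas_v1))  # about 0.15
print(polygenic_score(genotype, betas_v2))  # about 0.20
```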

Article

Comprehensive analyses of 1771 transcriptomes cross 7 tissues enhance genetic and biological interpretations for complex traits in maize

https://www.biorxiv.org/content/10.1101/2023.08.09.552713v1

Abstract

By analyzing 1771 RNA-seq data from 7 tissues sampled from 298 genotypes, we studied the tissue expression specificity and built a comprehensive multi-tissue gene regulation atlas in maize. We describe the landscape of the transcriptome variation and identify thousands of allele associations with gene expression in 7 tissues. The tissue-sharing patterns of these genetic regulatory effects are consistent with phenotypic correlation, highlighting a general contribution from tissue-specific regulatory variation to cross-tissue transcriptome variation. Using transcriptome-wide association study (TWAS), we linked gene expression variation in different tissues to agronomic trait variation, revealing tissue-specific expression contributions to traits. In addition, through integrative analyses of tissue-specific gene regulation variation with genome-wide association studies, we detected relevant tissue types and candidate genes for agronomic traits to elucidate the genetic mechanisms underpinning such agronomic traits in maize. Our findings provide novel insights into the genetic and biological mechanisms underlying complex traits in maize, and our transcriptome atlas can serve as a primary source for biological interpretation, functional validation, and genomic improvement in plants.

Statistics

Linear algebra

determinant
Eigenvalues and Eigenvectors
Summary: To solve the eigenvalue problem for an n by n matrix, follow these steps:

  1. Compute the determinant of $A - \lambda I$. With $\lambda$ subtracted along the diagonal, this determinant starts with $\lambda^n$ or $-\lambda^n$. It is a polynomial in $\lambda$ of degree n.
  2. Find the roots of this polynomial by solving $\det(A - \lambda I) = 0$. The n roots are the n eigenvalues of A. They make $A - \lambda I$ singular.
  3. For each eigenvalue $\lambda$, solve $(A - \lambda I)x = 0$ to find an eigenvector x.
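For a 2 by 2 matrix the three steps can be carried out by hand, because $\det(A - \lambda I) = \lambda^2 - (a + d)\lambda + (ad - bc)$ for $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. A minimal sketch, assuming a real matrix with real eigenvalues (complex roots are rejected); `eig_2x2` is a hypothetical helper, and for larger matrices a numerical library routine would normally be used instead.

```python
import math


def eig_2x2(A):
    """Eigenpairs of a real 2x2 matrix via its characteristic polynomial."""
    (a, b), (c, d) = A
    # Step 1: det(A - lambda*I) = lambda^2 - (a + d)*lambda + (a*d - b*c)
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    # Step 2: the roots of the characteristic polynomial are the eigenvalues
    root = math.sqrt(disc)
    eigenvalues = [(trace - root) / 2, (trace + root) / 2]
    # Step 3: solve (A - lambda*I)x = 0 using one nonzero row of A - lambda*I
    pairs = []
    for lam in eigenvalues:
        if b != 0:
            x = (b, lam - a)  # satisfies (a - lam)*x1 + b*x2 = 0
        elif c != 0:
            x = (lam - d, c)  # satisfies c*x1 + (d - lam)*x2 = 0
        else:
            x = (1.0, 0.0) if math.isclose(lam, a) else (0.0, 1.0)  # diagonal A
        pairs.append((lam, x))
    return pairs


A = [[4, 1], [2, 3]]
for lam, x in eig_2x2(A):
    print(lam, x)  # 2.0 (1, -2.0) then 5.0 (1, 1.0)
```

Because $A - \lambda I$ is singular at an eigenvalue, its two rows are proportional, so solving with one nonzero row is enough.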

Likelihood and probability

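Residual: e = y - ax - b
Maximum likelihood assumptions:
  1. The residuals e are independent.
  2. The residuals e are identically distributed.
  3. The residuals e have the same standard deviation.

If the standard deviations of e differ, weighted least squares can be used instead.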
Sources: CSDN; 马同学图解数学 (Ma Tongxue Illustrated Math)
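If, in addition to these assumptions, e is normally distributed, maximizing the likelihood of y = ax + b + e is equivalent to minimizing the sum of squared residuals (ordinary least squares). When each residual has its own standard deviation $\sigma_i$, the Gaussian likelihood weights each squared residual by $1/\sigma_i^2$, which is weighted least squares. A minimal sketch using the closed-form solution; the function name, the data, and the assumption that the $\sigma_i$ are known are all illustrative.

```python
def weighted_linear_fit(x, y, w=None):
    """Fit y = a*x + b by minimising sum(w_i * e_i^2); w=None gives ordinary least squares."""
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    a = sxy / sxx
    b = my - a * mx
    return a, b


x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 20]             # the last point is very noisy
sigma = [1, 1, 1, 1, 100]        # assumed known residual standard deviations
w = [1 / s ** 2 for s in sigma]  # WLS weights: 1 / sigma_i^2
print(weighted_linear_fit(x, y))     # OLS: about a = 4.2, b = -1.2, pulled by the noisy point
print(weighted_linear_fit(x, y, w))  # WLS: close to a = 2, b = 1
```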

LD score regression

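The four main functions of LDSC: estimating LD Scores, h2 and partitioned h2, genetic correlation, and the LD Score regression intercept.

That is: computing LD scores, trait heritability, genetic correlation and genetic covariance between traits, partitioned heritability, cell-type-specific analyses, etc.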

Linkage disequilibrium score regression(LDSC)

https://www.nature.com/articles/ng.3211

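  1. Determine whether the GWAS results are affected by confounding factors (e.g., population stratification).
  2. Estimate heritability.
  3. Compute the genetic correlation between two traits.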
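The LDSC paper relates each SNP's expected GWAS $\chi^2$ statistic to its LD score $\ell_j$ (the sum of $r^2$ between SNP j and nearby SNPs): $E[\chi^2_j] = N h^2 \ell_j / M + N a + 1$, where N is the GWAS sample size, M the number of SNPs, and a measures confounding. The slope therefore gives $h^2$ (item 2), and an intercept above 1 points to confounding (item 1). The sketch below simulates $\chi^2$ values from this equation with made-up LD scores and parameters and fits plain OLS; the actual LDSC software uses regression weights and block-jackknife standard errors, which are not reproduced here, and the helper names are hypothetical.

```python
import random


def ols(x, y):
    """Ordinary least squares slope and intercept for y = slope*x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def simulate_ldsc(n_gwas=100_000, m_snps=1_000_000, h2=0.3, a=5e-7, n_fit=5000, seed=42):
    """Simulated chi-square statistics following E[chi2_j] = N*h2*l_j/M + N*a + 1."""
    rng = random.Random(seed)
    ld_scores = [rng.uniform(1.0, 200.0) for _ in range(n_fit)]  # made-up LD scores
    chi2 = [n_gwas * h2 * ld / m_snps + n_gwas * a + 1.0 + rng.gauss(0.0, 0.5)
            for ld in ld_scores]
    slope, intercept = ols(ld_scores, chi2)
    h2_hat = slope * m_snps / n_gwas  # slope = N*h2/M, so h2 = slope*M/N
    return h2_hat, intercept


h2_hat, intercept = simulate_ldsc()
print(round(h2_hat, 3), round(intercept, 3))  # near h2 = 0.3 and 1 + N*a = 1.05
```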