edgeR
edgeR: empirical analysis of DGE in R
- An overdispersed Poisson model is used to account for both biological and technical variability.
- Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference.
- The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated.
Why edgeR
- For microarrays, the abundance of a particular transcript is measured as a fluorescence intensity, effectively a continuous response.
- For digital gene expression (DGE) data, the abundance is observed as a count.
- Therefore, procedures that are successful for microarray data are not directly applicable to DGE data.
- edgeR is designed for the analysis of replicated count-based expression data and is an implementation of methodology developed by Robinson and Smyth[1][2].
- It was initially developed for serial analysis of gene expression (SAGE).
As a result, edgeR may also be useful in other experiments that generate counts, such as ChIP-seq, in proteomics experiments where spectral counts are used to summarize the peptide abundance[3] or in barcoding experiments where several species are counted [4].
Digital gene expression: Digital gene expression (DGE) is a sequence-based approach to gene expression analysis that generates a digital output at an unparalleled level of sensitivity[5].
Serial analysis of gene expression (SAGE): Serial analysis of gene expression, or SAGE, is an experimental technique designed to gain a direct and quantitative measure of gene expression. The SAGE method is based on the isolation of unique sequence tags (9-10 bp in length) from individual mRNAs and concatenation of tags serially into long DNA molecules for a lump-sum sequencing[6].
Method
In limma (Smyth, 2004), an empirical Bayes model is used to moderate the probe-wise variances.
In edgeR:
We assume the data can be summarized into a table of counts
We model the data as negative binomial (NB) distributed
$$
Y_{gi} \sim \mathrm{NB}(M_i p_{gj}, \phi_g)
$$
For gene $g$ and sample $i$:
- $M_i$ is the library size (total number of reads),
- $\phi_g$ is the dispersion,
- $p_{gj}$ is the relative abundance of gene $g$ in experimental group $j$, to which sample $i$ belongs.
We use the NB parameterization where:
- the mean is $\mu_{gi} = M_i p_{gj}$
- the variance is $\mu_{gi}(1 + \mu_{gi} \phi_g)$
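The mean and variance of this parameterization can be checked numerically. What follows is a minimal sketch in plain Python (standard library only), not edgeR code. The helper `nb_pmf` and the values chosen for the library size, relative abundance, and dispersion are illustrative assumptions, not taken from the source.

```python
import math

def nb_pmf(y, mu, phi):
    """P(Y = y) for a negative binomial with mean mu and dispersion phi > 0."""
    size = 1.0 / phi             # NB "size" parameter r = 1 / phi
    prob = size / (size + mu)    # success probability p = r / (r + mu)
    log_p = (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
             + size * math.log(prob) + y * math.log1p(-prob))
    return math.exp(log_p)

# Hypothetical values, chosen only for illustration
M_i = 1_000_000   # library size (total number of reads)
p_gj = 1e-5       # relative abundance of gene g in group j
phi_g = 0.2       # dispersion

mu_gi = M_i * p_gj   # expected count for gene g in sample i

# Sum over a range wide enough that the truncated tail is negligible
ys = range(2000)
probs = [nb_pmf(y, mu_gi, phi_g) for y in ys]
mean = sum(y * p for y, p in zip(ys, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(ys, probs))

print("mean:    ", mean, "expected", mu_gi)
print("variance:", var, "expected", mu_gi * (1 + mu_gi * phi_g))
```

The computed moments should agree with $M_i p_{gj}$ and $\mu_{gi}(1 + \mu_{gi}\phi_g)$ up to floating-point error. The conversion `size = 1 / phi` is the usual link between the dispersion and the "size" parameter of the standard negative binomial.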
For differential expression analysis:
- the parameters of interest are $p_{gj}$.
The NB distribution reduces to the Poisson distribution when $\phi_g = 0$.
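The Poisson limit can also be seen numerically. The sketch below is plain Python (standard library only), not edgeR code. Because a dispersion of exactly zero makes the NB "size" parameter infinite, it compares the two distributions for a sequence of shrinking dispersions. The helpers `nb_pmf`, `poisson_pmf`, and `max_abs_diff`, and the mean of 10, are illustrative assumptions.

```python
import math

def nb_pmf(y, mu, phi):
    """P(Y = y) for a negative binomial with mean mu and dispersion phi > 0."""
    size = 1.0 / phi
    prob = size / (size + mu)
    log_p = (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
             + size * math.log(prob) + y * math.log1p(-prob))
    return math.exp(log_p)

def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson distribution with mean mu."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

mu = 10.0  # hypothetical expected count

def max_abs_diff(phi):
    """Largest pointwise gap between the NB and Poisson probabilities."""
    return max(abs(nb_pmf(y, mu, phi) - poisson_pmf(y, mu)) for y in range(200))

phis = (1.0, 0.1, 0.01, 0.0001)
diffs = [max_abs_diff(phi) for phi in phis]
for phi, d in zip(phis, diffs):
    print(f"phi = {phi:<7} max |NB - Poisson| = {d:.2e}")
```

The gap shrinks as the dispersion approaches zero, which is the sense in which the Poisson model is a special case of the NB model.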
In some DGE applications, technical variation can be treated as Poisson.
In general, $\phi_g$ is the squared coefficient of variation of the biological variation between samples, so $\sqrt{\phi_g}$ is the biological coefficient of variation. In this way, the model separates biological from technical variation.
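Dividing the variance given above by the squared mean makes the separation explicit. This is a short derivation from the stated parameterization, not taken from the source:

$$
\operatorname{Var}(Y_{gi}) = \mu_{gi} + \phi_g \mu_{gi}^2
\quad\Longrightarrow\quad
\mathrm{CV}^2(Y_{gi}) = \frac{\operatorname{Var}(Y_{gi})}{\mu_{gi}^2} = \frac{1}{\mu_{gi}} + \phi_g
$$

The $1/\mu_{gi}$ term is the Poisson (technical) contribution, which shrinks as the expected count grows. The $\phi_g$ term is the part that remains however deeply the library is sequenced.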
limma
Dispersion estimates -> topTags (tabulate the top differentially expressed genes) -> plotSmear (MA plot)
More
There are a few terms and algorithms I do not yet understand, so I'll update this page later.
[1] Robinson MD, Smyth GK. Moderated statistical tests for assessing differences in tag abundance. Bioinformatics. 2007;23:2881-2887.
[2] Robinson MD, Smyth GK. Small sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics. 2008;9:321-332.
[3] Wong JWH, et al. Computational methods for the comparative quantification of proteins in label-free LCn-MS experiments. Brief Bioinform. 2008;9:156-165.
[4] Andersson AF, et al. Comparative analysis of human gut microbiota by barcoded pyrosequencing. PLoS ONE. 2008;3:e2836.
[5] Rodríguez-Esteban G, González-Sastre A, Rojo-Laguna JI, et al. Digital gene expression approach over multiple RNA-Seq data sets to detect neoblast transcriptional changes in Schmidtea mediterranea. BMC Genomics. 2015;16:361. https://doi.org/10.1186/s12864-015-1533-1
[6] Yamamoto M, Wakatsuki T, Hada A, Ryo A. Use of serial analysis of gene expression (SAGE) technology. J Immunol Methods. 2001;250(1-2):45-66. doi: 10.1016/s0022-1759(01)00305-2. PMID: 11251221.