Papers on epigenetics and statistics

Review papers:

Introduction to epigenomics and epigenome-wide analysis (Geir)

MJ Fazzari, JM Greally - Statistical Methods in Molecular Biology, 2010 - Springer

  • Good overview of epigenetics
  • Applications to human cancer
  • Overview of (simple?) statistical approaches and challenges


Analysis of complex methylation data (Geir)

KD Siegmund, PW Laird - Methods, 2002 - Elsevier
  • Good background on epigenetics
  • Good coverage of simple statistical approaches

MAGI: Methylation analysis using genome information (Geir)

DD Baumann, RW Doerge - Epigenetics, 2014 - Taylor & Francis

Tests for differences in methylation between two groups (or individuals). Genome information is used to define homogeneous regions, and simple exact Fisher tests are applied within each region. No "temporal/spatial" correlation is modelled explicitly, although it enters indirectly through the definition of the regions.
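
To make the per-region test concrete, here is a minimal sketch of a two-sided exact Fisher test applied region by region. It is not the MAGI implementation: the function name, the region names and the read counts are hypothetical, and each region is assumed to be summarised as methylated/unmethylated counts for the two groups.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one.
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1)) if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical counts per region: (methylated, unmethylated) reads in each group.
regions = {
    "region_A": ((3, 0), (0, 3)),
    "region_B": ((1, 1), (1, 1)),
}
p_values = {
    name: fisher_exact_two_sided(g1[0], g1[1], g2[0], g2[1])
    for name, (g1, g2) in regions.items()
}
```

Each region is tested separately, so any dependence between neighbouring loci is handled only through how the regions are drawn, as noted above. With many regions the resulting p-values would still need a multiple-testing adjustment.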

Integrating Prior Knowledge in Multiple Testing under Dependence with Applications to Detecting Differential DNA Methylation (Aliaksandr)

Pei Fen Kuan and Derek Y. Chiang - Biometrics, Volume 68, Issue 3, pages 774–783, September 2012

  • The introduction is informative about applications of methylation studies
  • FDRs, FNRs, ATPs and AUROCs are compared for a fairly wide range of methods, across different non-null proportions and different signal means. The comparison is based on first-order HMM (or NHMM) processes for θ (a binary variable for presence of methylation at a locus), with different assumptions on Z at the methylated loci
  • The procedures, however, are not always formally described in the article. A thorough description is available in the web appendix.
  • The graphs in Figure 1 show autocorrelation of the t statistics at several lags, yet only a first-order HMM (or NHMM) for θ and an AR(3) process for the underlying Z (one option for modelling Z in 4.2.3) are considered. It might be interesting to model Z with more general processes (e.g. ARIMA(p,d,q)) in 4.2.3, and to model θ with a higher-order HMM (or NHMM). It might also be interesting to let θ take more states, θ ∈ {0,1,...,N}, each with its own mean level of methylation, instead of testing separately for different fixed means of the signal.
  • Overall, the article gives a very good overview of the procedures for multiple testing of methylation presence, their efficiency and the way they are applied.
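
As a reminder of how a first-order HMM for a binary θ turns locus-level statistics into per-locus posterior probabilities, here is a minimal forward-backward sketch. It is not the procedure from the article: the Gaussian emissions, the function names and all parameter values are assumptions chosen only for illustration.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, sd):
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))


def posterior_null_prob(z, trans, init, means, sds):
    """Scaled forward-backward for a 2-state HMM.

    Returns P(theta_i = 0 | z_1..z_n) for every locus i.
    """
    n = len(z)
    emis = [[normal_pdf(x, means[s], sds[s]) for s in (0, 1)] for x in z]

    # Forward pass, normalised at each step to avoid underflow.
    alpha = []
    for i in range(n):
        if i == 0:
            a = [init[s] * emis[0][s] for s in (0, 1)]
        else:
            a = [
                sum(alpha[i - 1][r] * trans[r][s] for r in (0, 1)) * emis[i][s]
                for s in (0, 1)
            ]
        total = sum(a)
        alpha.append([v / total for v in a])

    # Backward pass, also normalised at each step.
    beta = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        b = [
            sum(trans[s][r] * emis[i + 1][r] * beta[i + 1][r] for r in (0, 1))
            for s in (0, 1)
        ]
        total = sum(b)
        beta[i] = [v / total for v in b]

    # The posterior is proportional to alpha * beta at each locus.
    post = []
    for i in range(n):
        w = [alpha[i][s] * beta[i][s] for s in (0, 1)]
        post.append(w[0] / sum(w))
    return post


# Hypothetical test statistics along the genome: a short signal in the middle.
z = [0.1, -0.3, 0.2, 3.1, 2.8, 3.3, 0.0, -0.2]
post = posterior_null_prob(
    z,
    trans=[[0.9, 0.1], [0.1, 0.9]],  # assumed transition matrix for theta
    init=[0.5, 0.5],                 # assumed initial distribution
    means=[0.0, 3.0],                # assumed emission means (null, signal)
    sds=[1.0, 1.0],                  # assumed emission standard deviations
)
```

Loci with a small posterior null probability are the candidates for rejection. Extending θ to more than two states, as suggested above, would only change the range of the state index and the sizes of `trans`, `init`, `means` and `sds`.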


J. L. Larson, Harvard University, 2012
  • fully described HMM for 3 levels of chromatin domains of 5 types
  • statements of biological questions and some answers to them
  • inference on the HMM and parameter estimation procedures are described briefly
  • priors on the parameters based on the clustering results seem to be beneficial
  • a THMM is also applied for prediction of chromatin domains, though the sub-states are not described in full detail

Epigenetic change detection and pattern recognition via Bayesian hierarchical hidden Markov models (Aliaksandr)

Xinlei Wang, Miao Zang, Guanghua Xiao

  • The experiment is designed on both cocaine-treated and saline-treated mice to detect cocaine-induced alterations in TF binding (IP enrichment)
  • A Bayesian hierarchical model is developed; the model detects spatially dependent epigenetic changes.
  • The hidden variable takes one of 4 states {0, 1, 2, 3} and indicates whether there is IP enrichment under saline, under cocaine, under neither or under both
  • Two datasets are observed (reads for IP enrichment under saline and under cocaine treatment); they are conditionally independent of each other, with conditionally independent observations
  • Priors are set for the parameters of the model and an MCMC algorithm (Gibbs sampler) is used to draw from the joint posterior
  • Inference is based on the marginal posteriors of the parameters of interest (the difference in IP enrichment between saline and cocaine treatment of mice)

Network-based regularization for matched case-control analysis of high-dimensional DNA methylation data (Aliaksandr)

Hokeun Sun, Shuang Wang

  • Initially a paired t-test with Bonferroni adjustment was used; however, it did not take into account correlations between the bases or prior biological knowledge
  • To deal with this, penalized logistic regression is used (penalties in the likelihood), so that the coefficients can be shrunk towards each other, allowing them to exchange information. This makes it possible to adjust the parameters so as to improve the smoothness and sparsity of the model
  • A coordinate descent algorithm is used to maximize the penalized likelihood in this application
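
To illustrate what "shrinking coefficients towards each other" can look like, here is a minimal sketch of a generic network-type penalty together with the soft-thresholding step commonly used inside coordinate descent. This is an assumed form (an L1 term plus squared differences over graph edges, i.e. an unnormalised graph Laplacian), not necessarily the exact penalty of the paper; the function names, the graph and the tuning values `lam` and `alpha` are hypothetical.

```python
def network_penalty(beta, edges, lam, alpha):
    """Generic sparsity-plus-smoothness penalty on a coefficient vector.

    beta  : list of coefficients, one per CpG site
    edges : list of (i, j) index pairs linking sites assumed to be related
    lam   : overall penalty strength (assumed tuning parameter)
    alpha : weight in [0, 1] between the L1 term and the smoothness term
    """
    l1_term = sum(abs(b) for b in beta)                         # promotes sparsity
    smooth_term = sum((beta[i] - beta[j]) ** 2 for i, j in edges)  # promotes smoothness
    return lam * (alpha * l1_term + (1 - alpha) * smooth_term)


def soft_threshold(x, t):
    """Soft-thresholding operator: shrink x towards zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


# Hypothetical example: three sites on a chain 0 - 1 - 2.
penalty = network_penalty([1.0, 1.0, 0.0], edges=[(0, 1), (1, 2)], lam=2.0, alpha=0.5)
```

The smoothness term is zero for linked sites with equal coefficients, which is how neighbouring sites "exchange information"; the L1 term is what sets small coefficients exactly to zero through `soft_threshold`.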

Discovering and mapping chromatin states using a tree hidden Markov model (Aliaksandr)

Jacob Biesinger, Yuanfeng Wang, Xiaohui Xie

  • Hidden state variables represent chromatin states in a specific cell type and are interconnected (tree structure for each element of the chain); along the chain they form a first-order Markov chain
  • Observed data are the chromatin modifications of a cell of a certain type; they depend only on the hidden state of that cell type and are conditionally i.i.d.
  • Parameters are estimated by maximum likelihood (ML); variational EM is used to maximize the likelihood function
  • This example is quite relevant for our research!

Epigenetic change detection and pattern recognition via Bayesian hierarchical hidden Markov models (Aliaksandr)
Xinlei Wang, Miao Zang, and Guanghua Xiao - Statistics in Medicine, 2012
  • Nice introduction to the biological problem, with an easy-to-understand biological description
  • Good problem statement from the statistical perspective
  • A genuinely elegant model is proposed, though its basic assumptions are not thoroughly discussed (order of the Markov chain, assumed distributions of the modelled parameters, etc.)
  • Nice experimental design


Epigenetics and plant genome evolution (Geir)

CM Diez, K Roessler, BS Gaut - Current opinion in plant biology, 2014 - Elsevier
  • Discusses patterns and their relations to underlying structures


Published Aug. 8, 2014 8:49 AM - Last modified Nov. 24, 2017 10:39 AM