Papers on epigenetics and statistics
Review papers:
Introduction to epigenomics and epigenome-wide analysis (Geir)
MJ Fazzari, JM Greally - Statistical Methods in Molecular Biology, 2010 - Springer
 Good overview of epigenetics
 Applications to human cancer
 Overview of (simple?) statistical approaches and challenges
Analysis of complex methylation data (Geir)
KD Siegmund, PW Laird - Methods, 2002 - Elsevier
 Good background on epigenetics
 Good coverage of simple statistical approaches
MAGI: Methylation analysis using genome information (Geir)
DD Baumann, RW Doerge - Epigenetics, 2014 - Taylor & Francis
Tests for differences in methylation between two groups (or individuals). Genome information is used to define homogeneous regions, and simple tests (Fisher's exact tests) are applied within each region. No temporal/spatial correlation is incorporated (although it enters indirectly through the definition of the regions).
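A minimal sketch of the per-region test, using only the Python standard library. It assumes (our assumption, not necessarily MAGI's exact construction) that methylated and unmethylated read counts are pooled over the CpG sites of each region and compared between the two groups with a two-sided Fisher exact test. The `regions` list stands in for the genome-derived regions and is a hypothetical input.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups, columns are (methylated, unmethylated) read counts.
    The p-value sums the hypergeometric probabilities of all tables with the
    same margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Unnormalised hypergeometric weights as exact integers (no rounding issues).
    weights = {x: comb(row1, x) * comb(row2, col1 - x) for x in range(lo, hi + 1)}
    observed = weights[a]
    total = sum(weights.values())
    return sum(w for w in weights.values() if w <= observed) / total


def region_pvalues(meth1, unmeth1, meth2, unmeth2, regions):
    """One Fisher test per region; `regions` holds (start, end) CpG index ranges, end exclusive."""
    pvalues = []
    for start, end in regions:
        # Pool the read counts of all CpG sites inside the region (hypothetical choice).
        a, b = sum(meth1[start:end]), sum(unmeth1[start:end])
        c, d = sum(meth2[start:end]), sum(unmeth2[start:end])
        pvalues.append(fisher_exact_two_sided(a, b, c, d))
    return pvalues
```

For example, `region_pvalues([4, 4, 0], [1, 1, 5], [1, 0, 2], [2, 3, 3], [(0, 2), (2, 3)])` pools the first region into the table [[8, 2], [1, 5]], whose two-sided p-value is 400/11440 ≈ 0.035.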
Integrating Prior Knowledge in Multiple Testing under Dependence with Applications to Detecting Differential DNA Methylation (Aliaksandr)
Pei Fen Kuan and Derek Y. Chiang - Biometrics, Volume 68, Issue 3, pages 774–783, September 2012
 The introduction is informative about applications of methylation studies
 FDR, FNR, ATP and AUROC are compared across several methods for different non-null proportions and different signal means, based on 1-HMM (1-NHMM) processes for θ (a binary variable indicating the presence of methylation at each locus), under different assumptions about Z at the methylated loci
 The procedures, however, are not always formally described in the article. A thorough description is available in the web appendix.
 Graphs in Figure 1 show autocorrelation of the t statistics at several lags, yet only a 1-HMM (1-NHMM) for θ and a 3-AR process for the underlying Z (an option for modelling Z in Section 4.2.3) are considered as assumptions. It might be interesting to model Z with more general processes (ARIMA(p,d,q)) in Section 4.2.3, and to model θ with higher-order chains (n-HMM, n-NHMM). It might also be interesting to let θ have more states for different levels of methylation, θ ∈ {0, 1, ..., N}, each state with its own signal mean, instead of testing separately for different fixed signal means.
 Overall, the article gives a very good overview of procedures for multiple testing of methylation presence, their efficiency, and how they are applied.
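The general idea of HMM-based multiple testing can be sketched in numpy as follows. This is a generic oracle version of the local index of significance (LIS) procedure, not Kuan and Chiang's own method: it assumes a homogeneous two-state chain for θ with known transition matrix `trans`, initial distribution `init`, and unit-variance Gaussian emissions for the test statistics (N(0,1) when θ = 0, N(`mu1`,1) when θ = 1). In practice these parameters would have to be estimated (e.g. by EM), and the paper's NHMM and AR variants relax exactly these assumptions.

```python
import numpy as np


def normal_pdf(x, mean):
    """Unit-variance Gaussian density, the assumed emission density for each state."""
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2.0 * np.pi)


def lis_posterior(z, trans, init, mu1):
    """LIS_i = P(theta_i = 0 | all z) for a homogeneous 2-state HMM (scaled forward-backward)."""
    z = np.asarray(z, dtype=float)
    trans = np.asarray(trans, dtype=float)
    n = len(z)
    emit = np.column_stack([normal_pdf(z, 0.0), normal_pdf(z, mu1)])  # shape (n, 2)
    fwd = np.zeros((n, 2))
    scale = np.zeros(n)
    fwd[0] = np.asarray(init, dtype=float) * emit[0]
    scale[0] = fwd[0].sum()
    fwd[0] /= scale[0]
    for i in range(1, n):  # forward pass, normalised at each step
        fwd[i] = (fwd[i - 1] @ trans) * emit[i]
        scale[i] = fwd[i].sum()
        fwd[i] /= scale[i]
    bwd = np.ones((n, 2))
    for i in range(n - 2, -1, -1):  # backward pass, reusing the forward scale factors
        bwd[i] = trans @ (emit[i + 1] * bwd[i + 1]) / scale[i + 1]
    post = fwd * bwd
    post /= post.sum(axis=1, keepdims=True)
    return post[:, 0]


def lis_step_up(lis, alpha):
    """Reject the k smallest LIS values, k being the largest count whose mean LIS is <= alpha."""
    lis = np.asarray(lis, dtype=float)
    order = np.argsort(lis)
    running_mean = np.cumsum(lis[order]) / np.arange(1, len(lis) + 1)
    passing = np.nonzero(running_mean <= alpha)[0]
    reject = np.zeros(len(lis), dtype=bool)
    if passing.size:
        reject[order[: passing[-1] + 1]] = True
    return reject
```

The mean of the rejected LIS values estimates the false discovery rate of the rejected set, which is why the threshold is placed on the running mean rather than on individual values. Because the posteriors borrow strength from neighbouring loci, this is where the dependence between loci enters.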
J. L. Larson, Harvard University, 2012
 Fully described HMM for 3 levels of chromatin domains of 5 types
 Statements of biological questions and some answers to them
 Inference for the HMM and parameter estimation procedures are described briefly
 Priors on the parameters based on the clustering results seem to be beneficial
 A THMM is also applied to predict chromatin domains, though its sub-states are not described in full detail
Epigenetic detection and pattern recognition via Bayesian hierarchical hidden Markov models (Aliaksandr)
Xinlei Wang, Miao Zang, Guanghua Xiao
 The experiment uses both cocaine-treated and saline-treated mice to detect cocaine-induced alterations in TF binding (IP enrichment)
 A Bayesian hierarchical model is developed that detects spatially dependent epigenetic changes
 The hidden variable takes one of 4 states {0, 1, 2, 3} and indicates whether a region is IP-enriched under saline, under cocaine, under neither, or under both
 Two datasets are observed (IP-enrichment reads under saline and under cocaine treatment); they are conditionally independent of each other, and the observations within each are conditionally independent
 Priors are set for the model parameters, and an MCMC algorithm (Gibbs sampler) is used to draw from the joint posterior
 Inference is based on the marginal posteriors of the parameters of interest (the difference between IP enrichment under saline and under cocaine treatment)
Network-based regularization for matched case-control analysis of high-dimensional DNA methylation data (Aliaksandr)
Hokeun Sun, Shuang Wang

Initially a paired t-test with Bonferroni adjustment was considered; however, it does not account for correlations between the loci or for prior biological knowledge.

To address this, penalized logistic regression is used (penalties are added to the likelihood) so that the coefficients of connected loci are shrunk towards each other, allowing them to share information. This improves the smoothness and sparsity of the model.

A coordinate descent algorithm is used to maximize the penalized likelihood in this application.
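A sketch of the penalized approach under assumptions of our own. For 1:1 matched pairs the conditional logistic likelihood reduces to a logistic model without intercept on the within-pair differences d_i = x_case,i − x_control,i (each pair contributes 1/(1 + exp(−d_iᵀβ))). The penalty used here is λ1‖β‖₁ + (λ2/2)βᵀLβ, with L a plain graph Laplacian over a hypothetical chain of neighbouring loci; Sun and Wang's exact network penalty may differ (for instance, a normalised Laplacian). Each coordinate update minimizes a quadratic upper bound of the negative log-likelihood (the logistic curvature is at most mean(d_ij²)/4) and then soft-thresholds.

```python
import numpy as np


def chain_laplacian(p):
    """Graph Laplacian D - A for a hypothetical chain network linking neighbouring loci."""
    adj = np.zeros((p, p))
    idx = np.arange(p - 1)
    adj[idx, idx + 1] = 1.0
    adj[idx + 1, idx] = 1.0
    return np.diag(adj.sum(axis=1)) - adj


def soft_threshold(value, threshold):
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def fit_matched_network_logistic(x_case, x_control, laplacian, lam1, lam2, n_sweeps=200):
    """Coordinate descent for the penalized 1:1 matched (conditional) logistic model."""
    d = np.asarray(x_case, float) - np.asarray(x_control, float)  # (n_pairs, p)
    n, p = d.shape
    beta = np.zeros(p)
    eta = d @ beta  # linear predictor d_i^T beta for every pair
    # Per-coordinate curvature bound: logistic part <= mean(d_ij^2) / 4, plus the Laplacian part.
    curvature = (d ** 2).mean(axis=0) / 4.0 + lam2 * np.diag(laplacian)
    for _ in range(n_sweeps):
        for j in range(p):
            if curvature[j] <= 0.0:
                continue
            # sigmoid(-eta), computed stably via tanh: probability mass not yet explained.
            miss = 0.5 * (1.0 - np.tanh(0.5 * eta))
            grad = -(d[:, j] * miss).mean() + lam2 * laplacian[j] @ beta
            new = soft_threshold(beta[j] - grad / curvature[j], lam1 / curvature[j])
            eta += d[:, j] * (new - beta[j])  # keep the linear predictor in sync
            beta[j] = new
    return beta
```

Minimizing the negative log-likelihood plus the penalty is equivalent to maximizing the penalized likelihood. The λ1 term produces sparsity, while the Laplacian term pulls the coefficients of connected loci towards each other; in practice λ1 and λ2 would be tuned, for example by cross-validation.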
Discovering and mapping chromatin states using a tree hidden Markov model (Aliaksandr)
Jacob Biesinger, Yuanfeng Wang, Xiaohui Xie

Hidden state variables represent chromatin states in a specific cell type and are interconnected (a tree structure for each element of the chain) in a first-order Markov chain.

Observed data are chromatin modifications in a cell of a given type; they depend only on that cell's hidden state and are conditionally i.i.d.

Parameters are estimated by maximum likelihood (ML); (variational) EM is used to maximize the likelihood function.

This example is highly relevant for our research!
Epigenetic change detection and pattern recognition via Bayesian hierarchical hidden Markov models (Aliaksandr)
Xinlei Wang, Miao Zang, and Guanghua Xiao - Statistics in Medicine, 2012

Nice introduction to the biological problem, with an easy-to-understand biological description

Good problem statement from the statistical perspective

An elegant model is proposed, though the basic assumptions it rests on are not thoroughly discussed (order of the Markov chain, assumed distributions of the modelled parameters, etc.)

Nice experimental design
Plants:
Epigenetics and plant genome evolution (Geir)
CM Diez, K Roessler, BS Gaut - Current Opinion in Plant Biology, 2014 - Elsevier
 Discusses patterns and their relations to underlying structures
Published Aug. 8, 2014 8:49 AM
 Last modified Nov. 24, 2017 10:39 AM