Adaptive penalisation for p>>n sparse problems (completed)
About the project
High-dimensional regression modelling, where the number of covariates p is much larger than the number of samples n, is an active research area in statistics, largely because of the explosion of such data sets, especially from genomics and epigenetics. Several popular and successful penalised regression methods are available, but their variable selection is unstable, and they suffer from a lack of robustness and from overfitting. In this project we develop new, more robust variants of adaptive penalised methods. Our approach is to strengthen and guide the variable selection procedure in three directions: (i) data integration; (ii) semi-parametric approaches; (iii) pre-selection bias correction. The new methods are tested on high-throughput genomic and epigenetic data (such as gene expression, copy number, SNP and methylation data) with our collaborators in cancer research, in order to identify new genes and gene-environment interactions that play a role in the progression and therapy of ovarian and cervical cancer.
This three-year research project in statistical methodology with applications to genomics focuses on research challenges that arise when a huge number of explanatory variables are measured on relatively few subjects. The aim is to identify the few factors that actually influence the outcome (time to event, disease status, etc.) while reducing the number of false positives, that is, variables that appear associated with the outcome just by chance.
The objectives of this research project are to develop new statistical methodology for p >> n sparse problems, with a focus on data integration, pre-selection bias and semi-parametric modelling, and to apply these new methods to cancer molecular biology, in particular cervical cancer and ovarian cancer. Borrowing strength across data sets and using adaptive methods will lead to new understanding from genomic data.
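As a rough illustration of what adaptive penalisation means in a p >> n setting, the sketch below implements one standard textbook example, the adaptive lasso, on simulated data. It is not the methodology developed in this project. All function names, the ridge initial estimate, the tuning values and the simulated data are assumptions chosen for the example.

```python
import numpy as np


def weighted_lasso(X, y, lam, weights, n_iter=100, tol=1e-8):
    """Coordinate descent for
    (1 / 2n) * ||y - X b||^2 + lam * sum_j weights[j] * |b_j|.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual
            # (the residual with feature j's own contribution added back).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding with a feature-specific threshold.
            shrunk = max(abs(rho) - lam * weights[j], 0.0)
            new = np.sign(rho) * shrunk / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid -= X[:, j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def adaptive_lasso(X, y, lam, ridge_alpha=1.0, gamma=1.0, eps=1e-6):
    """Adaptive lasso with a ridge initial estimate (an assumption here:
    ordinary least squares is not available when p > n)."""
    n, _ = X.shape
    # Ridge estimate via the n x n dual system, which is cheap when p >> n.
    init = X.T @ np.linalg.solve(X @ X.T + ridge_alpha * np.eye(n), y)
    # Features with a small initial estimate receive a large penalty.
    weights = 1.0 / (np.abs(init) + eps) ** gamma
    return weighted_lasso(X, y, lam, weights)


# Simulated example: 200 samples, 500 covariates, 3 true signals.
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = [3.0, -3.0, 3.0]
y = X @ true_beta + 0.5 * rng.standard_normal(n)

beta_hat = adaptive_lasso(X, y, lam=0.2)
selected = np.flatnonzero(beta_hat)
print("selected covariates:", selected)
```

The feature-specific weights are what make the penalty adaptive: covariates that look weak in the initial fit are penalised more heavily, which tends to reduce false positives compared with a single common penalty. In practice the penalty level would be chosen by cross-validation or a similar criterion rather than fixed by hand as it is here.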
Collaboration with Heidi Lyng, Dept. of Radiation Biology, Norwegian Radium Hospital/Oslo University Hospital; Sylvia Richardson, BSU-MRC, Cambridge, UK; and Mark van de Wiel, Dept. of Epidemiology & Biostatistics and Dept. of Mathematics, Vrije Universiteit, Amsterdam, Netherlands.
Master's students: Marianne Røine (2013), Kristina Haarr Vinvand (2014)
Project number 204664/V30 of the Research Council of Norway (RCN).