Public defence: Aliaksandr Hubin
M.Sc. Aliaksandr Hubin at the Department of Mathematics will defend his thesis for the degree of PhD:
Bayesian model configuration, selection and averaging in complex regression contexts
Time and place of the trial lecture
- Reader Leonardo Bottolo, University of Cambridge
- Professor Jo Eidsvik, NTNU
- Professor Ørnulf Borgan, University of Oslo
Chair of the defence
Head of Department Geir Dahl, Department of Mathematics, University of Oslo
- Professor Geir Olve Storvik, Department of Mathematics, University of Oslo
- Professor Ole Christian Lingjærde, Department of Informatics, University of Oslo
- Professor Paul Grini, Department of Biosciences, University of Oslo
- Associate Professor Melinka Butenko, Department of Biosciences, University of Oslo
This PhD thesis addresses problems of Bayesian model selection and model averaging in various regression contexts. The approaches developed in the thesis are based on the idea of marginalizing out parameters from the likelihood. This makes it possible to work on the marginal space of models, which simplifies the search algorithms significantly. For linear models, an efficient mode jumping Markov chain Monte Carlo (MJMCMC) algorithm was suggested. The approach performed very well on simulated and real data. Further, the algorithm was extended to work with logic regressions, where the feature space consists of various complicated logical expressions, which in most cases makes enumerating all features infeasible in terms of both computation and memory. The genetically modified MJMCMC (GMJMCMC) algorithm was suggested to tackle this issue. The algorithm keeps and updates populations of highly predictive logical expressions and combines this with MJMCMC for efficient exploration of the model space. Several simulation and real data studies show that logical expressions of high order can be recovered with high power and a low false discovery rate.
Moreover, the GMJMCMC approach is adapted to make inference within the class of deep Bayesian regression models (DBRM), which is suggested in the thesis as an extension of various machine and statistical learning models such as artificial neural networks, classification and regression trees, logic regressions and linear models. A reversible version of GMJMCMC, named RGMJMCMC, is also suggested. It makes transitions between the populations of variables in a way that satisfies the detailed balance equation. Based on several examples, it is shown that the DBRM approach can be efficient for both inference and prediction in various applications. In particular, two fundamental physical laws (the planetary mass law and Kepler's third law) were recovered from data with high power and a low false discovery rate.
Three classification examples were also studied, in which the approach was compared with other popular machine and statistical learning approaches. Finally, a thorough study comparing different Bayesian approaches to genome-wide association was carried out. It was shown that the approaches developed in this thesis can be efficiently applied to data with a huge number of covariates.
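The idea of marginalizing out parameters so that inference runs on the space of models alone can be illustrated with a small sketch. It is not taken from the thesis: it assumes a Gaussian linear model with Zellner's g-prior (with g = n) and a uniform prior over models, for which the marginal likelihood relative to the intercept-only model has a closed form depending only on R². The function names, the simulated data and the choice of prior are all illustrative assumptions. The sketch enumerates every model, which is only possible for a handful of covariates; search algorithms such as MJMCMC are aimed at the cases where enumeration is infeasible.

```python
import itertools
import numpy as np


def log_bayes_factor(y, X, subset, g):
    """Log Bayes factor of the model with columns `subset` against the
    intercept-only model, under Zellner's g-prior (assumed prior, not
    necessarily the one used in the thesis)."""
    n, k = len(y), len(subset)
    if k == 0:
        return 0.0  # the null model compared with itself
    yc = y - y.mean()
    Xc = X[:, subset] - X[:, subset].mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    resid = yc - Xc @ beta
    r2 = 1.0 - (resid @ resid) / (yc @ yc)
    # Regression coefficients and error variance are integrated out analytically.
    return 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))


def posterior_over_models(y, X, g=None):
    """Enumerate all 2^p models and return posterior model probabilities
    and marginal inclusion probabilities (uniform prior over models)."""
    n, p = X.shape
    g = float(n) if g is None else g
    models = [list(s) for k in range(p + 1)
              for s in itertools.combinations(range(p), k)]
    log_bf = np.array([log_bayes_factor(y, X, m, g) for m in models])
    weights = np.exp(log_bf - log_bf.max())  # subtract the max for stability
    probs = weights / weights.sum()
    inclusion = np.array([sum(pr for m, pr in zip(models, probs) if j in m)
                          for j in range(p)])
    return models, probs, inclusion


# Simulated data (hypothetical): only covariates 0 and 2 enter the true model.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)

models, probs, inclusion = posterior_over_models(y, X)
best = models[int(np.argmax(probs))]
print("most probable model:", best)
print("marginal inclusion probabilities:", np.round(inclusion, 3))
```

Because the parameters are integrated out, each model is scored by a single number, and model averaging reduces to weighting models by their posterior probabilities.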
For more information
Contact the Department of Mathematics.