PhD defence: Aliaksandr Hubin

M.Sc. Aliaksandr Hubin at the Department of Mathematics will defend his thesis for the degree of PhD:

Bayesian model configuration, selection and averaging in complex regression contexts


Time and place for the trial lecture

9 November 2018 at 10:15, Aud. 4, Vilhelm Bjerknes' hus.

Adjudication committee

  • Reader Leonardo Bottolo, University of Cambridge
  • Professor Jo Eidsvik, NTNU
  • Professor Ørnulf Borgan, University of Oslo

Chair of the defence

Head of Department Geir Dahl, Department of Mathematics, University of Oslo

Supervisors

Summary

This PhD thesis addresses problems of Bayesian model selection and model averaging in various regression contexts. The approaches developed in the thesis are based on the idea of marginalizing out parameters from the likelihood. This makes it possible to work on the marginal space of models, which simplifies the search algorithms significantly. For linear models, an efficient mode jumping Markov chain Monte Carlo (MJMCMC) algorithm was suggested. The approach performed very well on simulated and real data. Further, the algorithm was extended to work with logic regressions, where the feature space consists of various complicated logical expressions, so that enumerating all features is infeasible in terms of both computation and memory in most cases. The genetically modified MJMCMC (GMJMCMC) algorithm was suggested to tackle this issue. The algorithm combines the idea of keeping and updating populations of highly predictive logical expressions with MJMCMC for efficient exploration of the model space. Several simulation and real data studies show that logical expressions of high order can be recovered with high power and a low false discovery rate. Moreover, the GMJMCMC approach was adapted to make inference within the class of deep Bayesian regression models (DBRM), an extension, suggested in the thesis, of various machine and statistical learning models such as artificial neural networks, classification and regression trees, logic regressions and linear models. A reversible GMJMCMC, named RGMJMCMC, is also suggested. It makes transitions between the populations of variables in a way that satisfies the detailed balance equation. Based on several examples, it is shown that the DBRM approach can be efficient for both inference and prediction in various applications. In particular, two fundamental physical laws (the planetary mass law and Kepler's third law) were recovered from the data with high power and a low false discovery rate.
Three classification examples were also studied, in which the approach was compared with other popular machine and statistical learning methods. Finally, a thorough study comparing different Bayesian approaches to genome-wide association was carried out. It was shown that the approaches developed in this thesis can be applied efficiently to data with a huge number of covariates.

For more information

Contact the Department of Mathematics.

Published 26 Oct. 2018 08:59 - Last modified 26 Oct. 2018 10:48