Research interests: data science and computational statistics, Bayesian hierarchical modelling, biological applications, Bayesian machine learning
Keywords:
Statistics,
Statistics and biostatistics
Publications
-
Hubin, Aliaksandr; Storvik, Geir & Frommlet, Florian
(2021).
Flexible Bayesian Nonlinear Model Configuration.
The Journal of Artificial Intelligence Research.
ISSN 1076-9757.
72,
pp. 901–942.
doi:
10.1613/JAIR.1.13047.
Abstract:
Regression models are used in a wide range of applications providing a powerful scientific tool for researchers from different fields. Linear, or simple parametric, models are often not sufficient to describe complex relationships between input variables and a response. Such relationships can be better described through flexible approaches such as neural networks, but this results in less interpretable models and potential overfitting. Alternatively, specific parametric nonlinear functions can be used, but the specification of such functions is in general complicated. In this paper, we introduce a flexible approach for the construction and selection of highly flexible nonlinear parametric regression models. Nonlinear features are generated hierarchically, similarly to deep learning, but have additional flexibility on the possible types of features to be considered. This flexibility, combined with variable selection, allows us to find a small set of important features and thereby more interpretable models. Within the space of possible functions, a Bayesian approach, introducing priors for functions based on their complexity, is considered. A genetically modified mode jumping Markov chain Monte Carlo algorithm is adopted to perform Bayesian inference and estimate posterior probabilities for model averaging. In various applications, we illustrate how our approach is used to obtain meaningful nonlinear models. Additionally, we compare its predictive performance with several machine learning algorithms.
-
Hubin, Aliaksandr; Frommlet, Florian & Storvik, Geir Olve
(2021).
Reversible genetically modified mode jumping MCMC.
In Makridis, Andreas; Milienos, Fotios; Papastamoulis, Panagiotis; Parpoula, Christina & Rakitzis, Athanasios (Eds.),
22nd European Young Statisticians Meeting – Proceedings.
Department of Psychology & Department of Sociology, School of Social Science, Panteion University of Social and Political Sciences.
ISBN 978-960-7943-23-1.
pp. 35–40.
-
Gramuglia, Emanuele; Storvik, Geir Olve & Stakkeland, Morten
(2021).
Clustering and automatic labelling within time series of categorical observations - with an application to marine log messages.
The Journal of the Royal Statistical Society, Series C (Applied Statistics).
ISSN 0035-9254.
70(3),
pp. 714–732.
doi:
10.1111/rssc.12483.
Abstract:
System logs or log files containing textual messages with associated time stamps are generated by many technologies and systems. The clustering technique proposed in this paper provides a tool to discover and identify patterns or macrolevel events in this data. The motivating application is logs generated by frequency converters in the propulsion system on a ship, while the general setting is fault identification and classification in complex industrial systems. The paper introduces an offline approach for dividing a time series of log messages into a series of discrete segments of random lengths. These segments are clustered into a limited set of states. A state is assumed to correspond to a specific operation or condition of the system, and can be a fault mode or a normal operation. Each of the states can be associated with a specific, limited set of messages, where messages appear in a random or semi‐structured order within the segments. These structures are in general not defined a priori. We propose a Bayesian hierarchical model where the states are characterised both by the temporal frequency and the type of messages within each segment. An algorithm for inference based on reversible jump MCMC is proposed. The performance of the method is assessed by both simulations and operational data.
-
Hubin, Aliaksandr; Storvik, Geir Olve; Grini, Paul Eivind & Butenko, Melinka Alonso
(2020).
A Bayesian binomial regression model with latent Gaussian processes for modelling DNA methylation.
Austrian Journal of Statistics.
ISSN 1026-597X.
49(4),
pp. 46–56.
doi:
10.17713/ajs.v49i4.1124.
Abstract:
Epigenetic observations are represented by the total number of reads from a given pool of cells and the number of methylated reads, making it reasonable to model this data by a binomial distribution. There are numerous factors that can influence the probability of success in a particular region. Moreover, there is a strong spatial (alongside the genome) dependence of these probabilities. We incorporate dependence on the covariates and the spatial dependence of the methylation probability for observations from a pool of cells by means of a binomial regression model with a latent Gaussian field and a logit link function. We apply a Bayesian approach including prior specifications on model configurations. We run a mode jumping Markov chain Monte Carlo algorithm (MJMCMC) across different choices of covariates in order to obtain the joint posterior distribution of parameters and models. This also allows finding the best set of covariates to model methylation probability within the genomic region of interest and individual marginal inclusion probabilities of the covariates.
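The model described in this abstract can be written out compactly. The notation below is an illustrative sketch only: the symbols (y_t, n_t, p_t, beta_j, gamma_j, delta_t, Sigma) are assumptions chosen here and are not taken from the paper. Here y_t is the number of methylated reads out of n_t total reads at genomic position t, x_{tj} are covariates, gamma_j in {0, 1} indicates whether covariate j is included in the model, and delta is the latent Gaussian field that carries the spatial dependence along the genome.

```latex
\begin{align*}
y_t \mid n_t, p_t &\sim \mathrm{Binomial}(n_t,\, p_t), \qquad t = 1, \dots, T, \\
\operatorname{logit}(p_t) &= \beta_0 + \sum_{j=1}^{J} \gamma_j \beta_j x_{tj} + \delta_t, \\
(\delta_1, \dots, \delta_T)^{\top} &\sim \mathcal{N}(\mathbf{0},\, \boldsymbol{\Sigma}).
\end{align*}
```

In this notation, the marginal inclusion probability of covariate j is P(gamma_j = 1 | y), obtained by summing the posterior probabilities of all models that contain covariate j.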
-
Hubin, Aliaksandr; Frommlet, Florian & Storvik, Geir Olve
(2021).
Reversible Genetically Modified MCMCs.
Abstract:
In this work, we introduce a reversible version of a genetically modified mode jumping Markov chain Monte Carlo algorithm (GMJMCMC) for inference on posterior model probabilities in complex functional spaces, where the number of explanatory variables or functions of explanatory variables is prohibitively large for simple Markov chain Monte Carlo methods. GMJMCMC was introduced in [5, 4, 2] for Bayesian model selection/averaging problems when the total number of functions of covariates is prohibitively large.
More specifically, these applications include GWAS studies with Bayesian generalized linear models [2] as well as Bayesian logic regressions [5] and Bayesian generalized nonlinear models [4]. If its regularity conditions are met, the GMJMCMC algorithm can asymptotically explore all models in the defined model spaces. At the same time, GMJMCMC is not a proper MCMC in the sense that its limiting distribution does not correspond to marginal posterior model probabilities, and thus only renormalized estimates of these probabilities [3, 1] can be obtained. Unlike the standard GMJMCMC algorithm, the introduced algorithm is a proper MCMC, and its limiting distribution corresponds to posterior marginal model probabilities in the explored model spaces under reasonable regularity conditions.
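The "renormalized estimates" mentioned in this abstract can be illustrated with a small sketch. It assumes the common construction in which the product of marginal likelihood and model prior is normalised over the set of visited models only. The function names, the three example models and their log marginal likelihoods are all hypothetical and are not taken from the cited work.

```python
import math

def renormalized_model_probs(log_marglik, log_prior):
    """Normalise p(y | m) * p(m) over the visited models only."""
    log_w = {m: log_marglik[m] + log_prior[m] for m in log_marglik}
    top = max(log_w.values())  # subtract the maximum for numerical stability
    w = {m: math.exp(v - top) for m, v in log_w.items()}
    total = sum(w.values())
    return {m: v / total for m, v in w.items()}

def marginal_inclusion_probs(model_probs, n_covariates):
    """For each covariate, sum the probabilities of the models that include it."""
    return [
        sum(p for m, p in model_probs.items() if m[j])
        for j in range(n_covariates)
    ]

# Hypothetical visited models, coded as inclusion indicators over 3 covariates,
# with made-up log marginal likelihoods and a uniform model prior (assumption).
log_marglik = {(1, 0, 0): -102.3, (1, 1, 0): -100.1, (0, 1, 1): -105.8}
log_prior = {m: 0.0 for m in log_marglik}

probs = renormalized_model_probs(log_marglik, log_prior)
incl = marginal_inclusion_probs(probs, 3)
```

Because the normalising sum runs only over visited models, such estimates depend on which models the search happened to reach, which is the limitation the abstract contrasts with a proper MCMC.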
-
Hubin, Aliaksandr & Storvik, Geir Olve
(2021).
Variational Inference for Bayesian Neural Networks under Model and Parameter Uncertainty.
Abstract:
Bayesian neural networks (BNNs) have recently regained a significant amount of attention in the deep learning community due to the development of scalable approximate Bayesian inference techniques. There are several advantages of using a Bayesian approach: parameter and prediction uncertainties become easily available, facilitating rigorous statistical analysis. Furthermore, prior knowledge can be incorporated. However, so far there have been no scalable techniques capable of combining both model (structural) and parameter uncertainty. In this paper we introduce the concept of model uncertainty in BNNs and hence make inference in the joint space of models and parameters.
Moreover, we suggest an adaptation of a scalable variational inference approach with reparametrization of marginal inclusion probabilities to incorporate the model space constraints. Experimental results on a range of benchmark data sets show that we obtain accuracy comparable to that of the competing models, but based on methods that are much more sparse than ordinary BNNs. This is particularly the case in model selection settings, but also within a Bayesian model averaging setting a considerable sparsification is achieved. As expected, model uncertainties give higher, but more reliable uncertainty measures.
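The idea of combining structural and parameter uncertainty can be sketched as drawing each weight together with a binary inclusion indicator. The sketch below is one plausible parametrisation, not necessarily the one used in the paper: it assumes a Gaussian variational distribution for each weight, a Bernoulli distribution for each indicator, and unconstrained parameters mapped through softplus and sigmoid so that the standard deviation stays positive and the inclusion probability stays in (0, 1). All names (mu, rho, omega) and values are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    return math.log1p(math.exp(x))

def sample_sparse_weights(mu, rho, omega, rng):
    """Draw one weight vector with explicit inclusion indicators.

    mu, rho, omega are unconstrained variational parameters (assumed names):
    sigma = softplus(rho) > 0 and alpha = sigmoid(omega) lies in (0, 1).
    """
    weights = []
    for m, r, o in zip(mu, rho, omega):
        alpha = sigmoid(o)                        # marginal inclusion probability
        gamma = 1 if rng.random() < alpha else 0  # structural indicator
        eps = rng.gauss(0.0, 1.0)                 # standard normal noise
        weights.append(gamma * (m + softplus(r) * eps))
    return weights

rng = random.Random(0)
mu = [0.8, -0.3, 0.1, 1.5]
rho = [-3.0, -3.0, -3.0, -3.0]
omega = [4.0, -4.0, -4.0, 4.0]  # weights 0 and 3 are almost always included

draws = [sample_sparse_weights(mu, rho, omega, rng) for _ in range(1000)]
zero_fraction = [
    sum(d[j] == 0 for d in draws) / len(draws) for j in range(len(mu))
]
```

Weights with a low inclusion probability are exactly zero in most draws, which is the source of the sparsification the abstract reports.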
-
Hubin, Aliaksandr & Storvik, Geir Olve
(2021).
Variational Bayes for inference on model and parameter uncertainty in Bayesian neural networks.
Abstract:
Bayesian neural networks (BNNs) have recently regained a significant amount of attention in the deep learning community due to the development of scalable approximate Bayesian inference techniques [1]. There are several advantages of using a Bayesian approach: parameter and prediction uncertainties become easily available, facilitating rigorous statistical analysis. Furthermore, prior knowledge can be incorporated. However, so far there have been no scalable techniques capable of combining both model (structural) and parameter uncertainty. In the presented piece of research [2] we introduce the concept of model uncertainty in BNNs and hence make inference in the joint space of models and parameters. Moreover, we suggest an adaptation of a scalable variational inference approach with reparametrization of marginal inclusion probabilities to incorporate the model space constraints. Experimental results on a range of benchmark data sets show that we obtain accuracy comparable to that of the competing models, but based on methods that are much more sparse than ordinary BNNs. This is particularly the case in model selection settings, but also within a Bayesian model averaging setting a considerable sparsification is achieved. As expected, model uncertainties give higher, but more reliable uncertainty measures.
-
Hubin, Aliaksandr; Storvik, Geir Olve & Frommlet, Florian
(2020).
A novel algorithmic approach to Bayesian Logic Regression.
Abstract:
Logic regression was developed more than a decade ago as a tool to construct predictors from Boolean combinations of binary covariates. It has been mainly used to model epistatic effects in genetic association studies, which is very appealing due to the intuitive interpretation of logic expressions to describe the interaction between genetic variations. Nevertheless, logic regression has (partly due to computational challenges) remained less well known than other approaches to epistatic association mapping. Here we adapt an advanced evolutionary algorithm called GMJMCMC (Genetically modified Mode Jumping Markov Chain Monte Carlo) to perform Bayesian model selection in the space of logic regression models. After describing the algorithmic details of GMJMCMC, we perform a comprehensive simulation study that illustrates its performance given logic regression terms of various complexity. Specifically, GMJMCMC is shown to be able to identify three-way and even four-way interactions with relatively large power, a level of complexity which has not been achieved by previous implementations of logic regression. We apply GMJMCMC to reanalyze QTL (quantitative trait locus) mapping data for Recombinant Inbred Lines in Arabidopsis thaliana and from a backcross population in Drosophila, where we identify several interesting epistatic effects. The method is implemented in an R package which is available on GitHub.
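To make "predictors from Boolean combinations of binary covariates" concrete, the sketch below builds two logic expressions from three binary markers and plugs them into a linear predictor. It only illustrates the model class. It is not the GMJMCMC search and not the R package mentioned in the abstract, and the data, expressions and effect sizes are made up.

```python
# Binary covariates for five hypothetical individuals (rows) and three markers.
X = [
    (1, 1, 0),
    (1, 0, 0),
    (0, 1, 1),
    (1, 1, 1),
    (0, 0, 0),
]

def tree_a(x):
    # Two-way interaction: x1 AND x2
    return int(x[0] and x[1])

def tree_b(x):
    # (x1 AND x2) OR (NOT x3)
    return int((x[0] and x[1]) or not x[2])

def linear_predictor(x, intercept, effects, trees):
    """Intercept plus one effect per logic expression that evaluates to true."""
    return intercept + sum(b * t(x) for b, t in zip(effects, trees))

# Each logic expression becomes one binary feature column.
features = [(tree_a(x), tree_b(x)) for x in X]
eta = [linear_predictor(x, 0.5, [1.0, -2.0], [tree_a, tree_b]) for x in X]
```

Model selection in this setting means choosing which logic expressions enter the predictor, a space that grows very quickly with the number of markers and the allowed depth of the expressions.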
-
Hubin, Aliaksandr; Storvik, Geir Olve & Frommlet, Florian
(2020).
Rejoinder for the discussion of the paper "A Novel Algorithmic Approach to Bayesian Logic Regression".
Bayesian Analysis.
ISSN 1936-0975.
15(1),
pp. 312–333.
doi:
10.1214/18-ba1141.
-
Storvik, Geir Olve
(2020).
"Preliminaries for Deep Neural Networks: Recapture of Linear Algebra, Gradient Descents and Generalized Linear Models".
-
Storvik, Geir Olve
(2020).
Neural networks vs generalized linear models.
-
Hubin, Aliaksandr & Storvik, Geir Olve
(2019).
Combining Model and Parameter Uncertainty in Bayesian Neural Networks.
Abstract:
Bayesian neural networks (BNNs) have recently regained a significant amount of attention in the deep learning community due to the development of scalable approximate Bayesian inference techniques. There are several advantages of using a Bayesian approach: parameter and prediction uncertainty become easily available, facilitating rigorous statistical analysis. Furthermore, prior knowledge can be incorporated. However, so far there have been no scalable techniques capable of combining both model (structural) and parameter uncertainty. In this paper we introduce the concept of model uncertainty in BNNs and hence make inference in the joint space of models and parameters. Moreover, we suggest an adaptation of a scalable variational inference approach with reparametrization of marginal inclusion probabilities to incorporate the model space constraints. Finally, we show that incorporating model uncertainty via Bayesian model averaging and Bayesian model selection makes it possible to drastically sparsify the structure of BNNs without significant loss of predictive power.
-
Hubin, Aliaksandr; Storvik, Geir Olve; Grini, Paul Eivind & Butenko, Melinka Alonso
(2019).
Bayesian binomial regression model with a latent Gaussian field for analysis of epigenetic data.
-
Hubin, Aliaksandr & Storvik, Geir Olve
(2019).
Combining Model and Parameter Uncertainty in Bayesian Neural Networks.
-
Hubin, Aliaksandr & Storvik, Geir Olve
(2019).
Combining Model and Parameter Uncertainty in Bayesian Neural Networks.
-
Storvik, Geir Olve & Hubin, Aliaksandr
(2019).
Combining model and parameter uncertainty in Bayesian neural networks.
-
Storvik, Geir Olve
(2019).
Flexible Bayesian Nonlinear Model Configuration.
Published 13 Nov. 2010 14:06
- Last modified 6 Aug. 2020 10:16