Academic interests
Statistics, Artificial Intelligence, Econometrics, Machine Learning, Operations Research
Research profiles
Google Scholar
Researchgate
LinkedIn
Courses Taught
STK2130 - Modeling by Stochastic Processes (plenary sessions and exercises)
STK3100 - Introduction to generalized linear models (exercises)
STK4900 - Statistical methods and applications (plenary sessions and exercises)
Academic Background
University of Oslo, Oslo, Norway — PhD
August 2014 - August 2018
Faculty of Mathematics and Natural Sciences
Specialty: Statistics
Dissertation: "Bayesian model configuration, selection and averaging in complex regression contexts".
Molde University College - Specialized University in Logistics, Molde, Norway — Master of Science
August 2012 - June 2014
Faculty of Economics, Informatics and Social Research.
Specialty: Operations Research
Research Thesis: "Evaluation of Supply Vessel schedules robustness with a posteriori improvements".
Belarusian State University, Minsk, Belarus — Specialist
September 2008 - June 2013
Faculty of Applied Mathematics and Computer Science, Department of Mathematical Modelling and Data Analysis.
Specialty: Economic Cybernetics (mathematical methods and computer-based modelling in economics).
Research Thesis: "Methods and tools of investment management in conditions of international diversifications"
Project
Bayesian model selection.
Positions held
Norwegian Computing Center, Oslo, Norway — Research scientist/Senior research scientist
September 2018 - December 2020
Fundamental research in statistics and machine learning: publishing articles and working on projects involving the development of customized statistical and machine-learning methodology in various applications for the private and public sectors. Served as a reviewer for several highly ranked journals, including the Scandinavian Journal of Statistics, the Journal of the American Statistical Association, and Scientific Reports, and for conferences including ACL and EMNLP.
University of Oslo, Oslo, Norway — PhD candidate
August 2014 - August 2018
Bayesian variable selection and model averaging. Bayesian deep feature engineering. Applied research with genetic and epigenetic data (GWAS, EWAS, QTL mapping, etc.).
Compatibl, Minsk, Belarus — Business analyst
September 2011 - June 2012
Calculation of CVA and regulatory capital, as well as full support, implementation, and customisation services within the Analyst project. Compatibl's customers included some of the largest and most respected banks and hedge funds worldwide, including 4 dealers, 3 supranationals, over 20 central banks, and 3 major financial technology vendors.
Tags:
Bayesian Statistics,
Model selection,
Probabilistic Machine Learning,
Operations Research
Publications
-
Gåsemyr, Jørund Inge & Hubin, Aliaksandr (2022). Prior distributions expressing ignorance about convex increasing failure rates. Scandinavian Journal of Statistics. ISSN 0303-6898. doi: 10.1111/sjos.12588.
-
Hubin, Aliaksandr; Storvik, Geir & Frommlet, Florian (2021). Flexible Bayesian Nonlinear Model Configuration. The Journal of Artificial Intelligence Research. ISSN 1076-9757. 72, p. 901–942. doi: 10.1613/JAIR.1.13047.
Full text in Research Archive
Summary:
Regression models are used in a wide range of applications providing a powerful scientific tool for researchers from different fields. Linear, or simple parametric, models are often not sufficient to describe complex relationships between input variables and a response. Such relationships can be better described through flexible approaches such as neural networks, but this results in less interpretable models and potential overfitting. Alternatively, specific parametric nonlinear functions can be used, but the specification of such functions is in general complicated. In this paper, we introduce a flexible approach for the construction and selection of highly flexible nonlinear parametric regression models. Nonlinear features are generated hierarchically, similarly to deep learning, but have additional flexibility on the possible types of features to be considered. This flexibility, combined with variable selection, allows us to find a small set of important features and thereby more interpretable models. Within the space of possible functions, a Bayesian approach, introducing priors for functions based on their complexity, is considered. A genetically modified mode jumping Markov chain Monte Carlo algorithm is adopted to perform Bayesian inference and estimate posterior probabilities for model averaging. In various applications, we illustrate how our approach is used to obtain meaningful nonlinear models. Additionally, we compare its predictive performance with several machine learning algorithms.
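As a toy illustration of the hierarchical feature generation and complexity-based priors described in this summary, consider the sketch below. It is purely hypothetical: all names are invented, and the authors' actual implementation differs (it is an MCMC-based R package).

```python
import numpy as np

# Hypothetical sketch: grow nonlinear features hierarchically (as in
# deep learning) and give each a prior penalized by its complexity.

NONLINEARITIES = {"sigmoid": lambda z: 1 / (1 + np.exp(-z)),
                  "sin": np.sin, "abs": np.abs}

def grow_feature(features, rng):
    """Create a new feature by transforming a random linear
    combination of two existing features (one level deeper)."""
    fname = rng.choice(list(NONLINEARITIES))
    idx = rng.choice(len(features), size=2, replace=False)
    w = rng.normal(size=2)
    parents = [features[i] for i in idx]
    value = NONLINEARITIES[fname](w[0] * parents[0]["value"]
                                  + w[1] * parents[1]["value"])
    # complexity grows with the complexities of the parents
    complexity = 1 + sum(p["complexity"] for p in parents)
    return {"value": value, "complexity": complexity}

def log_prior(feature, a=2.0):
    """Prior proportional to a**(-complexity): simpler, more
    interpretable features are favoured a priori."""
    return -feature["complexity"] * np.log(a)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
features = [{"value": X[:, j], "complexity": 1} for j in range(3)]
features.append(grow_feature(features, rng))
print(log_prior(features[-1]))
```

The point of the sketch is the prior: each extra level of composition multiplies the prior odds down, so a complex feature is retained only when the data strongly supports it.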
-
Hubin, Aliaksandr; Frommlet, Florian & Storvik, Geir Olve (2021). Reversible genetically modified mode jumping MCMC. In Makridis, Andreas; Milienos, Fotios; Papastamoulis, Panagiotis; Parpoula, Christina & Rakitzis, Athanasios (Eds.), 22nd European Young Statisticians Meeting – Proceedings. Department of Psychology & Department of Sociology, School of Social Science, Panteion University of Social and Political Sciences. ISBN 978-960-7943-23-1. p. 35–40.
-
Lison, Pierre; Barnes, Jeremy; Hubin, Aliaksandr & Touileb, Samia (2020). Named Entity Recognition without Labelled Data: A Weak Supervision Approach. In Jurafsky, Dan; Chai, Joyce; Schluter, Natalie & Tetreault, Joel (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. ISBN 978-1-952148-25-5. p. 1518–1533.
Full text in Research Archive
-
Hubin, Aliaksandr; Storvik, Geir Olve; Grini, Paul Eivind & Butenko, Melinka Alonso (2020). A Bayesian binomial regression model with latent Gaussian processes for modelling DNA methylation. Austrian Journal of Statistics. ISSN 1026-597X. 49(4), p. 46–56. doi: 10.17713/ajs.v49i4.1124.
Full text in Research Archive
Summary:
Epigenetic observations are represented by the total number of reads from a given pool of cells and the number of methylated reads, making it reasonable to model this data by a binomial distribution. There are numerous factors that can influence the probability of success in a particular region. Moreover, there is a strong spatial (alongside the genome) dependence of these probabilities. We incorporate dependence on the covariates and the spatial dependence of the methylation probability for observations from a pool of cells by means of a binomial regression model with a latent Gaussian field and a logit link function. We apply a Bayesian approach including prior specifications on model configurations. We run a mode jumping Markov chain Monte Carlo algorithm (MJMCMC) across different choices of covariates in order to obtain the joint posterior distribution of parameters and models. This also allows finding the best set of covariates to model methylation probability within the genomic region of interest and individual marginal inclusion probabilities of the covariates.
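Written out, the model described in this summary takes roughly the following form (notation reconstructed from the summary, not copied from the paper):

```latex
% y_i: methylated reads, n_i: total reads at genomic position i
% x_i: covariates for position i; delta: latent Gaussian field
\begin{aligned}
  y_i \mid p_i &\sim \operatorname{Binomial}(n_i,\, p_i), \\
  \operatorname{logit}(p_i) &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + \delta_i, \\
  \boldsymbol{\delta} &\sim \mathcal{N}(\mathbf{0},\, \boldsymbol{\Sigma}),
\end{aligned}
```

where the covariance matrix encodes the spatial dependence along the genome, and MJMCMC explores which covariates enter the linear predictor.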
-
Hubin, Aliaksandr (2019). An adaptive simulated annealing EM algorithm for inference on non-homogeneous hidden Markov models. ACM International Conference Proceeding Series (ICPS): AIIPCC '19: Proceedings of the International Conference on Artificial Intelligence, Information Processing and Cloud Computing. Association for Computing Machinery (ACM). ISBN 978-1-4503-7633-4. p. 1–9. doi: 10.1145/3371425.3371641.
-
Hubin, Aliaksandr; Storvik, Geir Olve & Frommlet, Florian (2018). A Novel Algorithmic Approach to Bayesian Logic Regression. Bayesian Analysis. ISSN 1936-0975. 15(1), p. 263–311. doi: 10.1214/18-BA1141.
Full text in Research Archive
-
Hubin, Aliaksandr; Frommlet, Florian & Storvik, Geir Olve (2021). Reversible Genetically Modified MCMCs.
Summary:
In this work, we introduce a reversible version of the genetically modified mode jumping Markov chain Monte Carlo algorithm (GMJMCMC) for inference on posterior model probabilities in complex functional spaces, where the number of explanatory variables or functions of explanatory variables is prohibitively large for simple Markov chain Monte Carlo methods. GMJMCMC was introduced in [5, 4, 2] for Bayesian model selection/averaging problems when the total number of functions of covariates is prohibitively large.
More specifically, these applications include GWAS studies with Bayesian generalized linear models [2] as well as Bayesian logic regressions [5] and Bayesian generalized nonlinear models [4]. If its regularity conditions are met, the GMJMCMC algorithm can asymptotically explore all models in the defined model spaces. At the same time, GMJMCMC is not a proper MCMC in the sense that its limiting distribution does not correspond to the marginal posterior model probabilities, and thus only renormalized estimates of these probabilities [3, 1] can be obtained. Unlike the standard GMJMCMC algorithm, the algorithm introduced here is a proper MCMC, and its limiting distribution corresponds to the posterior marginal model probabilities in the explored model spaces under reasonable regularity conditions.
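For context, the renormalized estimates mentioned above take roughly the following form (my notation, reconstructed from the summary): with M* the set of models visited by the chain,

```latex
% Renormalized estimate over the set M* of visited models
\widehat{p}(m \mid y)
  = \frac{p(y \mid m)\, p(m)}
         {\sum_{m' \in \mathcal{M}^{*}} p(y \mid m')\, p(m')},
\qquad m \in \mathcal{M}^{*},
```

whereas a proper MCMC, such as the reversible variant introduced here, can also estimate the posterior model probabilities consistently from the relative frequencies of visits to each model.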
-
Hubin, Aliaksandr & Storvik, Geir Olve (2021). Variational Inference for Bayesian Neural Networks under Model and Parameter Uncertainty.
Summary:
Bayesian neural networks (BNNs) have recently regained a significant amount of attention in the deep learning community due to the development of scalable approximate Bayesian inference techniques. There are several advantages of using a Bayesian approach: parameter and prediction uncertainties become easily available, facilitating rigorous statistical analysis, and prior knowledge can be incorporated. However, so far there have been no scalable techniques capable of combining both model (structural) and parameter uncertainty. In this paper we introduce the concept of model uncertainty in BNNs and hence make inference in the joint space of models and parameters.
Moreover, we suggest an adaptation of a scalable variational inference approach with reparametrization of marginal inclusion probabilities to incorporate the model space constraints. Experimental results on a range of benchmark data sets show that we obtain accuracy comparable to the competing models, but based on methods that are much more sparse than ordinary BNNs. This is particularly the case in model selection settings, but a considerable sparsification is also achieved within a Bayesian model averaging setting. As expected, model uncertainties give higher, but more reliable uncertainty measures.
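A minimal sketch of the kind of reparametrized variational forward pass described here, under my own simplifying assumptions (a single layer and a Gumbel-softmax relaxation of the inclusion indicators; none of this is the authors' code):

```python
import numpy as np

# Hypothetical sketch: one variational forward pass for a layer with
# both weight (parameter) and inclusion (model/structural) uncertainty.

rng = np.random.default_rng(1)

def sample_layer(x, mu, log_sigma, alpha_logit, temperature=0.5):
    """x: (n, d_in); mu, log_sigma, alpha_logit: (d_in, d_out).
    Each weight is gamma_ij * theta_ij, where theta_ij is Gaussian and
    gamma_ij in {0, 1} has inclusion probability sigmoid(alpha_logit).
    gamma is relaxed continuously so gradients can flow."""
    theta = mu + np.exp(log_sigma) * rng.normal(size=mu.shape)
    u = rng.uniform(size=mu.shape)
    logistic = np.log(u) - np.log1p(-u)       # Logistic(0, 1) noise
    gamma = 1 / (1 + np.exp(-(alpha_logit + logistic) / temperature))
    return x @ (gamma * theta)

d_in, d_out = 5, 3
x = rng.normal(size=(4, d_in))
mu = 0.1 * rng.normal(size=(d_in, d_out))
log_sigma = np.full((d_in, d_out), -3.0)
alpha_logit = np.zeros((d_in, d_out))         # inclusion prob. 0.5
print(sample_layer(x, mu, log_sigma, alpha_logit).shape)  # (4, 3)
```

Each weight is the product of a Gaussian parameter and a (relaxed) Bernoulli inclusion indicator, so the variational posterior carries both parameter and structural uncertainty.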
-
Hubin, Aliaksandr & Storvik, Geir Olve (2021). Variational Bayes for inference on model and parameter uncertainty in Bayesian neural networks.
Summary:
Bayesian neural networks (BNNs) have recently regained a significant amount of attention in the deep learning community due to the development of scalable approximate Bayesian inference techniques [1]. There are several advantages of using a Bayesian approach: parameter and prediction uncertainties become easily available, facilitating rigorous statistical analysis, and prior knowledge can be incorporated. However, so far there have been no scalable techniques capable of combining both model (structural) and parameter uncertainty. In the presented piece of research [2] we introduce the concept of model uncertainty in BNNs and hence make inference in the joint space of models and parameters. Moreover, we suggest an adaptation of a scalable variational inference approach with reparametrization of marginal inclusion probabilities to incorporate the model space constraints. Experimental results on a range of benchmark data sets show that we obtain accuracy comparable to the competing models, but based on methods that are much more sparse than ordinary BNNs. This is particularly the case in model selection settings, but a considerable sparsification is also achieved within a Bayesian model averaging setting. As expected, model uncertainties give higher, but more reliable uncertainty measures.
-
Lison, Pierre; Barnes, Jeremy Claude & Hubin, Aliaksandr (2021). skweak: weak supervision made easy for NLP.
Summary:
We present skweak, a versatile, Python-based software toolkit enabling NLP developers to apply weak supervision to a wide range of NLP tasks. Weak supervision is an emerging machine learning paradigm based on a simple idea: instead of labelling data points by hand, we use labelling functions derived from domain knowledge to automatically obtain annotations for a given dataset. The resulting labels are then aggregated with a generative model that estimates the accuracy (and possible confusions) of each labelling function. The skweak toolkit makes it easy to implement a large spectrum of labelling functions (such as heuristics, gazetteers, neural models or linguistic constraints) on text data, apply them on a corpus, and aggregate their results in a fully unsupervised fashion. skweak is especially designed to facilitate the use of weak supervision for NLP tasks such as text classification and sequence labelling. We illustrate the use of skweak for NER and sentiment analysis. skweak is released under an open-source license and is available at https://github.com/NorskRegnesentral/skweak
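To make the labelling-function idea concrete, here is a toy example in plain Python. It is not the skweak API, and it aggregates with a simple majority vote rather than skweak's generative model; see the linked repository for the real toolkit.

```python
from collections import Counter

# Toy weak supervision: labelling functions vote on each token and
# the votes are aggregated. skweak instead fits a generative model
# that estimates each labelling function's accuracy and confusions.

def lf_title(token):                       # heuristic function
    return "PER" if token.istitle() else None

def lf_gazetteer(token, names=frozenset({"alice", "bob"})):
    return "PER" if token.lower() in names else None

def aggregate(tokens, lfs):
    labels = []
    for tok in tokens:
        votes = [lab for lf in lfs if (lab := lf(tok)) is not None]
        labels.append(Counter(votes).most_common(1)[0][0] if votes else "O")
    return labels

tokens = "Alice met bob in Oslo".split()
print(aggregate(tokens, [lf_title, lf_gazetteer]))
# ['PER', 'O', 'PER', 'O', 'PER']: weak labels are noisy
# ('Oslo' is mislabelled), hence the need for a smarter aggregator.
```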
-
Hubin, Aliaksandr; Storvik, Geir Olve & Frommlet, Florian (2020). A novel algorithmic approach to Bayesian Logic Regression.
Summary:
Logic regression was developed more than a decade ago as a tool to construct predictors from Boolean combinations of binary covariates. It has been mainly used to model epistatic effects in genetic association studies, which is very appealing due to the intuitive interpretation of logic expressions to describe the interaction between genetic variations. Nevertheless, logic regression has (partly due to computational challenges) remained less well known than other approaches to epistatic association mapping. Here we will adapt an advanced evolutionary algorithm called GMJMCMC (Genetically modified Mode Jumping Markov Chain Monte Carlo) to perform Bayesian model selection in the space of logic regression models. After describing the algorithmic details of GMJMCMC we perform a comprehensive simulation study that illustrates its performance given logic regression terms of various complexity. Specifically, GMJMCMC is shown to be able to identify three-way and even four-way interactions with relatively large power, a level of complexity which has not been achieved by previous implementations of logic regression. We apply GMJMCMC to reanalyze QTL (quantitative trait locus) mapping data for Recombinant Inbred Lines in Arabidopsis thaliana and from a backcross population in Drosophila, where we identify several interesting epistatic effects. The method is implemented in an R package which is available on GitHub.
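As an illustration of what "Boolean combinations of binary covariates" means in practice, a hypothetical sketch (the actual implementation is the R package mentioned above, and the variables below are invented):

```python
import numpy as np

# A logic regression feature is a Boolean expression over binary
# covariates, e.g. L = X1 AND (NOT X3 OR X4). GMJMCMC searches the
# space of such expressions along with their regression coefficients.

rng = np.random.default_rng(2)
X = rng.integers(0, 2, size=(8, 5)).astype(bool)  # 8 subjects, 5 SNPs

def feature(X):
    """Three-way logic interaction: X1 & (~X3 | X4), 0-indexed."""
    return X[:, 0] & (~X[:, 2] | X[:, 3])

# The feature enters a (generalized) linear predictor as a 0/1 column:
design_column = feature(X).astype(float)
print(design_column)
```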
-
Hubin, Aliaksandr; Storvik, Geir Olve & Frommlet, Florian (2020). Rejoinder for the discussion of the paper "A Novel Algorithmic Approach to Bayesian Logic Regression". Bayesian Analysis. ISSN 1936-0975. 15(1), p. 312–333. doi: 10.1214/18-ba1141.
Full text in Research Archive
-
Hubin, Aliaksandr (2019). Using node embedding to obtain information from network based transactions data in a bank.
-
Hubin, Aliaksandr & Storvik, Geir Olve (2019). Combining Model and Parameter Uncertainty in Bayesian Neural Networks.
Summary:
Bayesian neural networks (BNNs) have recently regained a significant amount of attention in the deep learning community due to the development of scalable approximate Bayesian inference techniques. There are several advantages of using a Bayesian approach: parameter and prediction uncertainty become easily available, facilitating rigorous statistical analysis, and prior knowledge can be incorporated. However, so far there have been no scalable techniques capable of combining both model (structural) and parameter uncertainty. In this paper we introduce the concept of model uncertainty in BNNs and hence make inference in the joint space of models and parameters. Moreover, we suggest an adaptation of a scalable variational inference approach with reparametrization of marginal inclusion probabilities to incorporate the model space constraints. Finally, we show that incorporating model uncertainty via Bayesian model averaging and Bayesian model selection makes it possible to drastically sparsify the structure of BNNs without significant loss of predictive power.
-
Hubin, Aliaksandr (2019). An adaptive simulated annealing EM algorithm for inference on non-homogeneous hidden Markov models.
-
Hubin, Aliaksandr; Storvik, Geir Olve; Grini, Paul Eivind & Butenko, Melinka Alonso (2019). Bayesian binomial regression model with a latent Gaussian field for analysis of epigenetic data.
-
Hubin, Aliaksandr & Storvik, Geir Olve (2019). Combining Model and Parameter Uncertainty in Bayesian Neural Networks.
-
Hubin, Aliaksandr & Storvik, Geir Olve (2019). Combining Model and Parameter Uncertainty in Bayesian Neural Networks.
-
Storvik, Geir Olve & Hubin, Aliaksandr (2019). Combining model and parameter uncertainty in Bayesian neural networks.
-
Hubin, Aliaksandr; Storvik, Geir Olve & Frommlet, Florian (2018). Deep Bayesian regression models.
-
Hubin, Aliaksandr; Storvik, Geir Olve & Frommlet, Florian (2018). Deep Bayesian regression models.
-
Hubin, Aliaksandr; Storvik, Geir Olve & Frommlet, Florian (2018). Deep Bayesian regression models.
-
Hubin, Aliaksandr; Storvik, Geir Olve & Frommlet, Florian (2018). Deep Bayesian regression models.
Summary:
Regression models are used for inference and prediction in a wide range of applications, providing a powerful scientific tool for researchers and analysts from different fields. In many research fields the amount of available data as well as the number of potential explanatory variables is rapidly increasing. Variable selection and model averaging have become extremely important tools for improving inference and prediction. However, often linear models are not sufficient, and the complex relationship between input variables and a response is better described by introducing non-linearities and complex functional interactions. Deep learning models have been extremely successful in terms of prediction, although they are often difficult to specify and potentially suffer from overfitting. The aim of this paper is to bring the ideas of deep learning into a statistical framework which yields more parsimonious models and allows us to quantify model uncertainty. To this end we introduce the class of deep Bayesian regression models (DBRM), consisting of a generalized linear model combined with a comprehensive non-linear feature space, where non-linear features are generated just like in deep learning but combined with variable selection in order to include only important features. DBRM can easily be extended to include latent Gaussian variables to model complex correlation structures between observations, which seems not to be easily possible with existing deep learning approaches. Two different algorithms based on MCMC are introduced to fit DBRM and to perform Bayesian inference. The predictive performance of these algorithms is compared with a large number of state-of-the-art algorithms. Furthermore, we illustrate how DBRM can be used for model inference in various applications.
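In symbols, the DBRM predictor described here can be written roughly as follows (notation mine, reconstructed from the summary rather than copied from the paper):

```latex
% q generated non-linear features F_j, inclusion indicators gamma_j
h\!\left(\mathbb{E}[y \mid \mathbf{x}]\right)
  = \beta_0 + \sum_{j=1}^{q} \gamma_j \beta_j F_j(\mathbf{x}),
\qquad \gamma_j \in \{0, 1\},
```

where h is the link function of the generalized linear model, the features are generated hierarchically as in deep learning, and variable selection operates on the inclusion indicators; latent Gaussian variables can be added to the predictor to model correlated observations.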
-
Hubin, Aliaksandr; Storvik, Geir Olve & Frommlet, Florian (2018). Deep Bayesian regression models.
-
Hubin, Aliaksandr; Storvik, Geir Olve & Frommlet, Florian (2018). Deep Bayesian regression models.
-
Hubin, Aliaksandr; Storvik, Geir Olve & Frommlet, Florian (2017). Deep non-linear regression models in a Bayesian framework.
-
Hubin, Aliaksandr; Storvik, Geir Olve & Grini, Paul Eivind (2017). Variable selection in binomial regression with latent Gaussian field models for analysis of epigenetic data.
-
Hubin, Aliaksandr; Storvik, Geir Olve & Frommlet, Florian (2017). A novel algorithmic approach to Bayesian Logic Regression.
-
Hubin, Aliaksandr; Storvik, Geir Olve & Frommlet, Florian (2017). A novel GMJMCMC algorithm for Bayesian Logic Regression models.
-
Hubin, Aliaksandr & Storvik, Geir Olve (2017). Efficient mode jumping MCMC for Bayesian variable selection and model averaging in GLMM.
-
Hubin, Aliaksandr; Storvik, Geir Olve & Frommlet, Florian (2017). A novel algorithmic approach to Bayesian Logic Regression.
-
Hubin, Aliaksandr & Storvik, Geir Olve (2016). Efficient mode jumping MCMC for Bayesian variable selection in GLM with random effects models.
-
Hubin, Aliaksandr & Storvik, Geir Olve (2016). On Mode Jumping in MCMC for Bayesian Variable Selection within GLMM.
-
Hubin, Aliaksandr & Storvik, Geir Olve (2016). Variable selection in binomial regression with latent Gaussian field models for analysis of epigenetic data.
-
Hubin, Aliaksandr & Storvik, Geir Olve (2016). Variable selection in logistic regression with latent Gaussian field models with an application in epigenomics.
-
Hubin, Aliaksandr (2015). Statistics for Epigenetics.
-
Hubin, Aliaksandr & Storvik, Geir Olve (2015). On model selection in Hidden Markov Models with covariates.
-
Hubin, Aliaksandr & Storvik, Geir Olve (2015). Variable selection in binomial regression with latent Gaussian field models for analysis of epigenetic data.
-
Hubin, Aliaksandr & Storvik, Geir Olve (2015). Variable selection in binomial regression with latent Gaussian field models for analysis of epigenetic data.
-
Hubin, Aliaksandr; Norlund, Ellen Karoline & Gribkovskaia, Irina (2014). Evaluating robustness of speed optimized supply vessel schedules.
Summary:
Offshore installations need supply vessel services on a regular basis. Weather uncertainty affects how the service is performed. We incorporate different robustness and speed-optimization strategies into the two-phase optimization procedure for generation of supply vessel schedules. To compare the performance of these strategies by evaluating the robustness of the generated schedules under different service parameters, a discrete-event simulation model is developed. Based on the simulation results, the robustness-improving strategies incorporated into the simulation model are applied to modify the schedules.
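A discrete-event evaluation of schedule robustness of this general kind can be sketched in a few lines (illustrative only; the weather model, delay rule, and all parameters below are invented for the example and are not taken from the thesis):

```python
import random

# Toy robustness check for a vessel schedule: sail each leg, add a
# weather delay with some probability, and count on-time arrivals.

random.seed(42)

def simulate(schedule, p_storm=0.3, storm_delay=6.0, slack=2.0, runs=1000):
    """schedule: list of (sail_hours, deadline_hours_from_start)."""
    on_time = 0
    for _ in range(runs):
        clock, ok = 0.0, True
        for sail, deadline in schedule:
            clock += sail + (storm_delay if random.random() < p_storm else 0)
            ok &= clock <= deadline + slack
        on_time += ok
    return on_time / runs   # fraction of runs with all visits on time

schedule = [(10, 12), (8, 24), (12, 40)]   # three installation visits
print(f"on-time probability: {simulate(schedule):.2f}")
```

Comparing this estimated on-time probability across candidate schedules is one simple way to rank their robustness before applying a posteriori improvements.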
-
Hubin, Aliaksandr & Aas, Kjersti (2019). FinAI: Scalable techniques to stock price time series modelling. Norsk Regnesentral.
-
Hubin, Aliaksandr (2018). Bayesian model configuration, selection and averaging in complex regression contexts. Universitetet i Oslo. ISSN 1501-7710.