Date and Place of Birth
04.05.1990, Minsk, Republic of Belarus
Academic interests
Statistics, Artificial Intelligence, Econometrics, Machine Learning, Operations Research
Research profiles
Google Scholar
Researchgate
LinkedIn
Courses Taught
STK2130 – Modeling by Stochastic Processes (plenary sessions and exercises)
STK3100 – Introduction to generalized linear models (exercises)
STK4900 – Statistical methods and applications (plenary sessions and exercises)
Academic Background
University of Oslo, Oslo, Norway — PhD
August 2014 – August 2018
Faculty of Mathematics and Natural Sciences
Specialty: Statistics
Dissertation: "Bayesian model configuration, selection and averaging in complex regression contexts".
Molde University College – Specialized University in Logistics, Molde, Norway — Master of Science
August 2012 – June 2014
Faculty of Economics, Informatics and Social Research.
Specialty: Operations Research
Research Thesis: "Evaluation of Supply Vessel schedules robustness with a posteriori improvements".
Belarusian State University, Minsk, Belarus — Specialist
September 2008 – June 2013
Faculty of Applied Mathematics and Computer Science. Department of Mathematical Modelling and Data Analysis.
Specialty: Economic Cybernetics (mathematical methods and computer-based modeling in economics).
Research Thesis: "Methods and tools of investment management in conditions of international diversifications".
Awards
Project
Bayesian model selection.
Positions held
Norwegian Computing Center, Oslo, Norway — Research scientist/Senior research scientist
September 2018 – December 2020
Fundamental research in statistics and machine learning; publishing articles and working on projects involving the development of customized statistical and machine-learning methodology in various applications for the private and public sectors. Acted as a reviewer for several highly ranked journals, including the Scandinavian Journal of Statistics, the Journal of the American Statistical Association and Scientific Reports, and for conferences including ACL and EMNLP.
University of Oslo, Oslo, Norway — PhD candidate
August 2014 – August 2018
Bayesian variable selection and model averaging. Bayesian deep feature engineering. Applied research with genetic and epigenetic data (GWAS, EWAS, QTL mapping, etc.).
CompatibL, Minsk, Belarus — Business analyst
September 2011 – June 2012
Calculation of CVA and regulatory capital, as well as full support, implementation and customisation services within the Analyst project. CompatibL's customers included some of the largest and most respected banks and hedge funds worldwide, including 4 dealers, 3 supranationals, over 20 central banks, and 3 major financial technology vendors.
Tags:
Bayesian Statistics,
Model selection,
Probabilistic Machine Learning,
Operations Research
Publications

Hubin, Aliaksandr; Storvik, Geir Olve; Grini, Paul Eivind & Butenko, Melinka Alonso (2020). A Bayesian binomial regression model with latent Gaussian processes for modelling DNA methylation. Austrian Journal of Statistics. ISSN 1026-597X. 49(4), pp. 46–56. doi: 10.17713/ajs.v49i4.1124
Full text in Research Archive.
Summary:
Epigenetic observations are represented by the total number of reads from a given pool of cells and the number of methylated reads, making it reasonable to model this data by a binomial distribution. There are numerous factors that can influence the probability of success in a particular region. Moreover, there is a strong spatial (along the genome) dependence of these probabilities. We incorporate dependence on the covariates and the spatial dependence of the methylation probability for observations from a pool of cells by means of a binomial regression model with a latent Gaussian field and a logit link function. We apply a Bayesian approach including prior specifications on model configurations. We run a mode jumping Markov chain Monte Carlo algorithm (MJMCMC) across different choices of covariates in order to obtain the joint posterior distribution of parameters and models. This also allows finding the best set of covariates to model methylation probability within the genomic region of interest and individual marginal inclusion probabilities of the covariates.
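The mode-jumping idea behind MJMCMC can be illustrated with a toy sketch: a Markov chain over variable-inclusion vectors that mostly makes local single-flip moves but occasionally proposes a large jump. Everything below (the score function, dimensions, jump size) is invented for illustration; the actual MJMCMC follows each large jump with local optimization and a matching reverse kernel, which this sketch omits.

```python
import math
import random

random.seed(1)

# Hypothetical stand-in for the marginal log-likelihood of a model,
# indexed by its inclusion vector gamma (1 = covariate included).
def log_marginal(gamma):
    # assumption: covariates 0 and 2 are relevant; penalise model size
    return 3.0 * gamma[0] + 2.0 * gamma[2] - 1.0 * sum(gamma)

def mode_jumping_mcmc(p=5, iters=4000, jump_prob=0.05):
    gamma = [0] * p
    cur = log_marginal(gamma)
    counts = {}
    for _ in range(iters):
        prop = list(gamma)
        if random.random() < jump_prob:
            # large jump: flip several components at once to escape a mode
            for j in random.sample(range(p), k=3):
                prop[j] ^= 1
        else:
            prop[random.randrange(p)] ^= 1  # local single-flip proposal
        new = log_marginal(prop)
        # Metropolis acceptance (both proposal types here are symmetric)
        if math.log(random.random()) < new - cur:
            gamma, cur = prop, new
        key = tuple(gamma)
        counts[key] = counts.get(key, 0) + 1
    # marginal inclusion probabilities from the visited models
    return [sum(k[j] * c for k, c in counts.items()) / iters for j in range(p)]

probs = mode_jumping_mcmc()
print(probs)  # inclusion probabilities; covariate 0 should dominate
```

Averaging the inclusion indicator over the chain approximates the marginal inclusion probability of each covariate, the quantity reported in the paper.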

Lison, Pierre; Barnes, Jeremy; Hubin, Aliaksandr & Touileb, Samia (2020). Named Entity Recognition without Labelled Data: A Weak Supervision Approach, In Dan Jurafsky; Joyce Chai; Natalie Schluter & Joel Tetreault (ed.),
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.
Association for Computational Linguistics.
ISBN 978-1-952148-25-5. 139, pp. 1518–1533.
Summary:
Named Entity Recognition (NER) performance often degrades rapidly when applied to target domains that differ from the texts observed during training. When in-domain labelled data is available, transfer learning techniques can be used to adapt existing NER models to the target domain. But what should one do when there is no hand-labelled data for the target domain? This paper presents a simple but powerful approach to learn NER models in the absence of labelled data through weak supervision. The approach relies on a broad spectrum of labelling functions to automatically annotate texts from the target domain. These annotations are then merged together using a hidden Markov model which captures the varying accuracies and confusions of the labelling functions. A sequence labelling model can finally be trained on the basis of this unified annotation. We evaluate the approach on two English datasets (CoNLL 2003 and news articles from Reuters and Bloomberg) and demonstrate an improvement of about 7 percentage points in entity-level F1 scores compared to an out-of-domain neural NER model.
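The aggregation step can be illustrated with a far simpler baseline than the paper's hidden Markov model: a token-level majority vote over labelling-function outputs. All names and labels below are invented for illustration.

```python
from collections import Counter

# Outputs of three hypothetical labelling functions on a four-token sentence
# ("O" means no entity); in the paper these come from gazetteers, heuristics,
# out-of-domain models and similar sources.
lf_outputs = [
    ["B-PER", "O", "O",     "B-ORG"],
    ["B-PER", "O", "B-ORG", "B-ORG"],
    ["O",     "O", "O",     "B-ORG"],
]

def aggregate(outputs):
    # Majority vote per token. The paper instead fits a hidden Markov model
    # whose emission distributions capture each labelling function's accuracy
    # and confusions, then trains a sequence labeller on the merged annotation.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*outputs)]

print(aggregate(lf_outputs))  # ['B-PER', 'O', 'O', 'B-ORG']
```

The HMM-based merging improves on this vote precisely because it learns that some labelling functions are more reliable than others for particular labels.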

Hubin, Aliaksandr (2019). An adaptive simulated annealing EM algorithm for inference on non-homogeneous hidden Markov models, In
ACM International Conference Proceeding Series (ICPS): AIIPCC '19: Proceedings of the International Conference on Artificial Intelligence, Information Processing and Cloud Computing.
Association for Computing Machinery (ACM).
ISBN 978-1-4503-7633-4. Article No. 63, pp. 1–9.
Summary:
Non-homogeneous hidden Markov models (NHHMM) are a subclass of dependent mixture models used for semi-supervised learning, where both the transition probabilities between the latent states and the mean parameter of the probability distribution of the responses (for a given state) depend on a set of up to p covariates. A priori we do not know which (and how) covariates influence the transition probabilities and the mean parameters. This induces a complex combinatorial optimization problem for model selection with 4^p potential configurations. To address the problem, in this article we propose an adaptive (A) simulated annealing (SA) expectation maximization (EM) algorithm (ASAEM) for joint optimization of models and their parameters with respect to a criterion of interest.
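The annealed model-search layer of such an algorithm can be sketched as follows. The scoring function and the covariate roles are invented for illustration, and the actual ASAEM interleaves EM updates of the HMM parameters with these annealed model moves, which this sketch omits.

```python
import math
import random

random.seed(0)

# Each covariate plays one of four roles: (affects transitions?, affects means?),
# giving the 4^p model configurations mentioned in the abstract.
def score(config):
    # hypothetical model-selection criterion: covariate 0 belongs in the means
    return (2.0 if config[0] == (False, True) else 0.0) \
        - 0.3 * sum(a + b for a, b in config)

def annealed_search(p=4, iters=2000):
    config = [(False, False)] * p
    best, best_score = list(config), score(config)
    for t in range(1, iters + 1):
        temp = 1.0 / math.log(t + 1)  # decreasing temperature schedule
        prop = list(config)
        prop[random.randrange(p)] = (random.random() < 0.5, random.random() < 0.5)
        delta = score(prop) - score(config)
        # simulated annealing acceptance: always take improvements,
        # sometimes accept worse configurations while the temperature is high
        if delta > 0 or random.random() < math.exp(delta / temp):
            config = prop
        if score(config) > best_score:
            best, best_score = list(config), score(config)
    return best

best = annealed_search()
print(best)
```

The annealing schedule lets the search cross between distant model configurations early on, then settle into the best-scoring one as the temperature drops.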

Hubin, Aliaksandr; Storvik, Geir Olve; Grini, Paul Eivind & Butenko, Melinka Alonso (2019). Bayesian binomial regression model with a latent Gaussian field for analysis of epigenetic data, In Y. Kharin & Peter Filzmoser (ed.),
Proceedings of Computer Data Analysis and Modeling: Stochastics and Data Science 2019.
Belarusian State University Press.
ISBN 978-985-566-811-5. 1, pp. 167–171.

Hubin, Aliaksandr & Storvik, Geir Olve (2018). Mode jumping MCMC for Bayesian variable selection in GLMM. Computational Statistics & Data Analysis.
ISSN 0167-9473. 127, pp. 281–297. doi: 10.1016/j.csda.2018.05.020
Full text in Research Archive.

Hubin, Aliaksandr; Storvik, Geir Olve & Frommlet, Florian (2018). A Novel Algorithmic Approach to Bayesian Logic Regression. Bayesian Analysis.
ISSN 1936-0975. 15(1), pp. 263–311. doi: 10.1214/18-BA1141
Full text in Research Archive.
Summary:
Logic regression was developed more than a decade ago as a tool to construct predictors from Boolean combinations of binary covariates. It has been mainly used to model epistatic effects in genetic association studies, which is very appealing due to the intuitive interpretation of logic expressions to describe the interaction between genetic variations. Nevertheless, logic regression has (partly due to computational challenges) remained less well known than other approaches to epistatic association mapping. Here we adapt an advanced evolutionary algorithm called GMJMCMC (Genetically modified Mode Jumping Markov Chain Monte Carlo) to perform Bayesian model selection in the space of logic regression models. After describing the algorithmic details of GMJMCMC we perform a comprehensive simulation study that illustrates its performance given logic regression terms of various complexity. Specifically, GMJMCMC is shown to be able to identify three-way and even four-way interactions with relatively large power, a level of complexity which has not been achieved by previous implementations of logic regression. We apply GMJMCMC to reanalyze QTL (quantitative trait locus) mapping data for Recombinant Inbred Lines in Arabidopsis thaliana and from a backcross population in Drosophila, where we identify several interesting epistatic effects. The method is implemented in an R package which is available on GitHub.
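A logic-regression feature is simply a Boolean tree over binary covariates. A minimal sketch of evaluating one such feature, plus a toy 'genetic modification' move of the kind GMJMCMC uses to grow its population of candidate trees (all data and moves below are invented):

```python
import random

random.seed(2)

# Binary covariates (e.g. genetic markers) for four observations
X = [[1, 0, 1, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [1, 0, 0, 1]]

# A logic feature such as  x_i AND (NOT x_j),  one of the Boolean
# combinations that logic regression proposes as candidate predictors.
def make_and_not(i, j):
    return lambda row: row[i] & (1 - row[j])

def mutate(i, j, p=4):
    # toy 'genetic modification': replace one leaf with a random covariate,
    # loosely mimicking how GMJMCMC perturbs its logic trees
    if random.random() < 0.5:
        return random.randrange(p), j
    return i, random.randrange(p)

feature = make_and_not(0, 2)        # x_0 AND (NOT x_2)
vals = [feature(row) for row in X]
print(vals)                         # [0, 1, 0, 1]
print(mutate(0, 2))                 # a perturbed candidate feature
```

In the full method, such candidate features compete through Bayesian model selection; only trees with high posterior support survive.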

Hubin, Aliaksandr & Storvik, Geir Olve (2016). On Mode Jumping in MCMC for Bayesian Variable Selection within GLMM, In S. Aivazian; Peter Filzmoser & Y. Kharin (ed.),
Computer Data Analysis and Modeling: Theoretical and Applied Stochastics. Proceedings of the XI International Conference.
Belarusian State University.
ISBN 978-985-553-366-6. pp. 275–278.
Summary:
Generalized linear mixed models (GLMM) are used for inference and prediction in a wide range of different applications, providing a powerful scientific tool for researchers and analysts from different fields. At the same time, more and more sources of data are becoming available, introducing a variety of hypothetical explanatory variables for these models to be considered. Estimation of posterior model probabilities and selection of an optimal model is thus becoming crucial. We suggest a novel mode jumping MCMC procedure for Bayesian model averaging and model selection in GLMM.

Hubin, Aliaksandr; Storvik, Geir Olve & Frommlet, Florian (2020). A novel algorithmic approach to Bayesian Logic Regression.

Hubin, Aliaksandr; Storvik, Geir Olve & Frommlet, Florian (2020). Rejoinder for the discussion of the paper "A Novel Algorithmic Approach to Bayesian Logic Regression". Bayesian Analysis.
ISSN 1936-0975. 15(1), pp. 312–333. doi: 10.1214/18-BA1141

Hubin, Aliaksandr (2019). An adaptive simulated annealing EM algorithm for inference on non-homogeneous hidden Markov models.

Hubin, Aliaksandr (2019). Using node embedding to obtain information from network-based transactions data in a bank.

Hubin, Aliaksandr & Aas, Kjersti (2019). FinAI: Scalable techniques to stock price time series modelling. NR-notat SAMBA/54/19.

Hubin, Aliaksandr & Storvik, Geir Olve (2019). Combining Model and Parameter Uncertainty in Bayesian Neural Networks.
Summary:
Bayesian neural networks (BNNs) have recently regained a significant amount of attention in the deep learning community due to the development of scalable approximate Bayesian inference techniques. There are several advantages of using a Bayesian approach: parameter and prediction uncertainty become easily available, facilitating rigorous statistical analysis, and prior knowledge can be incorporated. However, so far there have been no scalable techniques capable of combining both model (structural) and parameter uncertainty. In this paper we introduce the concept of model uncertainty in BNNs and hence make inference in the joint space of models and parameters. Moreover, we suggest an adaptation of a scalable variational inference approach with reparametrisation of marginal inclusion probabilities to incorporate the model space constraints. Finally, we show that incorporating model uncertainty via Bayesian model averaging and Bayesian model selection allows us to drastically sparsify the structure of BNNs without significant loss of predictive power.
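The sparsification step can be illustrated with invented marginal inclusion probabilities; in the paper these would come from the variational approximation, not be hand-specified as here.

```python
# Hypothetical fitted marginal inclusion probabilities and posterior-mean
# weights for eight connections of a Bayesian neural network.
incl_probs = [0.98, 0.03, 0.91, 0.10, 0.88, 0.05, 0.97, 0.02]
weights    = [1.20, -0.40, 0.80, 0.10, -1.10, 0.20, 0.90, -0.05]

def select_model(probs, w, threshold=0.5):
    # Bayesian model selection (median probability model): keep a connection
    # only if its marginal inclusion probability exceeds the threshold
    return [wi if pi > threshold else 0.0 for pi, wi in zip(probs, w)]

def average_models(probs, w):
    # Bayesian model averaging: shrink each weight by its inclusion probability
    return [pi * wi for pi, wi in zip(probs, w)]

sparse = select_model(incl_probs, weights)
print(sparse.count(0.0), "of", len(sparse), "weights pruned")  # 4 of 8
```

Thresholding yields a hard sparsification of the network, while model averaging keeps all connections but shrinks the doubtful ones toward zero.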

Hubin, Aliaksandr; Storvik, Geir Olve; Grini, Paul Eivind & Butenko, Melinka Alonso (2019). Bayesian binomial regression model with a latent Gaussian field for analysis of epigenetic data.

Storvik, Geir Olve & Hubin, Aliaksandr (2019). Combining model and parameter uncertainty in Bayesian neural networks.

Hubin, Aliaksandr (2018). Bayesian model configuration, selection and averaging in complex regression contexts. Series of dissertations submitted to the Faculty of Mathematics and Natural Sciences, University of Oslo. 2035.

Hubin, Aliaksandr; Storvik, Geir Olve & Frommlet, Florian (2018). Deep Bayesian regression models.
Summary:
Regression models are used for inference and prediction in a wide range of applications, providing a powerful scientific tool for researchers and analysts from different fields. In many research fields the amount of available data as well as the number of potential explanatory variables is rapidly increasing. Variable selection and model averaging have become extremely important tools for improving inference and prediction. However, linear models are often not sufficient, and the complex relationship between input variables and a response is better described by introducing nonlinearities and complex functional interactions. Deep learning models have been extremely successful in terms of prediction, although they are often difficult to specify and potentially suffer from overfitting. The aim of this paper is to bring the ideas of deep learning into a statistical framework which yields more parsimonious models and allows us to quantify model uncertainty. To this end we introduce the class of deep Bayesian regression models (DBRM), consisting of a generalized linear model combined with a comprehensive nonlinear feature space, where nonlinear features are generated just like in deep learning but combined with variable selection in order to include only important features. DBRM can easily be extended to include latent Gaussian variables to model complex correlation structures between observations, which does not seem easily possible with existing deep learning approaches. Two different algorithms based on MCMC are introduced to fit DBRM and to perform Bayesian inference. The predictive performance of these algorithms is compared with a large number of state-of-the-art algorithms. Furthermore, we illustrate how DBRM can be used for model inference in various applications.
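The feature-generation idea can be sketched as one 'layer' of candidate nonlinear features. The transformations and the unweighted pairwise sums below are simplifications: DBRM weights the combinations and keeps only features that survive Bayesian variable selection.

```python
import math

# Candidate nonlinear transformations, analogous to the activation
# functions of a deep learning layer
G = {
    "sigmoid": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "tanh": math.tanh,
    "relu": lambda z: max(0.0, z),
}

def grow_features(features):
    # One round of DBRM-style feature generation: apply each nonlinearity
    # to (here: unweighted pairwise) sums of the existing features.
    new = dict(features)
    names = sorted(features)
    for g_name, g in G.items():
        for idx, a in enumerate(names):
            for b in names[idx + 1:]:
                new[f"{g_name}({a}+{b})"] = g(features[a] + features[b])
    return new

feats = grow_features({"x1": 0.5, "x2": -1.0})
print(sorted(feats))
```

Repeating this step stacks nonlinearities just like layers in a deep network, while variable selection prunes the combinatorial explosion of candidate features.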

Hubin, Aliaksandr & Storvik, Geir Olve (2017). Efficient mode jumping MCMC for Bayesian variable selection and model averaging in GLMM.

Hubin, Aliaksandr; Storvik, Geir Olve & Frommlet, Florian (2017). A novel GMJMCMC algorithm for Bayesian Logic Regression models.

Hubin, Aliaksandr; Storvik, Geir Olve & Frommlet, Florian (2017). A novel algorithmic approach to Bayesian Logic Regression.

Hubin, Aliaksandr; Storvik, Geir Olve & Frommlet, Florian (2017). Deep nonlinear regression models in a Bayesian framework.

Hubin, Aliaksandr; Storvik, Geir Olve & Grini, Paul Eivind (2017). Variable selection in binomial regression with latent Gaussian field models for analysis of epigenetic data.

Hubin, Aliaksandr & Storvik, Geir Olve (2016). Efficient mode jumping MCMC for Bayesian variable selection in GLM with random effects models.

Hubin, Aliaksandr & Storvik, Geir Olve (2016). On Mode Jumping in MCMC for Bayesian Variable Selection within GLMM.

Hubin, Aliaksandr & Storvik, Geir Olve (2016). Variable selection in binomial regression with latent Gaussian field models for analysis of epigenetic data.

Hubin, Aliaksandr & Storvik, Geir Olve (2016). Variable selection in logistic regression with latent Gaussian field models with an application in epigenomics.

Hubin, Aliaksandr (2015). Statistics for Epigenetics.

Hubin, Aliaksandr & Storvik, Geir Olve (2015). On model selection in Hidden Markov Models with covariates.

Hubin, Aliaksandr & Storvik, Geir Olve (2015). Variable selection in binomial regression with latent Gaussian field models for analysis of epigenetic data.

Hubin, Aliaksandr; Norlund, Ellen Karoline & Gribkovskaia, Irina (2014). Evaluating robustness of speed optimized supply vessel schedules.
Summary:
Offshore installations need supply vessel services on a regular basis. Weather uncertainty affects how the service is performed. We incorporate different robustness and speed optimization strategies into the two-phase optimization procedure for generation of supply vessel schedules. To compare the performance of these strategies by evaluating the robustness of the generated schedules under different service parameters, a discrete-event simulation model is developed. Based on the simulation results, strategies for improving robustness incorporated into the simulation model are applied to modify the schedules.
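The evaluation step can be sketched as a toy Monte Carlo version of such a discrete-event robustness check. All probabilities, durations and the slack policy below are invented for illustration.

```python
import random

random.seed(4)

def service_level(planned_hours, slack=4.0, p_bad=0.3, delay=6.0, runs=10_000):
    # Robustness evaluation of a schedule: each voyage leg may be delayed by
    # bad weather; the schedule 'holds' if the total duration still fits the
    # planned time plus a slack buffer.
    on_time = 0
    for _ in range(runs):
        actual = sum(h + (delay if random.random() < p_bad else 0.0)
                     for h in planned_hours)
        if actual <= sum(planned_hours) + slack:
            on_time += 1
    return on_time / runs

sl = service_level([10.0, 12.0, 8.0])
print(sl)  # roughly 0.7 ** 3 = 0.343: the schedule holds only if no leg is delayed
```

Comparing this service level across schedules generated under different robustness and speed strategies is the kind of comparison the simulation model supports, albeit with far richer weather and operation dynamics.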
Published Jan. 5, 2021 12:09 PM – Last modified Jan. 21, 2021 10:41 AM