Title: Prediction models with informative selection priors for the integrative analysis of omics data
Abstract: In oncological clinical trials, researchers routinely collect genome-wide data for multiple molecular data types. Compared to analyses based on a single data type, the combined analysis of these data can improve both the performance of prediction models and the identification of relevant features, which can lead to new insights into the disease biology. We propose a Bayesian variable selection model for the integration of (epi-)genomic data, e.g. copy number variation (CNV), into a gene expression-based logistic regression model for two-class prediction and biomarker selection. Specifically, we use CNV information to weight the prior inclusion probabilities of gene expression variables in a stochastic search variable selection algorithm, giving larger weights to genes located in distinctive CNV regions. More precisely, the mean prior inclusion probability of a gene is assumed to follow a mixture of a point mass and a properly elicited distribution capturing the aggregated copy number information and its estimation uncertainty. The random mixture weight automatically adapts to the strength of the information contained in the copy number data. As a consequence, if the CNV data are uninformative or in conflict with the gene expression data, the model collapses to the standard model with beta-distributed inclusion probabilities. We will study the model in simulation studies and in an application to breast cancer data. This is joint work with Manuel Wiesenfarth (German Cancer Research Center, Heidelberg) and Ana Corberán-Vallet (University of Valencia).
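
To make the prior structure described in the abstract more concrete, the following is a minimal sketch of how such a mixture prior on inclusion probabilities could be simulated. It is not the authors' implementation. The function name `sample_prior_inclusion`, the per-gene CNV scores in (0, 1), the baseline mean `m0`, the two precision parameters, and the choice of a beta distribution for the informative component are all illustrative assumptions; the abstract only states that the mean prior inclusion probability follows a mixture of a point mass and an elicited distribution, and that the model reduces to beta-distributed inclusion probabilities when the CNV data carry no information.

```python
import random


def sample_prior_inclusion(cnv_scores, w, m0=0.05, precision=20.0,
                           cnv_precision=50.0, rng=None):
    """Draw prior means and inclusion indicators for each gene.

    All names and defaults are hypothetical, chosen for illustration only.
    cnv_scores    : per-gene values in (0, 1) summarising the CNV evidence
    w             : mixture weight on the CNV-informed component
    m0            : point-mass value for the mean inclusion probability
    precision     : concentration of the beta prior on inclusion probabilities
    cnv_precision : concentration of the assumed CNV-informed distribution
    """
    rng = rng or random.Random()
    eps = 1e-6  # keeps beta parameters strictly positive
    means, gammas = [], []
    for c in cnv_scores:
        if rng.random() < w:
            # Informative component: mean drawn around the CNV score.
            m = rng.betavariate(c * cnv_precision, (1.0 - c) * cnv_precision)
        else:
            # Point-mass component: fall back to the baseline mean.
            m = m0
        m = min(max(m, eps), 1.0 - eps)
        # Inclusion probability is beta distributed around the chosen mean.
        theta = rng.betavariate(m * precision, (1.0 - m) * precision)
        # Binary inclusion indicator for the gene expression variable.
        gammas.append(1 if rng.random() < theta else 0)
        means.append(m)
    return means, gammas


if __name__ == "__main__":
    scores = [0.1, 0.5, 0.9]
    print(sample_prior_inclusion(scores, w=0.0, rng=random.Random(1)))
    print(sample_prior_inclusion(scores, w=1.0, rng=random.Random(1)))
```

With `w = 0` every gene receives the same baseline mean, so the sketch reduces to a standard prior with beta-distributed inclusion probabilities; with `w = 1` the means are driven entirely by the CNV scores. In the model described above the mixture weight is random and learned from the data, whereas here it is fixed by hand to keep the example short.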