I am a professor of computer science affiliated with the Biomedical Informatics Research Group (BMI) at the Department of Informatics (IFI) at the University of Oslo (UiO). I am head of the Biostatistics and Bioinformatics group in the KG Jebsen Centre for Breast Cancer Research, senior investigator in the Centre for Cancer Biomedicine (a Norwegian Centre of Excellence), and affiliated with CELS (Centre for Computational Inference in Evolutionary Life Science, UiO).
Research interests
My research focuses on understanding aspects of biological systems through the use of mathematics and the development of computational tools. I have worked in particular on two types of biological systems: ecological systems (of interacting populations of individuals under external influences) and human cancers. Both types of systems involve processes on different time scales: changes in size and spatial distribution over shorter time spans and changes in genotype and phenotype over longer time spans. These processes are not independent of each other, and much of today's research in both ecology and cancer biology concerns the interlinkage between the two process levels. For example, how do the genotype and phenotype of individuals in a population affect the length and amplitude of population cycles? And can we determine from the genotype of tumor cells how likely they are to become metastatic (i.e. spread to distant organs)? Mathematical and statistical modeling is the language of choice to make precise statements about associations and structures in biology. Some applications inevitably require sophisticated models and computational schemes. However, often a complex system is best studied by focusing on understanding just one or a few properties at a time and applying simple and easily interpretable (and falsifiable) models. In fact, the more complex a model is and the more degrees of freedom it has, the more biological scenarios it will encompass and thus the less we can learn from it.
The recent explosion in massively parallel observations obtained with microarrays, high-throughput sequencing and other technologies constitutes a serious challenge in many applications. Lack of computational power and data storage are serious problems in this context, but equally important is the need for more research to understand the theoretical limitations on what can be learned from studies involving thousands or even millions of parallel observations on a few hundred or a few thousand individuals.
From a biological point of view, I have a particular interest in the various manifestations and implications of molecular evolution. The tremendous success of introducing concepts such as cancer 'driver' and 'passenger' genes, and more recently notions such as cancer life histories and the archaeology of cancer, illustrates the power of alluding to evolutionary concepts. Still, evolution is a very elusive concept, even if it is well understood from a mathematical perspective. Part of the problem is that the map from genotype to phenotype can be very complicated. So even though biological processes adhere to evolutionary principles and the latter are reasonably well understood, we may not be able to predict what comes out of an evolutionary process any more than we are able to predict the weather beyond a few weeks (though for a different reason). There are definitely certain regularities in evolutionary processes, however: in a study of 2000 breast carcinomas (the METABRIC cohort), Curtis et al. (2012) found 10 distinct classes of tumors, each characterized by a unique set of genes driving cancer development. In other types of cancer, too, one typically finds a small number of distinct subgroups, indicating that nature (and evolution) does not behave completely at random but follows certain paths (often with many detours).
From a methodological point of view, I have a particular interest in large-scale inference on the basis of thousands or even millions of parallel data sets that may come from a single measurement source or from several different ones (the latter case often being referred to in the genomics literature as 'integrative analysis'). Apart from the more obvious problems arising in such settings (such as the need to adjust for multiple comparisons), the deeper challenges concern how to take advantage of the parallel structure of the data (e.g. through empirical Bayes approaches) and how to impose biologically reasonable (and justifiable) constraints on the models in order to ensure uniqueness and stability of estimates. Answering such questions requires deep knowledge of statistics as well as biology and is a truly interdisciplinary task.
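To make the multiple-comparisons problem mentioned above concrete, here is a minimal Python sketch of one standard adjustment, the Benjamini-Hochberg step-up procedure for controlling the false discovery rate. It is a generic textbook illustration and not a method taken from the projects described on this page; the function name `benjamini_hochberg` and the example p-values are assumptions made for this sketch.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the hypothesis is
    rejected by the Benjamini-Hochberg procedure at FDR level alpha.

    Hypothetical helper written for illustration only.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k (1-based) whose ordered
    # p-value satisfies p_(k) <= k * alpha / m.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            largest_k = rank
    # Reject the hypotheses with the largest_k smallest p-values.
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


# Illustrative p-values (made up for this example), one per parallel test.
example_p = [0.205, 0.001, 0.039, 0.008, 0.041, 0.042, 0.06, 0.074, 0.212, 0.216]
example_rejected = benjamini_hochberg(example_p, alpha=0.05)
```

With thousands or millions of parallel tests, the same rule applies unchanged; only the number of tests `m` grows, which is what makes the per-test threshold so strict.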
Examples of recent work
Here are a few selected projects from recent years in which I have been the PI or co-PI on the method development (names in parentheses indicate others who have also played a particularly central role in the methodological development):
- CARMA (Copy Aberration Regional Mapping Analysis) is a computational approach to analyze allele-specific copy number data from tumor DNA obtained with SNP arrays or by high-throughput sequencing. A series of distinct patterns of genomic aberrations are identified, each representing a particular mode of variation in individual copy number profiles. The purpose is to derive a compact, biologically and clinically relevant representation of the genomic architecture in a tumor (together with Arne Pladsen, Gro Nilsen, Hege Russnes and others).
- ASCAT (Allele-Specific Copy Aberrations in Tumors) is an algorithm for estimation of allele-specific copy numbers, genome ploidy and tumor cell percentage on the basis of SNP array data or HTS data (together with Peter Van Loo and Silje Nord).
- CAAI (Complex Arm-wise Aberration Index) is a score of genomic complexity which has been shown to be an independent DNA-based prognostic marker in breast cancer (together with Hege Russnes and Hans Kristian Moen Vollan).
- PART (Partitioning Algorithm using Recursive Thresholding) is an algorithm for identification of clusters in hierarchical clustering trees (together with Gro Nilsen, Knut Liestøl and Ørnulf Borgan).
- PCF (Piecewise Constant Fitting) is an algorithm for segmentation of array CGH data or SNP array data (together with Gro Nilsen and Knut Liestøl).
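As a rough illustration of what segmentation by piecewise constant fitting means, the Python sketch below fits a step function to a sequence of measurements by penalized least squares, using dynamic programming. This is a generic sketch under stated assumptions, not the PCF implementation itself: the function name `piecewise_constant_fit`, the `penalty` parameter and the toy data are all hypothetical.

```python
from math import inf


def piecewise_constant_fit(values, penalty):
    """Split `values` into contiguous segments, each fitted by its mean,
    minimizing (sum of squared residuals) + penalty * (number of segments).

    Returns a list of (start, end, mean) tuples with `end` exclusive.
    Hypothetical helper for illustration; runs in O(n^2) time.
    """
    n = len(values)
    if n == 0:
        return []
    # Prefix sums of values and squared values give O(1) segment costs.
    s1, s2 = [0.0], [0.0]
    for v in values:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def seg_cost(i, j):
        # Sum of squared residuals around the mean of values[i:j].
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # best[j] = minimal penalized cost of segmenting values[:j];
    # last_start[j] = start index of the final segment in that optimum.
    best = [0.0] + [inf] * n
    last_start = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + seg_cost(i, j) + penalty
            if cost < best[j]:
                best[j] = cost
                last_start[j] = i

    # Backtrack from the end to recover the segment boundaries.
    segments = []
    j = n
    while j > 0:
        i = last_start[j]
        segments.append((i, j, (s1[j] - s1[i]) / (j - i)))
        j = i
    segments.reverse()
    return segments


# Toy profile (made up): five probes at level 0 followed by five at level 1.
toy_profile = [0.0] * 5 + [1.0] * 5
toy_segments = piecewise_constant_fit(toy_profile, penalty=0.5)
```

The penalty controls the trade-off between fidelity and parsimony: a small value allows many short segments, while a large value collapses the profile into a single segment.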
Over the last five years I have also been heavily involved in several genomics integration projects, including, for example:
- Prediction of histological transformation (most often to DLBCL) in follicular lymphoma, based on whole-genome copy number and gene expression data (together with Marianne Brodtkorb, Harald Holte, Erlend Smeland and others)
- Identification of novel drivers of progression in breast cancer, based on whole-genome copy number and gene expression data (together with Miriam Ragle Aure, Lars Baumbusch, Israel Steinfeld, Anne-Lise Børresen-Dale, Zohar Yakhini and others)
- Identification of genes for which the expression is deregulated in breast cancer both epigenetically (through methylation) and by copy number alteration (together with Miriam Ragle Aure and others)
- Deriving a map of all direct and indirect interactions between whole-genome miRNA and a panel of 105 cancer-related proteins (together with Miriam Ragle Aure, Sandra Jernstrøm, Marit Krohn and others)
PhD supervision
I am or have been involved in the supervision of the following doctoral students:
- Christian Fougner
- Ståle Hårberg
- Anand Khadse
- Julian Hamfjord
- Stina Stål
- Chloé Steen
- Maren Høland
- Vandana Sandhu
- Aliaksandr Hubin
- Even Sannes Riiser
- Arne Pladsen
- Hans Kristian Moen Vollan (2015)
- Gro Nilsen (2015)
- Miriam Ragle Aure (2013)
- Eldrid Borgan (2012)
- Xi Zhao (2011)
Master project supervision
I am or have been involved in the supervision of the following master students:
- Marius Bernklev
- Anders Flisvang
- Stian Lågstad
- Jonas Meier Strømme
- Kine Veronica Lund (2014)
- Gro Nilsen (2009)
- Espen Solberg (2007)
- Stian Grenborgen (2007)
- Olav Skjelkvåle Ligaarden (2007)
- Daniel Løkken Rustad (2005)
- Hege Leite Størvold (2004)
- Kristin Robertsen (2003)