I am a professor of computer science affiliated with the Biomedical Informatics Research Group (BMI) at the Department of Informatics (IFI), University of Oslo (UiO). I am also head of the Biostatistics and Bioinformatics group in the KG Jebsen Centre for Breast Cancer Research, head of Bioinformatics in the MetAction project, senior investigator in the Centre for Cancer Biomedicine (a Norwegian Centre of Excellence), and affiliated with CELS (Centre for Computational Inference in Evolutionary Life Science, UiO).
Research interests
Mathematical and statistical modeling is the natural language for making precise statements about associations and structures in biology, and such models form the backbone of countless analysis methods and bioinformatics tools. Equally important, however, modeling can be used highly interactively to explore the properties of biological systems and to confront hypotheses with reality through simulation. I find both perspectives very fruitful in my own work. While some applications inevitably require sophisticated models and computational schemes, it is both surprising and comforting that even the simplest models, if rightly chosen, can reveal key properties of extremely complex systems. The recent explosion in massively parallel observations obtained with microarrays, high-throughput sequencing and other technologies constitutes a serious challenge in many applications. Limited computational power and data storage are serious problems in this context, but equally important is the need for more research into the theoretical limits on what can be learned from studies involving thousands or even millions of parallel observations on a few hundred or a few thousand individuals.
From a biological point of view, I have a particular interest in the various manifestations and implications of molecular evolution. The tremendous success of concepts such as cancer 'driver' and 'passenger' genes, and more recently of notions such as cancer life histories and the archaeology of cancer, illustrates the power of drawing on evolutionary concepts. Still, evolution is a very elusive concept, even if it is well understood from a mathematical perspective. As Stephen Wolfram has pointed out, very simple programs (such as those written in DNA) often result in very complex behavior (i.e. phenotype). So even though biological processes adhere to evolutionary principles, and those principles are reasonably well understood, we may be no more able to predict the outcome of an evolutionary process than we are to predict the weather beyond a few weeks. There are, however, definite regularities in evolutionary processes: in a study of 2000 breast carcinomas (the METABRIC cohort), Curtis et al. (2012) found 10 distinct classes of tumors, each characterized by a unique set of genes driving cancer development. In other cancer types, too, one typically finds a small number of distinct subgroups, indicating that nature (and evolution) does not behave completely at random but follows certain paths (though often with many detours).
From a methodological point of view, I have a particular interest in large-scale inference based on thousands or even millions of parallel data sets, which may come from a single measurement source or from several (the latter case is often referred to in the genomics literature as 'integrative analysis'). Apart from the more obvious problems arising in such settings, such as the need to adjust for multiple comparisons, the deeper challenges concern how to exploit the parallel structure of the data (e.g. through empirical Bayes approaches) and how to impose biologically reasonable (and justifiable) constraints on the models to ensure uniqueness and stability of the estimates. Answering such questions requires deep knowledge of statistics as well as biology and is a truly interdisciplinary task.
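As a generic illustration of the two points above, and not code from any of the projects listed below, the following sketch assumes the simplest textbook setting: one p-value and one effect estimate per feature, with a known sampling variance common to all estimates. `benjamini_hochberg` computes false-discovery-rate adjusted p-values, a standard answer to the multiple-comparisons problem, while `eb_shrink` borrows strength across the parallel estimates in a normal-normal empirical Bayes model whose prior mean and variance are estimated by the method of moments. Both function names are hypothetical.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvalues)
    # Indices sorted by p-value, smallest first.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        value = min(1.0, pvalues[i] * m / rank)
        running_min = min(running_min, value)
        adjusted[i] = running_min
    return adjusted


def eb_shrink(estimates: List[float], sampling_var: float) -> List[float]:
    """Empirical Bayes posterior means under the (assumed) model
    x_j ~ N(theta_j, sampling_var),  theta_j ~ N(mu, tau2)."""
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two parallel estimates")
    # Method-of-moments estimates of the prior mean and variance.
    mu = sum(estimates) / m
    total_var = sum((x - mu) ** 2 for x in estimates) / (m - 1)
    tau2 = max(0.0, total_var - sampling_var)
    # Fraction of each deviation from mu that survives shrinkage.
    denom = tau2 + sampling_var
    weight = tau2 / denom if denom > 0 else 0.0
    return [mu + weight * (x - mu) for x in estimates]
```

With estimates `[1, 2, 3, 4, 5]` and sampling variance 1, the estimated prior variance is 1.5, so each estimate is pulled 40% of the way towards the overall mean of 3. Real genomic data would call for feature-specific variances and a more careful prior, which this sketch omits.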
Examples of recent work
Here are a few selected projects from recent years in which I have played a major part in the methodological development (names in parentheses indicate others who have also played a particularly central role in the methodological development):
- CARMA (Copy Aberration Regional Mapping Analysis) is a computational approach to analyze allele-specific copy number data from tumor DNA obtained with SNP arrays or by high-throughput sequencing. It identifies a series of distinct patterns of genomic aberrations, each representing a particular mode of variation in individual copy number profiles. The purpose is to derive a compact, biologically and clinically relevant representation of the genomic architecture of a tumor (together with Gro Nilsen, Hans Kristian Moen Vollan, Arne Pladsen, Hege Russnes and others).
- ASCAT (Allele-Specific Copy Aberrations in Tumors) is an algorithm for estimating allele-specific copy numbers, genome ploidy and tumor cell percentage from SNP array data or high-throughput sequencing (HTS) data (together with Peter Van Loo and Silje Nord).
- CAAI (Complex Arm-wise Aberration Index) is a score of genomic complexity which has been shown to be an independent DNA-based prognostic marker in breast cancer (together with Hege Russnes and Hans Kristian Moen Vollan).
- PART (Partitioning Algorithm using Recursive Thresholding) is an algorithm for identification of clusters in hierarchical clustering trees (together with Gro Nilsen, Knut Liestøl and Ørnulf Borgan).
- PCF (Piecewise Constant Fitting) is an algorithm for segmentation of array CGH data or SNP array data (together with Gro Nilsen and Knut Liestøl).
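To make the quantities in the ASCAT entry concrete, the sketch below writes out how allele-specific tumor copy numbers (`n_a`, `n_b`) relate to the two observed signals, logR and B-allele frequency (BAF) at a heterozygous SNP, following the model equations of the published ASCAT method, and then inverts that relation. Treat it as a sketch to be checked against the original publication: `rho` is the aberrant (tumor) cell fraction, `psi` the tumor ploidy, and `gamma` a platform-dependent signal-compression constant, set to 1 here as an assumption for idealized data. The function names are hypothetical and this is not the ASCAT implementation, which additionally searches over `rho` and `psi` for the values under which the inferred copy numbers lie closest to non-negative integers (not shown).

```python
import math
from typing import Tuple


def expected_logr_baf(n_a: float, n_b: float, rho: float, psi: float,
                      gamma: float = 1.0) -> Tuple[float, float]:
    """Expected (logR, BAF) at a heterozygous SNP for a sample that is a
    mixture of tumor cells (fraction rho) and diploid normal cells."""
    # Total copies contributed by normal (2 each) and tumor (n_a + n_b) cells.
    total = 2 * (1 - rho) + rho * (n_a + n_b)
    # Reference level: an average locus at tumor ploidy psi.
    reference = 2 * (1 - rho) + rho * psi
    logr = gamma * math.log2(total / reference)
    # Normal cells carry one B allele; tumor cells carry n_b.
    baf = (1 - rho + rho * n_b) / total
    return logr, baf


def allele_specific_copy_number(logr: float, baf: float, rho: float,
                                psi: float, gamma: float = 1.0) -> Tuple[float, float]:
    """Invert expected_logr_baf: recover (n_a, n_b) from observed logR and BAF."""
    total = 2 ** (logr / gamma) * (2 * (1 - rho) + rho * psi)
    n_a = (rho - 1 - (baf - 1) * total) / rho
    n_b = (rho - 1 + baf * total) / rho
    return n_a, n_b
```

A round trip with, for example, `n_a = 3`, `n_b = 1`, `rho = 0.7` and `psi = 2.6` recovers the original copy numbers up to floating-point error; with noisy real data the recovered values are non-integer, which is what the ploidy and purity search exploits.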
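Segmentation in the spirit of PCF can be sketched as penalized least squares: choose breakpoints that minimize the residual sum of squares plus a penalty `gamma` per segment, a problem that an exact dynamic program solves in O(n²) time for n probes. This formulation, the function name `pcf_segments` and the penalty values used below are assumptions for illustration; a production implementation would typically also need outlier handling, a penalty scaled to the noise level, and a faster search, all omitted here.

```python
from typing import List, Tuple


def pcf_segments(y: List[float], gamma: float) -> List[Tuple[int, int, float]]:
    """Penalized least-squares piecewise constant fit.

    Minimizes  (sum of squared residuals) + gamma * (number of segments)
    exactly by dynamic programming. Returns (start, end_exclusive, mean) triples.
    """
    n = len(y)
    # Prefix sums give each segment's squared error in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(y):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def sse(i: int, j: int) -> float:
        # Squared error of y[i:j] around its own mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [float("inf")] * n  # best[j]: optimal cost for y[:j]
    last = [0] * (n + 1)               # start of the final segment in that optimum
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + sse(i, j) + gamma
            if cost < best[j]:
                best[j], last[j] = cost, i

    # Backtrack from the end to recover the segments.
    segments = []
    j = n
    while j > 0:
        i = last[j]
        segments.append((i, j, (s1[j] - s1[i]) / (j - i)))
        j = i
    segments.reverse()
    return segments
```

For `[0, 0, 0, 0, 2, 2, 2, 2, 0, 0, 0, 0]` with `gamma = 1`, the fit returns three segments with means 0, 2 and 0. A larger `gamma` yields fewer breakpoints, so it acts as the main knob trading sensitivity against spurious segments.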
Over the last five years I have also been heavily involved in several genomics integration projects, including:
- Prediction of histological transformation (most often to DLBCL) in follicular lymphoma, based on whole-genome copy number and gene expression data (together with Marianne Brodtkorb, Harald Holte, Erlend Smeland and others)
- Identification of novel drivers of progression in breast cancer, based on whole-genome copy number and gene expression data (together with Miriam Ragle Aure, Lars Baumbusch, Israel Steinfeld, Anne-Lise Børresen-Dale, Zohar Yakhini and others)
- Identification of genes whose expression is deregulated in breast cancer both epigenetically (through methylation) and by copy number alteration (together with Miriam Ragle Aure and others)
- Deriving a map of all direct and indirect interactions between genome-wide miRNA expression and a panel of 105 cancer-related proteins (together with Miriam Ragle Aure, Sandra Jernstrøm, Marit Krohn and others)
PhD supervision
I am or have been involved in supervising the following PhD students:
- Maren Høland
- Vandana Sandhu
- Aliaksandr Hubin
- Even Sannes Riiser
- Hans Kristian Moen Vollan (2015)
- Gro Nilsen (2015)
- Miriam Ragle Aure (2013)
- Eldrid Borgan (2012)
- Xi Zhao (2011)
Master project supervision
I am or have been involved in supervising the following master students:
- Marius Bernklev
- Anders Flisvang
- Stian Lågstad
- Jonas Meier Strømme
- Kine Veronica Lund (2014)
- Gro Nilsen (2009)
- Espen Solberg (2007)
- Stian Grenborgen (2007)
- Olav Skjelkvåle Ligaarden (2007)
- Daniel Løkken Rustad (2005)
- Hege Leite Størvold (2004)
- Kristin Robertsen (2003)