
Geir Kjetil Sandve

Phone +47-22840862
Mobile phone +47-93853050
Room 10461
Visiting address: Department of Informatics, Ole-Johan Dahls hus, Gaustadalléen 23B, 0373 Oslo

I am a postdoc in the Biomedical Informatics Research Group (BMI) at the Department of Informatics (IFI), University of Oslo (UiO). My current research focuses on genome analysis, and my earlier research focused on motif discovery in DNA, as described in more detail below. I also have a strong interest in teaching and supervision.
 

Master tasks

I am currently offering 10 master's projects, co-supervised by Eivind Hovig, Torbjørn Rognes, Knut Liestøl, Ole Christian Lingjærde and Anja Kristoffersen:


Visualization to support advanced analysis of genomic data

Unfolding your DNA

Of mice and men 2

Algorithms for large-scale analysis of genomic data

Contributing to large open-source software libraries

Using machine learning to make sense of our genome

Using probabilistic graphical models for genomic data mining

Creating a highly dynamic benchmarking system

Learning game theory from the immune system

The bioinformatics of personal genomes

 

Current and previous master students

I currently have three master's students who are in their final phase:

Eirik Mistereggen
Øyvind Øvergaard
Hiep Luong Nguyen

I am also co-supervisor for PhD student Sveinung Gundersen.

I have previously been main supervisor for 9 master's theses, 7 at NTNU and 2 at UiO:

Eivind Gard Lund: "An Extensible Framework for Comparative Analysis of Annotations"
Jonathan Lunde Lillesæter: "Retroactively Parallelizing a Large Python System"
Kai Trengereid: "Development of a motif discovery tool"
Tarjei S Hveem: "Improving expressibility of simple motifs"
Vetle Valebjørg: "Discovery of approximate composite motifs in biological sequences"
Øystein Lekang: "Flexible Discovery of Modules with Distance Constraints"
Øyvind Bø Syrstad and Lars Eidsheim: "Maskinvare-aksellerert MEME" (Hardware-accelerated MEME)
Lars Krutådal: "Weighted Pattern Matching with PWMs on FPGAs"
Vegard Walseng: "Learning pattern models from examples"

Lecturing

Whenever I have the opportunity to prioritize it, I am very passionate about lecturing. I have been responsible for a course in computer science for teachers and for a previous course in bioinformatics. I have also given lectures in two algorithms courses (TDT4120 and a previous course), as well as in several bioinformatics courses (TDT4287, INF4350, MBV3070). In addition, I have coordinated student assistants, developed exercise material, developed a web system for handling exercises, presented material for recruiting high school students to NTNU, developed and presented material to raise awareness of mathematics and computer science in high schools, taught at elementary schools, and served as a program committee member for a conference on IT and education (NKUL).

Research

The Genomic HyperBrowser

In recent years, my main research interest has been the development of statistical and algorithmic methodology for large-scale analysis of genomic data. As part of this, I have been a main developer of The Genomic HyperBrowser, an open-source, web-based software system for statistical analysis of genomic data. Our ambition is nothing less than for it to become the internationally leading system for genome analysis, in a three-way synergy with the UCSC/Ensembl genome browsers for storing and retrieving genomic data, and Galaxy for manipulating genomic data.

The Genomic HyperBrowser is the result of an exceptionally tight collaboration between computer scientists, statisticians and biologists. From the start, I have worked very closely with fellow informaticians Sveinung Gundersen and Morten Johansen (and later also Vegard Nygaard and Kai Trengereid), with statisticians Arnoldo Frigessi, Ingrid Glad, Lars Holden, Marit Holden, Knut Liestøl and Egil Ferkingstad, and with biologists Eivind Hovig, Halfdan Rydbeck, Eivind Tøstesen and Trevor Clancy.

After two years of intense methodology and software development, we published a paper on the HyperBrowser in late 2010. With the main infrastructure for genome analysis robustly in place, we are now in a phase where we can effectively build on this base in new directions. We recently published a paper on the disease regulome, a global map of over- and under-representation of 450 transcription factors in 1000 diseases. We have also recently published a paper on Monte Carlo estimation of p-values in multiple testing settings, and a paper in which we distinguish elemental genomic track types and propose a new format for genomic data.

Motif discovery in DNA

During my PhD, I cooperated closely with Finn Drabløs, as well as with Osman Abul, Kjetil Klepper, Jostein Johansen, Vegard Walseng, Øyvind Bø Syrstad, Lars Eidsheim and Magnar Nedland on different projects. I also had many interesting discussions with Arne Halaas, Rolv Seehuus and Magnus Lie, though we never wrote any articles together.

Together with Finn Drabløs, I wrote a well-cited survey of motif discovery in DNA, in which we described a formal mathematical model of the motif discovery process and classified the existing literature (around 100 methods) according to this model. Although this allowed us to place the existing methods precisely, we realized that it was still very difficult to say which methods performed best. We therefore developed two new benchmarks. First, I developed a set of benchmarks for the discovery of single motifs, in which we distinguished between modelling motifs as sensitively as possible and finding the best instances according to standard motif models. Second, I contributed to a benchmark for the discovery of cis-regulatory modules, which we constructed from co-occurring binding sites found in the TRANSFAC database.

I developed Compo, a discretized method, and contributed to Baycis, a probabilistic method, for the discovery of cis-regulatory modules in DNA (in addition to an early article on GCMD, a method for composite motif discovery in proteins). I also contributed to articles on iterated motif discovery in settings where gene expression data are available, on controlling the false discovery rate in motif discovery, and on a two-step single motif discovery method. The Compo method was later applied to motif discovery in an allergy setting.

Side projects

I have done some statistical analysis of DNA melting (denaturation). Most of this work is unpublished, but I contributed to an article on segmenting DNA based on melting properties. While working on motif discovery, I also did some work on FPGAs and other specialized hardware (though I did not handle the gritty hardware details myself), which resulted in a publication on hardware-accelerated motif discovery. Recently, I have contributed to a manuscript in preparation that analyzes the genome-wide integration profile of human papillomavirus (HPV).
 

Future work

We have a wealth of ideas on how to take our work on the HyperBrowser further:

  • One main direction is to allow questions to be asked not only from the perspective of the genome as a line, but also with DNA as a three-dimensional structure (for instance, questions related to spatial proximity).
  • A second main direction is to complement the already well-developed statistical inference with similarly sophisticated visualization of genomic information (studying how advanced processing can be used to present genomic data in a way that lets the human eye make the most sense of it).
  • A third main direction is to build specialized expert/decision systems for particular applications on top of the generic functionality. Examples of such systems are GREAT and Endeavour from other labs, and to a certain degree also our disease regulome project.

In addition to these longer-term ambitions, we have literally hundreds of concrete ideas for new features in the HyperBrowser software system, as well as concrete ideas for articles on the following topics:

  • Analysis of genomic tracks across organisms and cell types
  • A generic, web-based system for clustering of genomic tracks
  • Highly dynamic benchmarking of genome-oriented predictions
  • Local analysis of genomic tracks using binning or intersection maps
  • Controlling for confounding variables in genome analysis

As we see more interesting research opportunities than we can possibly pursue on our own, we would be very happy to include collaborators on the projects proposed here, and we are also very open to other collaborative efforts.

Tags: genome analysis, master tasks, bioinformatics

Publications

View all works in Cristin

  • Sandve, Geir Kjetil (2008). Potentials and limitations of motif-based binding site prediction in DNA.
  • Sandve, Geir Kjetil; Abul, Osman; Walseng, Vegard & Drabløs, Finn (2007). Improved benchmarks for computational motif discovery.
  • Abul, Osman; Sandve, Geir Kjetil & Drabløs, Finn (2006). A Methodology for Motif Discovery Employing Iterated Cluster Re-assignment.
  • Abul, Osman; Sandve, Geir Kjetil & Drabløs, Finn (2006). False discovery rates in identifying functional DNA motifs.
  • Abul, Osman; Sandve, Geir Kjetil & Drabløs, Finn (2006). TScan: A two-step de novo motif discovery method.
  • Sandve, Geir Kjetil (2006). Accelerating Motif Discovery: Motif Matching on Parallel Hardware.
  • Sandve, Geir Kjetil; Nedland, Magnar; Bø Syrstad, Øyvind; Eidsheim, Lars Andreas; Abul, Osman & Drabløs, Finn (2006). Accelerating motif discovery: motif matching on parallel hardware.
  • Sandve, Geir Kjetil; Stenersen, Kristoffer; Walseng, Vegard; Lekang, Øystein; Klepper, Kjetil; Abul, Osman; Hveem, Tarjei S et al. (2006). An integrated approach to motif discovery in DNA sequences.
  • Sandve, Geir Kjetil & Drabløs, Finn (2005). Generalized Composite Motif Discovery.


Published Nov 4, 2010 02:16 PM - Last modified Jan 11, 2012 01:15 PM