The Genomic HyperBrowser (completed)
The Genomic HyperBrowser is a generic web-based system that provides statistical methodology and computing power for a variety of biological inquiries on genomic datasets. The system can be tried out at a publicly available main web instance.
About the project
The Genomic HyperBrowser (http://hyperbrowser.uio.no) is an open-ended web server for the analysis of genome-scale data in the form of coordinates relative to reference genome assemblies, i.e. as genomic tracks. By providing several highly customizable components for processing and statistical analysis of genomic tracks, the HyperBrowser enables a range of genomic investigations related to, e.g., gene regulation, disease association or epigenetic modifications of the genome.
The current version of the HyperBrowser is the result of an interdisciplinary collaboration established in late 2007, with the ambition of developing a streamlined and generic system for exploring relations between genomic features. Seven years later this functionality has matured, and the project is entering a next phase in which the analysis of 3D genome structure, non-coding variation and genotype-phenotype relations are some of the main ambitions.
The aim of the project is to develop new methodology for the analysis of genome-scale data, and to apply this methodology to gain a functional understanding of environment interactions and molecular processes of cells, including those underlying disease (such as cancer) and particular traits.
This main aim manifests itself in a variety of concrete research projects. For the overarching HyperBrowser project to add value to the individual research projects, we focus on two strategic objectives. These increase both the impact of our genome analyses and methodology development and the efficiency with which we can carry them out:
Provide an internationally recognized public service for analysis of genomic data
As of 2014 we have 4000 unique international users yearly (according to Google Analytics), and a goal is to increase this to 10 000 yearly users. Providing the Genomic HyperBrowser functionality as a service to the international research community in this way is an objective of the project in itself. Furthermore, it gives our own team a powerful dissemination platform for the methodology we develop, strengthening uptake of new methodology and potentially improving the chances of it being published in good journals. Finally, having the HyperBrowser as a public system showcases our capabilities in genome analysis, and this has led to several interesting interactions and collaborations.
Provide a computational infrastructure that accelerates analysis and method development for each team member
The HyperBrowser system consists of around 90 000 lines of Python code, providing robust functionality for a range of aspects related to genome analysis. It includes code for representing genomic data, parsing data files, performing computations and statistical analyses on data, constructing user interfaces, presenting analysis output and much more. With this existing code base of core functionality at our disposal, it is possible to focus more effort on the unique characteristics of an individual research project and still have a robust context around the new functionality. In addition to the code base itself, we have also through the project built up a team with broad competence in genome analysis, software development, user interaction, statistical analysis and more. This environment allows each individual team member to focus on particular issues, while relying on other team members to cover the full breadth of aspects necessary in a given research project.
In the same way that the historical introduction of interchangeable parts and division of labor greatly improved productivity in society, we believe these concepts can also guide increased effectiveness of computational research.
Interchangeable parts: Envision generalized version of problem - develop robust generic components - customize to the particular need
Instead of solving a particular problem/need by finding a direct solution to that particular problem, we first consider whether it could be solved by a more generalized approach. If so, we develop robust components according to this, and finally customize these components in order to solve the (original) particular problem. This allows us to develop robust components that we reuse extensively, increasing both productivity and thoroughness.
Division of labor: Allow each member to specialize through relying on team for completeness
We promote division of labor by not requiring every member of the team to learn how to handle every aspect needed for his/her research projects. For some kinds of tasks, acquiring the required competence takes much more effort than doing the concrete work. In such cases we often rely on each other, sometimes by outsourcing the full task (in team development projects), at other times by providing each other with advice or troubleshooting.
Exploiting genomic coordinates as common denominator
In order to make the above general principles work out in practice, we need to combine them with a focused research direction that ensures overlap between projects and between people in the team. We do this by focusing on genome analysis, and relying strongly on genomic coordinates for data representation and integration. This allows us to approach a variety of biological problems while still being able to build a focused competence and to extensively reuse methodology.
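To illustrate what using genomic coordinates as a common denominator can mean in practice, here is a minimal Python sketch. It is not taken from the HyperBrowser code base: the `Segment` type, the function names and the example data are hypothetical, and coordinates are assumed to be half-open intervals. Once two datasets of different biological origin are both expressed as segments on the same reference genome, a single generic function can relate them.

```python
from collections import namedtuple

# Hypothetical minimal track element: a half-open interval [start, end)
# on a named chromosome of some reference genome assembly.
Segment = namedtuple("Segment", ["chrom", "start", "end"])


def merge_segments(segments):
    """Sort segments and merge overlapping or adjacent ones per chromosome."""
    merged = []
    for seg in sorted(segments):
        if merged and merged[-1].chrom == seg.chrom and seg.start <= merged[-1].end:
            last = merged[-1]
            merged[-1] = Segment(last.chrom, last.start, max(last.end, seg.end))
        else:
            merged.append(seg)
    return merged


def overlap_bp(track_a, track_b):
    """Count the base pairs covered by both tracks."""
    a, b = merge_segments(track_a), merge_segments(track_b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        if a[i].chrom != b[j].chrom:
            # Advance the track whose current chromosome sorts first.
            if a[i].chrom < b[j].chrom:
                i += 1
            else:
                j += 1
            continue
        total += max(0, min(a[i].end, b[j].end) - max(a[i].start, b[j].start))
        # Advance whichever segment ends first.
        if a[i].end <= b[j].end:
            i += 1
        else:
            j += 1
    return total


# Made-up example data: two unrelated feature types on the same coordinates.
genes = [Segment("chr1", 100, 200), Segment("chr1", 150, 300), Segment("chr2", 50, 80)]
open_chromatin = [Segment("chr1", 250, 400), Segment("chr2", 60, 70)]
print(overlap_bp(genes, open_chromatin))  # prints 60
```

The same `overlap_bp` function works for any pair of tracks, whatever they represent, which is the kind of reuse that a shared coordinate representation makes possible.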
- HiBrowse: a statistical web toolkit for analyzing, interpreting and visualizing genome-wide chromosome conformation capture data, such as Hi-C, TCC, GCC
- Analysis of genotype-phenotype relations, connected to environmental influences, human disease or animal traits
- Regulatory effects of non-coding variation
The HyperBrowser was initially developed in cooperation with the SFI Statistics For Innovation.
Future developments will be performed in cooperation with several other projects, including the "endringsmiljø" Centre for Computational Inference in Evolutionary Life Science (CELS).
We are seeking broad collaborations to develop comprehensive solutions in the various directions mentioned on this page, and are also very open to other requests for collaboration.
Some of our past and present collaborators:
Kai Trengereid, Finn Drabløs, Morten Rye, Matus Kalas, Ståle Nygård, Krishanthi Gunathasan, Tonje Lien, Sreeram Ramagopalan, Giulio Disanto, Adam Handel, Vito Ricigliano, Anton Nekrutenko, James Taylor
From 2010 to 2014, three team members delivered their PhD theses, eleven team members delivered their Master's theses, and we published 22 papers connected to the project in international journals.
Three PhD theses have been delivered based on work tightly connected to the HyperBrowser project:
Halfdan Rydbeck (PhD, Dec 2013): Integrative epigenome analysis
Sveinung Gundersen (PhD, Jan 2014): Representation and integrated analysis of heterogeneous genomic datasets
Jonas Paulsen (PhD, Dec 2014): Inferential analysis of genomic 3D organization
Eleven master students at UiO have delivered their theses on projects related to the Genomic HyperBrowser:
Brynjar Rognved (Master, August 2014)
Henrik Glasø Skifjeld (Master, August 2014)
Fredrik Haaland (Master, June 2013)
Kristoffer Waløen (Master, June 2013)
Tobias Gulbrandsen Waaler (Master, June 2013)
Anders Ramsvik Bragstad (Master, June 2013)
Torkil Vederhus (Master, June 2013)
Hiep Luong Nguyen (Master, September 2011)
Øyvind Øvergaard (Master, September 2011)
Jonathan Lillesæter (Master, June 2011)
Eivind Gard Lund (Master, June 2011)
Twenty-two papers on developments for, or applications of, the Genomic HyperBrowser have been published in international journals:
Half of these papers (11 out of 22) are published in journals that (as of 2014) are at level 2 of the Norwegian registry of scientific journals (NSD Publiseringskanaler). All except one of the papers ("Monte Carlo null models..") can be found in PubMed.
Papers presenting the main HyperBrowser functionality:
- Sandve GK, Gundersen S, Rydbeck H, Glad IK, Holden L, Holden M, Liestøl K, Clancy T, Ferkingstad E, Johansen M, Nygaard V, Tøstesen E, Frigessi A, Hovig E. The Genomic HyperBrowser: inferential genomics at the sequence level. Genome Biol. 2010;11(12):R121.
- Sandve GK, Gundersen S, Johansen M, Glad IK, Gunathasan K, Holden L, Holden M, Liestøl K, Nygård S, Nygaard V, Paulsen J, Rydbeck H, Trengereid K, Clancy T, Drabløs F, Ferkingstad E, Kalas M, Lien T, Rye MB, Frigessi A, Hovig E. The Genomic HyperBrowser: an analysis web server for genome-scale data. Nucleic Acids Res. 2013 Jul;41(Web Server issue):W133-41.
Papers on methodology developed in connection with the HyperBrowser:
- Børnich C, Grytten I, Hovig E, Paulsen J, Čech M, Sandve GK. Galaxy Portal: interacting with the galaxy platform through mobile devices. Bioinformatics. 2016
- Rydbeck H, Sandve GK, Ferkingstad E, Simovski B, Rye M, Hovig E. ClusTrack: feature extraction and similarity measures for clustering of genome-wide data sets. PLoS One. 2015 Apr 16;10(4):e0123261.
- Paulsen J, Lien TG, Sandve GK, Holden L, Borgan O, Glad IK, Hovig E. Handling realistic assumptions in hypothesis testing of 3D co-localization of genomic elements. Nucleic Acids Res. 2013 May 1;41(10):5164-74.
- Paulsen J, Sandve GK, Gundersen S, Lien TG, Trengereid K, Hovig E. HiBrowse: multi-purpose statistical analysis of genome-wide chromatin 3D organization. Bioinformatics. 2014 Jun 1;30(11):1620-2.
- Gundersen S, Kalaš M, Abul O, Frigessi A, Hovig E, Sandve GK. Identifying elemental genomic track types and representing them uniformly. BMC Bioinformatics. 2011 Dec 30;12:494.
- Sandve GK, Ferkingstad E, Nygård S. Sequential Monte Carlo multiple testing. Bioinformatics. 2011 Dec 1;27(23):3235-41.
- Sandve GK, Gundersen S, Rydbeck H, Glad IK, Holden L, Holden M, Liestøl K, Clancy T, Drabløs F, Ferkingstad E, Johansen M, Nygaard V, Tøstesen E, Frigessi A, Hovig E. The differential disease regulome. BMC Genomics. 2011 Jul 7;12:353.
Papers on applications of genome analysis and the HyperBrowser, co-authored by team members:
- Bengtsen M, Klepper K, Gundersen S, Cuervo I, Drabløs F, Hovig E, Sandve GK, Gabrielsen OS, Eskeland R. c-Myb Binding Sites in Haematopoietic Chromatin Landscapes. PLoS One. 2015 Jul 24;10(7):e0133280.
- Ricigliano VA, Handel AE, Sandve GK, Annibali V, Ristori G, Mechelli R, Cader MZ, Salvetti M. EBNA2 binds to genomic intervals associated with multiple sclerosis and overlaps with vitamin D receptor occupancy. PLoS One. 2015 Apr
- Christiansen IK, Sandve GK, Schmitz M, Dürst M, Hovig E. Transcriptionally active regions are the preferred targets for chromosomal HPV integration in cervical carcinogenesis. PLoS One. 2015 Mar 20;10(3):e0119566.
- Molyneux SD, Waterhouse PD, Shelton D, Shao YW, Watling CM, Tang QL, Harris IS, Dickson BC, Tharmapalan P, Sandve GK, Zhang X, Bailey SD, Berman H, Wunder JS, Iszvak Z, Lupien M, Mak TW, Khokha R. Human somatic cell mutagenesis creates genetically tractable sarcomas. Nat Genet. 2014 [Epub ahead of print]
- Rye M, Sandve GK, Daub CO, Kawaji H, Carninci P, Forrest AR, Drabløs F; FANTOM consortium. Chromatin states reveal functional associations for globally defined transcription start sites in four human cell lines. BMC Genomics. 2014 Mar 26;15:120.
- Handel AE, Sandve GK, Disanto G, Handunnetthi L, Giovannoni G, Ramagopalan SV. Integrating multiple oestrogen receptor alpha ChIP studies: overlap with disease susceptibility regions, DNase I hypersensitivity peaks and gene expression. BMC Med Genomics. 2013 Oct 30;6:45.
- Handel AE, Sandve GK, Disanto G, Berlanga-Taylor AJ, Gallone G, Hanwell H, Drabløs F, Giovannoni G, Ebers GC, Ramagopalan SV. Vitamin D receptor ChIP-seq in primary CD4+ cells: relationship to serum 25-hydroxyvitamin D levels and autoimmune disease. BMC Med. 2013 Jul 12;11:163.
- Watson CT, Disanto G, Sandve GK, Breden F, Giovannoni G, Ramagopalan SV. Age-associated hyper-methylated regions in the human brain overlap with bivalent chromatin domains. PLoS One. 2012;7(9):e43840.
- Disanto G, Sandve GK, Berlanga-Taylor AJ, Morahan JM, Dobson R, Giovannoni G, Ramagopalan SV. Genomic regions associated with multiple sclerosis are active in B cells. PLoS One. 2012;7(3):e32281.
- Disanto G, Sandve GK, Berlanga-Taylor AJ, Ragnedda G, Morahan JM, Watson CT, Giovannoni G, Ebers GC, Ramagopalan SV. Vitamin D receptor binding, chromatin states and association with multiple sclerosis. Hum Mol Genet. 2012 Aug 15;21(16):3575-86.
- Disanto G, Sandve GK, Ricigliano VA, Pakpoor J, Berlanga-Taylor AJ, Handel AE, Kuhle J, Holden L, Watson CT, Giovannoni G, Handunnetthi L, Ramagopalan SV. DNase hypersensitive sites and association with multiple sclerosis. Hum Mol Genet. 2014 Feb 15;23(4):942-8.
Other papers directly inspired by the HyperBrowser work:
- Sandve GK, Nekrutenko A, Taylor J, Hovig E. Ten simple rules for reproducible computational research. PLoS Comput Biol. 2013 Oct;9(10):e1003285.
- Ferkingstad E, Holden L, Sandve GK. Monte Carlo null models for genomic data. Statistical Science. 2014, in press.