ELIXIR.NO is the Norwegian node of the European ELIXIR project. The goal of the ELIXIR project is to build "a sustainable European infrastructure for biological information, supporting life science research and its translation to medicine, agriculture, bioindustries and society". The University of Oslo is one of five national nodes of ELIXIR.NO.
Current focus (2017)
- Core developments for the final release of the GSuite HyperBrowser update to the Genomic HyperBrowser.
- Better deployment of the Genomic HyperBrowser, as a Galaxy fork on GitHub.
- Coordination and development of GTrack and the GTrackCore library.
- Coordination and development of the Galaxy Prototyping Tool API (Galaxy ProTo), which is an extension to the Galaxy analysis framework.
- Further development of the Norwegian e-Infrastructure for Life Sciences (NeLS), including coordination of UiO node developments.
The Genomic HyperBrowser
My PhD thesis focused on my contributions towards the Genomic HyperBrowser project, where I am one of the main developers. The Genomic HyperBrowser is an open source, web-based software system for statistical analysis. Our ambition is to be a leading system for (statistical) genome analysis, in a synergy with the UCSC/Ensembl genome browsers for storing/retrieving genomic data, with Galaxy for manipulating genomic data, and with EpiExplorer for more explorative analysis of genomic data. I have been part of the project since 2007 and developed the core code of the system together with (now) Assoc. Prof. Geir Kjetil Sandve and Morten Johansen. The project has since its inception been a cross-disciplinary project between informaticians, statisticians and biologists, and a sizeable group of researchers and developers have contributed to the system over the years. See the UiO project page of the Genomic HyperBrowser for more information.
The HyperBrowser project has over the past year undergone dramatic improvements focused on epigenome-wide analyses, making powerful use of user-specified suites, or collections, of related datasets. This new expansion, called GSuite HyperBrowser, empowers researchers to ask questions about their genomic datasets in relation to the vast number of datasets for different cell types/tissues or different epigenomic marks that have been made available by international projects like ENCODE or Roadmap Epigenomics. GSuite HyperBrowser contains user-friendly guides for answering common domain-specific questions and includes tools for defining suites of datasets, bulk downloading and analysis. GSuite HyperBrowser has been released in a beta version at the main HyperBrowser website. We gladly welcome proposals for research collaborations.
The Genomic HyperBrowser has been selected as one of four main national deliverables from ELIXIR.NO towards the European ELIXIR project. We are currently working on making this deliverable a reality. As part of this, we are moving the source code to GitHub and setting up the Genomic HyperBrowser as a fork of the main Galaxy source code. The aim is to make it easier to install the HyperBrowser, as well as to make it easier to include updates from the core Galaxy framework. In addition, we aim to better follow standard open source practices by having a public source code repository for transparent development, issue tracking, and support for developers and users.
- Developer of core functionality
- Integration towards ELIXIR.NO and NeLS
- User support
- Setting up and maintaining the development tools
- Publication of source code
- Deployment and installation
The GTrack, BTrack and GSuite ecosystem
As part of my PhD thesis, I contributed heavily to the development of the GTrack format. The GTrack format was originally designed as a textual format able to represent all types of data that can be analyzed in the Genomic HyperBrowser, but has since outgrown this use. The goal is now to launch GTrack as a general format in an ecosystem together with the related formats BTrack and GSuite:
- GTrack is a general tabular file format for representing single genomic track datasets, supporting heterogeneous informational content. GTrack was developed together with version 1.1 of the XML-based BioXSD format in a joint publication in 2011, both supporting the same types of genomic tracks, but for different ecosystems and usage scenarios.
- BTrack is planned to be a binary format able to store multiple genomic tracks in one file, indexed and structured for direct and efficient analysis without the need for parsing. BTrack will be based heavily upon the work of two master's students: Brynjar Rongved and Henrik Glasø Skifjeld.
- GSuite is a tabular format for handling a collection of related tracks, usable for efficient retrieval of track data and metadata from public repositories, for intermediate processing of such data, and for transferring such collections as inputs to analysis software.
All formats will be usable both from the command line and as a Python library (GTrackCore), and thus in a range of analysis frameworks. I am currently supervising a master's student, Sivert Kronen Hatteberg, who is working on implementing track operations as part of the library.
GTrack has, together with BioXSD, been selected as one of four main national deliverables from ELIXIR.NO towards the European ELIXIR project. We are currently working on making this deliverable a reality.
Galaxy ProTo
Galaxy ProTo is a new tool-building methodology introduced by the Genomic HyperBrowser project. Galaxy ProTo is an unofficial alternative for defining Galaxy tools. Instead of XML files, Galaxy ProTo supports defining the user interface of a tool as a Python class. There are no limitations on what kind of code can be executed to generate the interface. For instance, one could read the beginning of an input file and provide dynamic options based on the file contents. Galaxy ProTo aims at empowering developers without Galaxy experience to easily develop Galaxy tools, both for prototyping purposes and for developing fully functional, interactive tools.
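The idea of generating interface options from file contents can be illustrated with a minimal Python sketch. Note that this is not the actual Galaxy ProTo API: the class name `ColumnSelectTool` and the methods `column_options` and `execute` are hypothetical names chosen for illustration only, and the input is assumed to be a simple tab-separated file with `#` comment lines.

```python
import io


class ColumnSelectTool:
    """Hypothetical tool definition (NOT the real Galaxy ProTo API)."""

    tool_name = "Select a column"

    @staticmethod
    def column_options(input_file, max_lines=5):
        """Read only the first lines of the input and return the column
        names found there, to be offered as choices in a selection box."""
        for _, line in zip(range(max_lines), input_file):
            line = line.rstrip("\n")
            if line and not line.startswith("#"):
                return line.split("\t")
        return []

    @staticmethod
    def execute(input_file, column):
        """Return the values of the column that the user selected."""
        rows = [line.rstrip("\n").split("\t")
                for line in input_file
                if line.strip() and not line.startswith("#")]
        if not rows:
            return []
        index = rows[0].index(column)
        return [row[index] for row in rows[1:]]


# Assumed example input: a comment line, a header line and two data lines.
example = "# example track\nseqid\tstart\tend\nchr1\t10\t20\nchr2\t5\t15\n"

options = ColumnSelectTool.column_options(io.StringIO(example))
starts = ColumnSelectTool.execute(io.StringIO(example), "start")
```

Here the list of choices is computed by ordinary Python code at the time the interface is built, which is the kind of dynamic behaviour that is cumbersome to express in a static XML tool definition.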
Norwegian e-Infrastructure for Life Sciences (NeLS)
The Norwegian e-Infrastructure for Life Sciences is the main technical deliverable from the ELIXIR.NO project. NeLS combines:
- 5 national Galaxy installations, providing simple web-based access to commonly used bioinformatics tools and workflows
- A number of ELIXIR.NO approved analysis pipelines, specifically focusing on High Throughput Sequencing applications
- A storage backend that supports data transfer, personal and project areas
- User authentication using FEIDE (for Norwegian academic users) and the NeLS IdP (for other users)
- A web-based NeLS Portal that works as a central hub for the NeLS solution, with links to the different parts of the system. The NeLS Portal provides access to personal and project storage, user credentials, and admin and help desk functionality.
- Command-line and programmatic access to the NeLS storage solution
- Data transfer to and from StoreBioInfo (for long-term storage of project data) and Tjenester for Sensitive Data (TSD).
In 2015, I led the national team within ELIXIR.NO responsible for developing NeLS. In 2016, I stepped down to a sub-leader position. I am currently working together with personnel from Universitetets senter for informasjonsteknologi (USIT) on integrating the UiO Galaxy installation with their LifePortal.