This project is no longer available.

Developing bioinformatics software to identify and analyse the viral sequences of gut microbes

The microbiome consists of the genetic material of the microorganisms (small organisms such as bacteria, archaea, fungi, yeasts and viruses) found in a sample from a given environment. The viral fraction is called the virome. Modern DNA sequencing technologies generate large amounts of data, and advanced computational techniques are needed to turn these data into biological insights. Mapping the viral species in the human gut will help us and others better understand the composition and activities of the viruses present in a sample.

The presence of certain viruses and gut microbial genes is associated with colorectal cancer (CRC). Studying these and other viruses in a CRC-screening population could reveal associations between the virome and CRC. One part of the human virome consists of viruses that use human cells as hosts; some of these are even present in the human genome as endogenous viral elements. The remaining part of the human virome is formed by bacteriophages (viruses that infect bacteria), together with some viruses that infect other prokaryotes or eukaryotes.

In the ongoing CRC-biome study conducted at the Cancer Registry of Norway, we produce large amounts of sequencing data (metagenomes) from serially collected gut samples from about 1500 study participants. The samples come from healthy individuals (controls) and from individuals with pre-cancer or cancer (cases), and each sample has on average three gigabases of sequence data.

Analyses will be performed using UiO’s services for sensitive data (TSD).

VIBRANT and VirSorter2 are computational tools that use a hybrid approach, combining machine learning with protein similarity, to identify viral sequences. Although the tools themselves are relatively easy to use, researchers often need additional tools to name (taxonomically classify) the identified sequences. The specific aim of this master's project is to develop necessary and useful extensions to these tools, using Snakemake and containers, to improve the identification of viral sequences in the human gut.
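The workflow aim above can be illustrated with a minimal sketch in Python (the language Snakemake is built on). It plans one VirSorter2 run and one VIBRANT run per sample, in the spirit of a Snakemake rule expansion, and then merges the viral contig IDs reported by each tool. The directory layout (`contigs/`, `results/`), the sample names and all helper names (`plan`, `merge_calls`, etc.) are hypothetical. The command-line flags follow the tools' READMEs as we understand them and should be checked against the installed versions. Parsing each tool's output files into sets of contig IDs, normalized to the original contig names, is assumed to have been done already.

```python
import shlex
from dataclasses import dataclass
from pathlib import Path

# Hypothetical layout: one assembled-contig FASTA per sample under contigs/.
CONTIG_DIR = Path("contigs")
OUT_DIR = Path("results")


@dataclass
class Job:
    sample: str
    tool: str
    command: list[str]
    output_dir: Path


def virsorter2_job(sample: str, threads: int = 4) -> Job:
    """Build a VirSorter2 command for one sample.

    Flags (-w, -i, -j, 'all') as given in the VirSorter2 README; verify
    against the installed version.
    """
    fasta = CONTIG_DIR / f"{sample}.fa"
    out = OUT_DIR / "virsorter2" / sample
    cmd = ["virsorter", "run", "-w", str(out), "-i", str(fasta), "-j", str(threads), "all"]
    return Job(sample, "virsorter2", cmd, out)


def vibrant_job(sample: str, threads: int = 4) -> Job:
    """Build a VIBRANT command for one sample.

    Flags (-i, -t, -folder) as given in the VIBRANT README; verify against
    the installed version.
    """
    fasta = CONTIG_DIR / f"{sample}.fa"
    out = OUT_DIR / "vibrant" / sample
    cmd = ["VIBRANT_run.py", "-i", str(fasta), "-t", str(threads), "-folder", str(out)]
    return Job(sample, "vibrant", cmd, out)


def plan(samples: list[str], threads: int = 4) -> list[Job]:
    """Expand each unique sample into one job per tool (a dry-run plan)."""
    jobs = []
    for sample in sorted(set(samples)):
        jobs.append(virsorter2_job(sample, threads))
        jobs.append(vibrant_job(sample, threads))
    return jobs


def merge_calls(virsorter2_ids: set[str], vibrant_ids: set[str]) -> dict[str, tuple[bool, bool]]:
    """Map each contig called viral by either tool to
    (called by VirSorter2, called by VIBRANT)."""
    return {
        cid: (cid in virsorter2_ids, cid in vibrant_ids)
        for cid in sorted(virsorter2_ids | vibrant_ids)
    }


if __name__ == "__main__":
    # Print the planned commands instead of executing them.
    for job in plan(["S001", "S002"]):
        print(shlex.join(job.command))
    print(merge_calls({"c1", "c2"}, {"c2", "c3"}))
```

Separating command construction from execution lets the plan be reviewed as a dry run before anything runs on TSD. In an actual Snakefile, each command would instead sit in a rule's `shell:` directive, and the container image for each tool would be set with the rule's `container:` directive. The merged table makes it easy to inspect where the two tools agree or disagree.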

While the project has a large potential for biological interpretation, the analyses themselves do not depend on prior biological knowledge; they will be based on algorithms using graph theory and machine learning.

Prior knowledge of scripting in Python and R is required. No prior knowledge of microbiology or cancer is needed.

References

Guo J, Bolduc B, Zayed AA, et al. (2021) VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses. Microbiome 9, 37. doi: 10.1186/s40168-020-00990-y

Kieft K, Zhou Z & Anantharaman K (2020) VIBRANT: automated recovery, annotation and curation of microbial viruses, and evaluation of viral community function from genomic sequences. Microbiome 8, 90. doi: 10.1186/s40168-020-00867-

Published 4 Oct. 2021 10:46 - Last modified 29 Nov. 2021 14:29

Supervisor(s)

Scope (ECTS credits)

60