This project is no longer available

Benchmarking of software for mapping short sequences to a reference genome

DNA sequencing technology is developing very rapidly. Next-generation sequencing (NGS), also known as high-throughput sequencing (HTS) or deep sequencing, has revolutionized the speed and cost of DNA sequencing. The latest machines can determine the sequence of nucleotides, the building blocks of DNA, far faster than was possible only six years ago. In the course of two weeks, one machine can sequence up to 600 billion base pairs, divided into 6 billion short sequences of 100 base pairs each. For comparison, the entire human genome consists of about 3 billion base pairs. The cost of sequencing has also been reduced dramatically. However, for sequencing to be used more widely in the clinic, even more cost-effective solutions are required.

Sequencing may be performed on a DNA sample from a human individual in order to identify the variants in that individual's genetic profile relative to a human reference sequence. Once a sample has been sequenced, all the short sequences have to be mapped back to their correct locations on the reference genome. Because of sequencing errors and genomic variation between individuals, the short sequences may not match the reference perfectly, which makes finding the correct location difficult. A large number of programs for mapping such short sequences to a reference genome have been developed, for example Bowtie, BWA, Novoalign, and SOAP. These programs vary in both speed and quality of results, and each has many adjustable parameters.
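To make the mapping problem concrete, here is a minimal sketch in Python of mapping a short sequence to a reference while tolerating a few mismatches. It is a brute-force illustration only and is not how Bowtie, BWA, Novoalign, or SOAP work internally; those programs rely on indexing to avoid scanning the whole reference. The function names, the toy reference, and the `max_mismatches` parameter are assumptions made for this example.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read: str, reference: str, max_mismatches: int = 2) -> list[tuple[int, int]]:
    """Return (position, mismatches) for every reference window within the mismatch budget."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        d = hamming(read, reference[pos:pos + len(read)])
        if d <= max_mismatches:
            hits.append((pos, d))
    # Best hit (fewest mismatches) first; ties are broken by position.
    return sorted(hits, key=lambda h: (h[1], h[0]))


# Toy data (hypothetical): the read is reference[10:19] with one substitution (A -> T).
reference = "ACGTACGTTAGCCGATAGGCTTAACGGATCCA"
read = "GCCGTTAGG"
print(map_read(read, reference, max_mismatches=1))
```

In this toy case the best hit is position 10 with one mismatch. The sketch handles substitutions only; insertions and deletions, which real mappers also have to consider, are left out.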

The aim of this project is to do a thorough comparison of the performance of a variety of such programs to determine which application and which parameters are best suited to different concrete problems.
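One possible way to score such a comparison is sketched below in Python. It assumes a setup the project description does not specify: reads whose true origin is known (for example, simulated reads), and a mapper whose output has already been reduced to one reported position per read. All names (`score_mapping`, `timed`, `tolerance`) and the toy data are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Positions = Dict[str, Optional[int]]


def score_mapping(truth: Dict[str, int], reported: Positions, tolerance: int = 0) -> Dict[str, float]:
    """Compare reported positions with known true positions."""
    total = len(truth)
    mapped = 0
    correct = 0
    for read_id, true_pos in truth.items():
        pos = reported.get(read_id)
        if pos is None:  # read was left unmapped
            continue
        mapped += 1
        if abs(pos - true_pos) <= tolerance:
            correct += 1
    return {
        "mapped_fraction": mapped / total if total else 0.0,
        "correct_fraction": correct / total if total else 0.0,
        "precision": correct / mapped if mapped else 0.0,
    }


def timed(run: Callable[[], Positions]) -> Tuple[Positions, float]:
    """Run a mapping step and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = run()
    return result, time.perf_counter() - start


# Toy data (hypothetical): four reads, one unmapped, one placed far from its origin.
truth = {"r1": 100, "r2": 250, "r3": 400, "r4": 900}
reported = {"r1": 100, "r2": 252, "r3": None, "r4": 10}
result, seconds = timed(lambda: reported)  # stand-in for running a real mapper
print(score_mapping(truth, result, tolerance=5), f"{seconds:.6f} s")
```

Reporting the mapped fraction separately from the correct fraction distinguishes a program that leaves difficult reads unmapped from one that places them incorrectly, which matters when comparing parameter settings.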

The task is suitable for anyone with an interest in bioinformatics who has some programming experience (scripting) and some basic knowledge of statistics.

Supervisor: Torbjørn Rognes (BMI/IFI)

Keywords: bioinformatics, genome, genomics, benchmarking
Published Aug. 16, 2013 14:28 - Last modified Oct. 2, 2014 15:31

Scope (credits)

60