De novo genome assembly with the Celera Assembler software

Friday seminar by Jason R. Miller from the J. Craig Venter Institute

Abstract

High-throughput DNA sequencing technology makes genome sequencing an affordable tool for studying biological function and evolutionary history.
The technology is limited to delivering relatively short reads of contiguous DNA base pairs. Genome assembly software overcomes short read lengths by inferring longer contig and scaffold sequences from the reads. Celera Assembler is an open-source assembler that starts by computing alignments (overlaps) between pairs of reads. With this overlap-based approach, Celera Assembler is well-positioned to exploit longer reads to overcome sequencing error, donor polymorphism, and genome repetitiveness. Modern genome projects must deal with shorter reads in high volume, with heterogeneity and redundancy in, e.g., Illumina mate pair libraries, and with high error rates in, e.g., PacBio long reads. We will discuss recent software modifications that enable Celera Assembler to incorporate these data advantageously.
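The overlap-based idea in the abstract can be illustrated with a toy sketch: find exact suffix-prefix overlaps between every ordered pair of reads, then repeatedly merge the pair with the longest overlap into a longer contig. This is a greedy assembler, a deliberately simplified stand-in and not Celera Assembler's actual algorithm, which tolerates sequencing errors in overlaps and builds a layout and consensus from an overlap graph. The function names and the `min_len` threshold are hypothetical, and the sketch assumes error-free reads from a single strand with no read contained inside another.

```python
from itertools import permutations


def suffix_prefix_overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that exactly matches a prefix of `b`.

    Returns 0 if no such overlap is at least `min_len` long.
    Assumption: exact matches only (real assemblers allow errors).
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def all_overlaps(reads, min_len=3):
    """Compute overlaps for every ordered pair of reads (i, j), i != j."""
    overlaps = {}
    for i, j in permutations(range(len(reads)), 2):
        k = suffix_prefix_overlap(reads[i], reads[j], min_len)
        if k:
            overlaps[(i, j)] = k
    return overlaps


def greedy_assemble(reads, min_len=3):
    """Hypothetical greedy merge: join the best-overlapping pair until none remain.

    Returns the list of contigs left when no overlap >= min_len exists.
    Each merge removes one sequence, so the loop always terminates.
    """
    reads = list(reads)
    while True:
        ov = all_overlaps(reads, min_len)
        if not ov:
            return reads
        # Pick the longest overlap; ties go to the first pair found.
        (i, j), k = max(ov.items(), key=lambda item: item[1])
        merged = reads[i] + reads[j][k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)] + [merged]


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging works on this toy input, but it can commit to a wrong join when a repeat is longer than the reads. That limitation is why longer reads, as the abstract notes, help an overlap-based assembler resolve genome repetitiveness.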

Jason Miller, Assistant Professor, J. Craig Venter Institute

Published June 16, 2012 8:23 AM - Last modified June 16, 2012 8:24 AM