EVOGENE Seminar/CEES Extra Seminar - Plant Genome Sequencing and the Informatics Challenge
Jason Miller, J. Craig Venter Institute
ABSTRACT: Plant science will soon have access to large numbers of nearly complete genome sequences. Recently developed genome reconstruction technologies will allow many genome projects to generate nearly complete, nearly chromosome-length assembly sequences. We applied several technologies to the moderately sized, low-heterozygosity genome of Medicago truncatula. In a study of three wild accessions, we applied PacBio’s single-molecule, long-read genome sequencing and uncovered previously missed tandem repeats involved in gene family expansion. This was achieved with low-coverage long reads and a new hierarchical, hybrid assembly pipeline called Alpaca. We also applied high-coverage PacBio sequencing, the Falcon assembler, BioNano’s optical mapping technology, and Dovetail’s cross-linked chromatin sequencing technology. On the one accession tested to date, we achieved higher contiguity than either of the published Medicago truncatula assemblies. To address the exciting challenge of interpreting large numbers of nearly complete plant genomes, we are building an informatics resource centered on Arabidopsis thaliana, the model organism with a wealth of genome annotation. The Arabidopsis Information Portal (www.Araport.org) is an open-source community resource where scientists explore gene function through JBrowse genome browsing and InterMine data mining tools. Araport is a community-extensible platform on which plant scientists can expose and integrate data and applications, with cloud-based scaling and RESTful data exchange. The Araport project seeks collaborators eager to build the data integration resources that will allow biological discovery to keep pace with genome sequencing.