Population genomics of pneumococcus: data to models and back

CEES Extra seminar/Faculty of Dentistry seminar by Bill Hanage from Harvard School of Public Health

Genome sequences, once so rare, are becoming commonplace. In the case of the pneumococcus, we now have datasets comprising hundreds or thousands of genomes from well-defined populations. Analysis reveals extensive variation in the rate of recombination, events leading to vaccine escape, and subtle niche partitioning governed by the presence of specific antigens. We can also observe a linear relationship between divergence at core housekeeping loci and divergence in the more flexible accessory genome, with clades in the population being approximately equally divergent in both. Building on previous models of core genome diversification, it is possible to incorporate variation in the accessory genome and reproduce this relationship. High levels of recombination between major sequence clusters can explain the observed relationship between core and accessory genome divergence, and its distribution, without recourse to selection. However, the model performs poorly for very closely related strains, suggesting that short-term selective processes leading to population bottlenecks or microepidemics are at work.

Bill Hanage
Associate Professor
Harvard School of Public Health
Department of Epidemiology

Published May 8, 2014 9:51 AM - Last modified May 22, 2014 9:46 AM