New publication: Preferential amplification of repetitive DNA during whole genome sequencing library creation from historic samples
Repetitive microsatellite DNA forms a universal component of eukaryote genomes, and specific biochemical properties of such repeat regions may influence the outcome of laboratory protocols. The Atlantic cod (Gadus morhua) genome contains an order of magnitude more dinucleotide repeats than the majority of vertebrates, with over eight percent of its genome classified as either AC or AG dinucleotide repeats. We find that the abundance of these repeats can be inflated in ancient DNA (aDNA) whole genome sequencing (WGS) data generated from this species, in particular in samples with shorter fragment lengths. This inflation is suppressed by a reduced number of amplification cycles and by the inclusion of manufactured dinucleotide repeat oligonucleotides during amplification. These data indicate that a biased amplification reaction leads to artificially high levels of AC and AG repeats. This process appears to be particularly efficient in Atlantic cod, likely due to its high genomic content of repeats with relatively simple sequence complexity. While the extent of such bias in other studies is unclear, we nonetheless urge caution when quantifying repeat content in aDNA WGS data, given that amplification bias can be difficult to detect if this process affects more complex repeat structures than dinucleotide repeats.
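
To make the idea of "quantifying repeat content" concrete, here is a minimal Python sketch that estimates the fraction of sequenced bases falling in AC or AG dinucleotide runs. It is an illustration only and not the method used in the publication. The function name `dinucleotide_fraction`, the `min_units` threshold, and the choice to count both strands and both phases (for example GT and CA alongside AC) are assumptions made for this example.

```python
import re

# Map each unordered base pair to a repeat class. GT/TG is the reverse
# complement of AC/CA, and CT/TC is the reverse complement of AG/GA.
# Other pairs (e.g. AT, CG) are deliberately left out.
_CLASSES = {
    frozenset("AC"): "AC",
    frozenset("GT"): "AC",
    frozenset("AG"): "AG",
    frozenset("CT"): "AG",
}


def dinucleotide_fraction(reads, min_units=4):
    """Return the fraction of bases in AC-type and AG-type repeat runs.

    `min_units` (an assumed threshold) is the minimum number of
    consecutive dinucleotide units for a run to be counted.
    """
    # Two different bases, followed by at least (min_units - 1) further
    # copies of that pair, plus an optional trailing half unit.
    pattern = re.compile(
        r"([ACGT])(?!\1)([ACGT])(?:\1\2){%d,}\1?" % (min_units - 1)
    )
    counts = {"AC": 0, "AG": 0}
    total = 0
    for read in reads:
        seq = read.upper()
        total += len(seq)
        for match in pattern.finditer(seq):
            label = _CLASSES.get(frozenset(match.group(1, 2)))
            if label is not None:
                counts[label] += len(match.group(0))
    if total == 0:
        return {"AC": 0.0, "AG": 0.0}
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    example_reads = ["ACACACACAC", "TTGCATGCAA"]
    print(dinucleotide_fraction(example_reads))
```

Running such a count separately per library, for instance grouped by fragment length or by number of amplification cycles, is one way to look for the inflation described above. A value far above the roughly eight percent expected from the genome would be a warning sign, although this simple scan cannot detect bias in more complex repeat structures.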