Research aims
Develop fast and broadly useful tools for sequence analysis by exploiting parallelism, smart data structures and efficient algorithms, in particular for the comparison, searching, clustering and classification of sequences.
Projects
- Tools for microbiome sequencing data analysis (VSEARCH, Swarm)
- Tool for comparison of adaptive immune receptor repertoires (CompAIRR)
- Tool for rapid sequence comparison (SWIPE)
- Publication database and associated statistics (Publika)
Research topics
Software
- VSEARCH: a versatile open-source tool for metagenomics
- Swarm: highly-scalable and high-resolution amplicon clustering
- CompAIRR: Comparison of Adaptive Immune Receptor Repertoires
Publications
Publications 2024
Altered Genome-Wide DNA Methylation in the Duodenum of Common Variable Immunodeficiency Patients
J Clin Immunol, 44 (6), 133
DOI 10.1007/s10875-024-01726-5, PubMed 38780872
Topology-guided polar ordering of collective cell migration
Sci Adv, 10 (16), eadk4825
DOI 10.1126/sciadv.adk4825, PubMed 38630812
Exploring the gut DNA virome in fecal immunochemical test stool samples reveals associations with lifestyle in a large population-based study
Nat Commun, 15 (1), 1791
DOI 10.1038/s41467-024-46033-0, PubMed 38424056
Publications 2023
HoCoRT: host contamination removal tool
BMC Bioinformatics, 24 (1), 371
DOI 10.1186/s12859-023-05492-w, PubMed 37784008
hGSuite HyperBrowser: A web-based toolkit for hierarchical metadata-informed analysis of genomic tracks
PLoS One, 18 (7), e0286330
DOI 10.1371/journal.pone.0286330, PubMed 37467208
Publications 2022
MoDLE: high-performance stochastic modeling of DNA loop extrusion interactions
Genome Biol, 23 (1), 247
DOI 10.1186/s13059-022-02815-7, PubMed 36451166
CompAIRR: ultra-fast comparison of adaptive immune receptor repertoires by exact and approximate sequence matching
Bioinformatics, 38 (17), 4230-4232
DOI 10.1093/bioinformatics/btac505, PubMed 35852318
Publications 2021
Swarm v3: towards tera-scale amplicon clustering
Bioinformatics, 38 (1), 267-269
DOI 10.1093/bioinformatics/btab493, PubMed 34244702
The CRCbiome study: a large prospective cohort study examining the role of lifestyle and the gut microbiome in colorectal cancer screening participants
BMC Cancer, 21 (1), 930
DOI 10.1186/s12885-021-08640-8, PubMed 34407780
Exploring the role of the multiple sclerosis susceptibility gene CLEC16A in T cells
Scand J Immunol, 94 (1), e13050
DOI 10.1111/sji.13050, PubMed 34643957
Reduced metagenome sequencing for strain-resolution taxonomic profiles
Microbiome, 9 (1), 79
DOI 10.1186/s40168-021-01019-8, PubMed 33781324
Erratum: The uracil-DNA glycosylase UNG protects the fitness of normal and cancer B cells expressing AID
NAR Cancer, 3 (1), zcaa045
DOI 10.1093/narcan/zcaa045, PubMed 34316697
Publications 2020
HMST-Seq-Analyzer: A new python tool for differential methylation and hydroxymethylation analysis in various DNA methylation sequencing data
Comput Struct Biotechnol J, 18, 2877-2889
DOI 10.1016/j.csbj.2020.09.038, PubMed 33163148
The uracil-DNA glycosylase UNG protects the fitness of normal and cancer B cells expressing AID
NAR Cancer, 2 (3), zcaa019
DOI 10.1093/narcan/zcaa019, PubMed 33554121
NucBreak: location of structural errors in a genome assembly by using paired-end Illumina reads
BMC Bioinformatics, 21 (1), 66
DOI 10.1186/s12859-020-3414-0, PubMed 32085722
Publications 2017
Uracil Accumulation and Mutagenesis Dominated by Cytosine Deamination in CpG Dinucleotides in Mice Lacking UNG and SMUG1
Sci Rep, 7 (1), 7199
DOI 10.1038/s41598-017-07314-5, PubMed 28775312
NucDiff: in-depth characterization and annotation of differences between two sets of DNA sequences
BMC Bioinformatics, 18 (1), 338
DOI 10.1186/s12859-017-1748-z, PubMed 28701187
Publications 2016
VSEARCH: a versatile open source tool for metagenomics
PeerJ, 4, e2584
DOI 10.7717/peerj.2584, PubMed 27781170
The Mycobacterium tuberculosis transcriptional landscape under genotoxic stress
BMC Genomics, 17 (1), 791
DOI 10.1186/s12864-016-3132-1, PubMed 27724857
Open-Source Sequence Clustering Methods Improve the State Of the Art
mSystems, 1 (1)
DOI 10.1128/mSystems.00003-15, PubMed 27822515
cnvScan: a CNV screening and annotation tool to improve the clinical utility of computational CNV prediction from exome sequencing data
BMC Genomics, 17, 51
DOI 10.1186/s12864-016-2374-2, PubMed 26764020
Publications 2015
Swarm v2: highly-scalable and high-resolution amplicon clustering
PeerJ, 3, e1420
DOI 10.7717/peerj.1420, PubMed 26713226
Transcriptome analysis of human OXR1 depleted cells reveals its role in regulating the p53 signaling pathway
Sci Rep, 5, 17409
DOI 10.1038/srep17409, PubMed 26616534
Non-homologous functions of the AlkB homologs
J Mol Cell Biol, 7 (6), 494-504
DOI 10.1093/jmcb/mjv029, PubMed 26003568
Publications 2014
Swarm: robust and fast clustering method for amplicon-based studies
PeerJ, 2, e593
DOI 10.7717/peerj.593, PubMed 25276506
Normalization of RNA-sequencing data from samples with varying mRNA levels
PLoS One, 9 (2), e89158
DOI 10.1371/journal.pone.0089158, PubMed 24586560
Publications 2013
Tiling array study of MNNG treated Escherichia coli reveals a widespread transcriptional response
Sci Rep, 3, 3053
DOI 10.1038/srep03053, PubMed 24157950
A new family of proteins related to the HEAT-like repeat DNA glycosylases with affinity for branched DNA structures
J Struct Biol, 183 (1), 66-75
DOI 10.1016/j.jsb.2013.04.007, PubMed 23623903
Single transmembrane peptide DinQ modulates membrane-dependent activities
PLoS Genet, 9 (2), e1003260
DOI 10.1371/journal.pgen.1003260, PubMed 23408903
Evolutionary paths of the cAMP-dependent protein kinase (PKA) catalytic subunits
PLoS One, 8 (4), e60935
DOI 10.1371/journal.pone.0060935, PubMed 23593352
Publications 2012
ALKBH1 is a histone H2A dioxygenase involved in neural differentiation
Stem Cells, 30 (12), 2672-82
DOI 10.1002/stem.1228, PubMed 22961808
Alkbh1 and Tzfp repress a non-repeat piRNA cluster in pachytene spermatocytes
Nucleic Acids Res, 40 (21), 10950-63
DOI 10.1093/nar/gks839, PubMed 22965116
Publications 2011
Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation
BMC Bioinformatics, 12, 221
DOI 10.1186/1471-2105-12-221, PubMed 21631914
The ada operon of Mycobacterium tuberculosis encodes two DNA methyltransferases for inducible repair of DNA alkylation damage
DNA Repair (Amst), 10 (6), 595-602
DOI 10.1016/j.dnarep.2011.03.007, PubMed 21570366
Publications 2010
Schizosaccharomyces pombe encodes a mutated AP endonuclease 1
DNA Repair (Amst), 10 (3), 296-305
DOI 10.1016/j.dnarep.2010.11.014, PubMed 21193357
Tiling array analysis of UV treated Escherichia coli predicts novel differentially expressed small peptides
PLoS One, 5 (12), e15356
DOI 10.1371/journal.pone.0015356, PubMed 21203457
Continuous and periodic expansion of CAG repeats in Huntington's disease R6/1 mice
PLoS Genet, 6 (12), e1001242
DOI 10.1371/journal.pgen.1001242, PubMed 21170307
Mice lacking Alkbh1 display sex-ratio distortion and unilateral eye defects
PLoS One, 5 (11), e13827
DOI 10.1371/journal.pone.0013827, PubMed 21072209
A two-tiered compensatory response to loss of DNA repair modulates aging and stress response pathways
Aging (Albany NY), 2 (3), 133-59
DOI 10.18632/aging.100127, PubMed 20382984
Publications 2009
The disruptive positions in human G-quadruplex motifs are less polymorphic and more conserved than their neutral counterparts
Nucleic Acids Res, 37 (17), 5749-56
DOI 10.1093/nar/gkp590, PubMed 19617376
Custom design and analysis of high-density oligonucleotide bacterial tiling microarrays
PLoS One, 4 (6), e5943
DOI 10.1371/journal.pone.0005943, PubMed 19536279
Genome dynamics in major bacterial pathogens
FEMS Microbiol Rev, 33 (3), 453-70
DOI 10.1111/j.1574-6976.2009.00173.x, PubMed 19396949
DNA repair in mammalian cells: Base excision repair: the long and short of it
Cell Mol Life Sci, 66 (6), 981-93
DOI 10.1007/s00018-009-8736-z, PubMed 19153658
Large-scale inference of the point mutational spectrum in human segmental duplications
BMC Genomics, 10, 43
DOI 10.1186/1471-2164-10-43, PubMed 19161616
A universal assay for detection of oncogenic fusion transcripts by oligo microarray analysis
Mol Cancer, 8, 5
DOI 10.1186/1476-4598-8-5, PubMed 19152679
Publications 2008
Characterization of novel mutations in the catalytic domain of the PCSK9 gene
J Intern Med, 263 (4), 420-31
DOI 10.1111/j.1365-2796.2007.01915.x, PubMed 18266662
Publications 2007
Slip slidin' away: a duodecennial review of targeted genes in mismatch repair deficient colorectal cancer
Crit Rev Oncog, 13 (3), 229-57
DOI 10.1615/critrevoncog.v13.i3.20, PubMed 18298386
RNAmmer: consistent and rapid annotation of ribosomal RNA genes
Nucleic Acids Res, 35 (9), 3100-8
DOI 10.1093/nar/gkm160, PubMed 17452365
Structural insight into repair of alkylated DNA by a new superfamily of DNA glycosylases comprising HEAT-like repeats
Nucleic Acids Res, 35 (7), 2451-9
DOI 10.1093/nar/gkm039, PubMed 17395642
Publications 2006
Computational prediction of the effects of non-synonymous single nucleotide polymorphisms in human DNA repair genes
Neuroscience, 145 (4), 1273-9
DOI 10.1016/j.neuroscience.2006.09.004, PubMed 17055652
A new protein superfamily includes two novel 3-methyladenine DNA glycosylases from Bacillus cereus, AlkC and AlkD
Mol Microbiol, 59 (5), 1602-9
DOI 10.1111/j.1365-2958.2006.05044.x, PubMed 16468998
Computational prediction of microRNAs encoded in viral and other genomes
J Biomed Biotechnol, 2006 (4), 95270
DOI 10.1155/JBB/2006/95270, PubMed 17057374
Publications 2005
PARALIGN: rapid and sensitive sequence similarity searches powered by parallel computing technology
Nucleic Acids Res, 33 (Web Server issue), W535-9
DOI 10.1093/nar/gki423, PubMed 15980529
Publications 2001
ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches
Nucleic Acids Res, 29 (7), 1647-52
DOI 10.1093/nar/29.7.1647, PubMed 11266569
Publications 2000
Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors
Bioinformatics, 16 (8), 699-706
DOI 10.1093/bioinformatics/16.8.699, PubMed 11099256
Publications 1998
SALSA: improved protein database searching by a new algorithm for assembly of sequence fragments into gapped alignments
Bioinformatics, 14 (10), 839-45
DOI 10.1093/bioinformatics/14.10.839, PubMed 9927712