The main lab web page is at: sandvelab.org
Research aims
- Through several ongoing projects and collaborations, we aim to delineate and model how the sequence of an immune receptor determines what a given adaptive immune cell recognizes. We approach this by characterizing statistical dependencies and compositional features of receptor sequences, and we use these findings to guide the development of machine learning methods that detect disease states from a patient's immune repertoire, as a basis for early diagnostics. We have developed immuneML, an extensive software platform for machine learning analysis of immune receptors.
Research topics
- Immune receptor machine learning (Doctor AI, ImmunoLingo)
- Causality, machine learning and epidemiology (RealArt, PharmaTox)
- Graph-based genome representation (CELS)
People
- Geir Kjetil Sandve (PI)
- Ivar Grytten (Postdoc)
- Enrico Riccardi (Postdoc)
- Knut Rand (Postdoc)
- Mostafa Alwash (Postdoc)
- Milena Pavlovic (PhD student)
- Lonneke Scheffer (PhD student)
- Ghadi Al Hajj (PhD student)
Software
Publications
Publications 2024
Linguistics-based formalization of the antibody language as a basis for antibody language models
Nat Comput Sci (in press)
DOI 10.1038/s43588-024-00642-3, PubMed 38877120
Identification of Transcripts with Shared Roles in the Pathogenesis of Postmenopausal Osteoporosis and Cardiovascular Disease
Int J Mol Sci, 25 (10)
DOI 10.3390/ijms25105554, PubMed 38791593
Assessing the feasibility of statistical inference using synthetic antibody-antigen datasets
Stat Appl Genet Mol Biol, 23 (1)
DOI 10.1515/sagmb-2023-0027, PubMed 38563699
Biopsy Proteome Scoring to Determine Mucosal Remodeling in Celiac Disease
Gastroenterology (in press)
DOI 10.1053/j.gastro.2024.03.006, PubMed 38467384
Publications 2023
Machine learning-driven development of a disease risk score for COVID-19 hospitalization and mortality: a Swedish and Norwegian register-based study
Front Public Health, 11, 1258840
DOI 10.3389/fpubh.2023.1258840, PubMed 38146473
ANDA: an open-source tool for automated image analysis of in vitro neuronal cells
BMC Neurosci, 24 (1), 56
DOI 10.1186/s12868-023-00826-z, PubMed 37875799
Adjustment of spurious correlations in co-expression measurements from RNA-Sequencing data
Bioinformatics, 39 (10)
DOI 10.1093/bioinformatics/btad610, PubMed 37802917
hGSuite HyperBrowser: A web-based toolkit for hierarchical metadata-informed analysis of genomic tracks
PLoS One, 18 (7), e0286330
DOI 10.1371/journal.pone.0286330, PubMed 37467208
Artificial intelligence-driven prediction of COVID-19-related hospitalization and death: a systematic review
Front Public Health, 11, 1183725
DOI 10.3389/fpubh.2023.1183725, PubMed 37408750
Effects of prenatal exposure to (es)citalopram and maternal depression during pregnancy on DNA methylation and child neurodevelopment
Transl Psychiatry, 13 (1), 149
DOI 10.1038/s41398-023-02441-2, PubMed 37147306
ANDA: An open-source tool for automated image analysis of neuronal differentiation
bioRxiv
DOI 10.1101/2023.04.27.538564, PubMed 37162841
DagSim: Combining DAG-based model structure with unconstrained data types and relations for flexible, transparent, and modularized data simulation
PLoS One, 18 (4), e0284443
DOI 10.1371/journal.pone.0284443, PubMed 37058511
Identification of gluten T cell epitopes driving celiac disease
Sci Adv, 9 (4), eade5800
DOI 10.1126/sciadv.ade5800, PubMed 36696493
Publications 2022
simAIRR: simulation of adaptive immune repertoires with realistic receptor sequence sharing for benchmarking of immune state prediction methods
Gigascience, 12
DOI 10.1093/gigascience/giad074, PubMed 37848619
Unconstrained generation of synthetic antibody-antigen structures to guide machine learning methodology for antibody specificity prediction
Nat Comput Sci, 2 (12), 845-865
DOI 10.1038/s43588-022-00372-4, PubMed 38177393
Access to ground truth at unconstrained size makes simulated data as indispensable as experimental data for bioinformatics methods development and benchmarking
Bioinformatics, 38 (21), 4994-4996
DOI 10.1093/bioinformatics/btac612, PubMed 36073940
KAGE: fast alignment-free graph-based genotyping of SNPs and short indels
Genome Biol, 23 (1), 209
DOI 10.1186/s13059-022-02771-2, PubMed 36195962
CompAIRR: ultra-fast comparison of adaptive immune receptor repertoires by exact and approximate sequence matching
Bioinformatics, 38 (17), 4230-4232
DOI 10.1093/bioinformatics/btac505, PubMed 35852318
Reference-based comparison of adaptive immune receptor repertoires
Cell Rep Methods, 2 (8), 100269
DOI 10.1016/j.crmeth.2022.100269, PubMed 36046619
Low reliability of DNA methylation across Illumina Infinium platforms in cord blood: implications for replication studies and meta-analyses of prenatal exposures
Clin Epigenetics, 14 (1), 80
DOI 10.1186/s13148-022-01299-3, PubMed 35765087
Profiling the baseline performance and limits of machine learning models for adaptive immune receptor repertoire classification
Gigascience, 11
DOI 10.1093/gigascience/giac046, PubMed 35639633
TCRpower: quantifying the detection power of T-cell receptor sequencing with a novel computational pipeline calibrated by spike-in sequences
Brief Bioinform, 23 (2)
DOI 10.1093/bib/bbab566, PubMed 35062022
In silico proof of principle of machine learning-based antibody design at unconstrained scale
MAbs, 14 (1), 2031482
DOI 10.1080/19420862.2022.2031482, PubMed 35377271
Publications 2021
Individualized VDJ recombination predisposes the available Ig sequence space
Genome Res, 31 (12), 2209-2224
DOI 10.1101/gr.275373.121, PubMed 34815307
The immuneML ecosystem for machine learning analysis of adaptive immune receptor repertoires
Nat Mach Intell, 3 (11), 936-944
DOI 10.1038/s42256-021-00413-z, PubMed 37396030
Differential expression profile of gluten-specific T cells identified by single-cell RNA-seq
PLoS One, 16 (10), e0258029
DOI 10.1371/journal.pone.0258029, PubMed 34618841
Chromatin occupancy and target genes of the haematopoietic master transcription factor MYB
Sci Rep, 11 (1), 9008
DOI 10.1038/s41598-021-88516-w, PubMed 33903675
Comprehensive Analysis of CDR3 Sequences in Gluten-Specific T-Cell Receptors Reveals a Dominant R-Motif and Several New Minor Motifs
Front Immunol, 12, 639672
DOI 10.3389/fimmu.2021.639672, PubMed 33927715
A compact vocabulary of paratope-epitope interactions enables predictability of antibody-antigen binding
Cell Rep, 34 (11), 108856
DOI 10.1016/j.celrep.2021.108856, PubMed 33730590
Ten simple rules for quick and dirty scientific programming
PLoS Comput Biol, 17 (3), e1008549
DOI 10.1371/journal.pcbi.1008549, PubMed 33705383
Editorial: Genomic Colocalization and Enrichment Analyses
Front Genet, 11, 617876
DOI 10.3389/fgene.2020.617876, PubMed 33574832
Publications 2020
T cell receptor repertoire as a potential diagnostic marker for celiac disease
Clin Immunol, 222, 108621
DOI 10.1016/j.clim.2020.108621, PubMed 33197618
Beware the Jaccard: the choice of similarity measure is important and non-trivial in genomic colocalisation analysis
Brief Bioinform, 21 (5), 1523-1530
DOI 10.1093/bib/bbz083, PubMed 31624847
immuneSIM: tunable multi-feature simulation of B- and T-cell receptor repertoires for immunoinformatics benchmarking
Bioinformatics, 36 (11), 3594-3596
DOI 10.1093/bioinformatics/btaa158, PubMed 32154832
Assessing graph-based read mappers against a baseline approach highlights strengths and weaknesses of current methods
BMC Genomics, 21 (1), 282
DOI 10.1186/s12864-020-6685-y, PubMed 32252628
Author Correction: Human somatic cell mutagenesis creates genetically tractable sarcomas
Nat Genet, 52 (4), 464
DOI 10.1038/s41588-020-0589-2, PubMed 32094913
NucBreak: location of structural errors in a genome assembly by using paired-end Illumina reads
BMC Bioinformatics, 21 (1), 66
DOI 10.1186/s12859-020-3414-0, PubMed 32085722
B cell tolerance and antibody production to the celiac disease autoantigen transglutaminase 2
J Exp Med, 217 (2)
DOI 10.1084/jem.20190860, PubMed 31727780
Publications 2019
A map of direct TF-DNA interactions in the human genome
Nucleic Acids Res, 47 (14), 7715
DOI 10.1093/nar/gkz582, PubMed 31251803
Augmenting adaptive immunity: progress and challenges in the quantitative engineering and analysis of adaptive immune receptor repertoires
Mol. Syst. Des. Eng., 4 (4), 701-736
DOI 10.1039/c9me00071b
Transcriptional profiling of human intestinal plasma cells reveals effector functions beyond antibody production
United European Gastroenterol J, 7 (10), 1399-1407
DOI 10.1177/2050640619862461, PubMed 31839965
Colocalization analyses of genomic elements: approaches, recommendations and challenges
Bioinformatics, 35 (9), 1615-1624
DOI 10.1093/bioinformatics/bty835, PubMed 30307532
A map of direct TF-DNA interactions in the human genome
Nucleic Acids Res, 47 (4), e21
DOI 10.1093/nar/gky1210, PubMed 30517703
Graph Peak Caller: Calling ChIP-seq peaks on graph-based reference genomes
PLoS Comput Biol, 15 (2), e1006731
DOI 10.1371/journal.pcbi.1006731, PubMed 30779737
Publications 2018
Mind the gaps: overlooking inaccessible regions confounds statistical testing in genome analysis
BMC Bioinformatics, 19 (1), 481
DOI 10.1186/s12859-018-2438-1, PubMed 30547739
Exploiting antigen receptor information to quantify index switching in single-cell transcriptome sequencing experiments
PLoS One, 13 (12), e0208484
DOI 10.1371/journal.pone.0208484, PubMed 30517183
Coloc-stats: a unified web interface to perform colocalization analysis of genomic features
Nucleic Acids Res, 46 (W1), W186-W193
DOI 10.1093/nar/gky474, PubMed 29873782
Disease-driving CD4+ T cell clonotypes persist for decades in celiac disease
J Clin Invest, 128 (6), 2642-2650
DOI 10.1172/JCI98819, PubMed 29757191
Publications 2017
Complex patterns of concomitant medication use: A study among Norwegian women using paracetamol during pregnancy
PLoS One, 12 (12), e0190101
DOI 10.1371/journal.pone.0190101, PubMed 29284043
Genome build information is an essential part of genomic track files
Genome Biol, 18 (1), 175
DOI 10.1186/s13059-017-1312-1, PubMed 28911336
Uracil Accumulation and Mutagenesis Dominated by Cytosine Deamination in CpG Dinucleotides in Mice Lacking UNG and SMUG1
Sci Rep, 7 (1), 7199
DOI 10.1038/s41598-017-07314-5, PubMed 28775312
NucDiff: in-depth characterization and annotation of differences between two sets of DNA sequences
BMC Bioinformatics, 18 (1), 338
DOI 10.1186/s12859-017-1748-z, PubMed 28701187
GSuite HyperBrowser: integrative analysis of dataset collections across the genome and epigenome
Gigascience, 6 (7), 1-12
DOI 10.1093/gigascience/gix032, PubMed 28459977
High-Throughput Single-Cell Analysis of B Cell Receptor Usage among Autoantigen-Specific Plasma Cells in Celiac Disease
J Immunol, 199 (2), 782-791
DOI 10.4049/jimmunol.1700169, PubMed 28600290
The rainfall plot: its motivation, characteristics and pitfalls
BMC Bioinformatics, 18 (1), 264
DOI 10.1186/s12859-017-1679-8, PubMed 28521741
Coordinates and intervals in graph-based reference genomes
BMC Bioinformatics, 18 (1), 263
DOI 10.1186/s12859-017-1678-9, PubMed 28521770
Publications 2016
Galaxy Portal: interacting with the galaxy platform through mobile devices
Bioinformatics, 32 (11), 1743-5
DOI 10.1093/bioinformatics/btw042, PubMed 26819474
Publications 2015
In the loop: promoter-enhancer interactions and bioinformatics
Brief Bioinform, 17 (6), 980-995
DOI 10.1093/bib/bbv097, PubMed 26586731
c-Myb Binding Sites in Haematopoietic Chromatin Landscapes
PLoS One, 10 (7), e0133280
DOI 10.1371/journal.pone.0133280, PubMed 26208222
ClusTrack: feature extraction and similarity measures for clustering of genome-wide data sets
PLoS One, 10 (4), e0123261
DOI 10.1371/journal.pone.0123261, PubMed 25879845
EBNA2 binds to genomic intervals associated with multiple sclerosis and overlaps with vitamin D receptor occupancy
PLoS One, 10 (4), e0119605
DOI 10.1371/journal.pone.0119605, PubMed 25853421
Transcriptionally active regions are the preferred targets for chromosomal HPV integration in cervical carcinogenesis
PLoS One, 10 (3), e0119566
DOI 10.1371/journal.pone.0119566, PubMed 25793388
Monte Carlo Null Models for Genomic Data
Stat. Sci., 30 (1), 59-71
DOI 10.1214/14-STS484
Publications 2014
Human somatic cell mutagenesis creates genetically tractable sarcomas
Nat Genet, 46 (9), 964-72
DOI 10.1038/ng.3065, PubMed 25129143
Chromatin states reveal functional associations for globally defined transcription start sites in four human cell lines
BMC Genomics, 15, 120
DOI 10.1186/1471-2164-15-120, PubMed 24669905
HiBrowse: multi-purpose statistical analysis of genome-wide chromatin 3D organization
Bioinformatics, 30 (11), 1620-2
DOI 10.1093/bioinformatics/btu082, PubMed 24511080
Publications 2013
Integrating multiple oestrogen receptor alpha ChIP studies: overlap with disease susceptibility regions, DNase I hypersensitivity peaks and gene expression
BMC Med Genomics, 6, 45
DOI 10.1186/1755-8794-6-45, PubMed 24171864
DNase hypersensitive sites and association with multiple sclerosis
Hum Mol Genet, 23 (4), 942-8
DOI 10.1093/hmg/ddt489, PubMed 24092328
Ten simple rules for reproducible computational research
PLoS Comput Biol, 9 (10), e1003285
DOI 10.1371/journal.pcbi.1003285, PubMed 24204232
Vitamin D receptor ChIP-seq in primary CD4+ cells: relationship to serum 25-hydroxyvitamin D levels and autoimmune disease
BMC Med, 11, 163
DOI 10.1186/1741-7015-11-163, PubMed 23849224
The Genomic HyperBrowser: an analysis web server for genome-scale data
Nucleic Acids Res, 41 (Web Server issue), W133-41
DOI 10.1093/nar/gkt342, PubMed 23632163
Handling realistic assumptions in hypothesis testing of 3D co-localization of genomic elements
Nucleic Acids Res, 41 (10), 5164-74
DOI 10.1093/nar/gkt227, PubMed 23571755
Publications 2012
Vitamin D receptor binding, chromatin states and association with multiple sclerosis
Hum Mol Genet, 21 (16), 3575-86
DOI 10.1093/hmg/dds189, PubMed 22595971
Age-associated hyper-methylated regions in the human brain overlap with bivalent chromatin domains
PLoS One, 7 (9), e43840
DOI 10.1371/journal.pone.0043840, PubMed 23028473
Genomic regions associated with multiple sclerosis are active in B cells
PLoS One, 7 (3), e32281
DOI 10.1371/journal.pone.0032281, PubMed 22396755
Publications 2011
Identifying elemental genomic track types and representing them uniformly
BMC Bioinformatics, 12, 494
DOI 10.1186/1471-2105-12-494, PubMed 22208806
Sequential Monte Carlo multiple testing
Bioinformatics, 27 (23), 3235-41
DOI 10.1093/bioinformatics/btr568, PubMed 21998154
Increased expression of IRF4 and ETS1 in CD4+ cells from patients with intermittent allergic rhinitis
Allergy, 67 (1), 33-40
DOI 10.1111/j.1398-9995.2011.02707.x, PubMed 21919915
The differential disease regulome
BMC Genomics, 12, 353
DOI 10.1186/1471-2164-12-353, PubMed 21736759
Publications 2010
The Genomic HyperBrowser: inferential genomics at the sequence level
Genome Biol, 11 (12), R121
DOI 10.1186/gb-2010-11-12-r121, PubMed 21182759
Publications 2008
Segmentation of DNA sequences into twostate regions and melting fork regions
J Phys Condens Matter, 21 (3), 034109
DOI 10.1088/0953-8984/21/3/034109, PubMed 21817254
Compo: composite motif discovery using discrete models
BMC Bioinformatics, 9, 527
DOI 10.1186/1471-2105-9-527, PubMed 19063744