Machine learning to decipher adaptive immunity
Below I provide some further details on the use of machine learning to decipher adaptive immunity. Please contact me if you would like to discuss this in more detail.
How does our immune system recognize foreign threats?
The adaptive immune system is nature’s most finely-tuned defence tool in that it recognizes and neutralises, with exquisite specificity, any harmful particle (antigen), such as cancer cells, viruses and bacteria. Information on past and ongoing immune responses is recorded in the genetic sequences of a myriad of immune cells, known as immunogenomic memory. This information may serve as a biomarker across a broad spectrum of immune states (e.g., health, disease, infection, vaccination) and is thus key to the development of next-generation diagnostics and therapeutics.
We have billions of different immune cells in our body, each responsible for detecting a particular foreign threat. What an adaptive immune cell will recognize is determined by a very short protein sequence - essentially a text of roughly 15 characters, referred to as the "immune receptor". The fact that billions of different viruses and bacteria can be uniquely recognized means that there must be strong non-linearities and high-order interactions in the function that maps a sequence of roughly 15 characters to a unique recognition characteristic. Furthermore, the immune response of a given person is an emergent property resulting from the billions of immune cells in the body. Each person has a mostly unique set of immune receptors that we refer to as the "immune repertoire".
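To make the "text of roughly 15 characters" concrete, here is a minimal sketch of how such a receptor sequence could be turned into numbers that a machine learning model can consume. It assumes the sequence is written in the 20-letter alphabet of standard amino acids and uses plain one-hot encoding; the example sequence and all names are made up for illustration, and this is not necessarily the encoding used in our own work.

```python
# The 20 standard amino acids, one letter each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return one 20-long 0/1 row per residue in the sequence."""
    rows = []
    for residue in sequence:
        row = [0] * len(AMINO_ACIDS)
        row[INDEX[residue]] = 1
        rows.append(row)
    return rows


# Hypothetical receptor sequence, invented for illustration only.
example_receptor = "CASSLGQGAYEQYF"
encoded = one_hot_encode(example_receptor)

# Number of distinct sequences of length 15 over 20 letters.
n_possible = len(AMINO_ACIDS) ** 15
```

Even at a length of 15, the number of possible sequences (20 to the power of 15) is far larger than the billions of immune cells in one body, which gives a feeling for why each person's repertoire is mostly unique.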
How are we trying to decipher this immune recognition through machine learning?
The detection of disease signals in immune repertoires (to diagnose disease based on a blood sample) belongs to a particularly challenging class of machine learning problems called multiple instance learning. This is a form of weakly supervised learning where labels are provided only at the level of bags of assorted training instances. Immune repertoire classification is an ideal example of multiple instance learning: each patient's repertoire is a bag, each immune receptor is an instance, and a given disease state is driven by a small, unknown subset of the patient's immune cells. The problem is furthermore a multi-label multiple instance problem, as the immune repertoire of a person will contain a myriad of immune cell subsets corresponding to a lifetime of vaccines, pathogen encounters and more.
We are a large research team working on the development of machine learning methodology for this problem. We offer several specific tasks, including: how prior information from the domain can be used to guide deep learning models; problem characteristics and interpretability options for multiple instance learning; how to improve the generalization of machine learning models under domain shifts and confounding factors; and how to combine low-dimensional and high-dimensional data in machine learning. We are interested in a variety of machine learning approaches, with a particular interest in probabilistic graphical models due to their ability to integrate information, and in deep neural networks due to their flexibility in combining various mechanisms and types of layers in a modular way. If you are interested in a task connected to our research environment, we will go into the details of specific tasks according to your background and interests.
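The bag-and-instance structure can be sketched in a few lines of code. The toy example below is only an illustration under simplifying assumptions, not our actual methodology: receptors are made-up strings, each short subsequence (k-mer) is weighted by how much more often it appears in diseased than in healthy repertoires, and a repertoire is scored by its single highest-scoring receptor (max pooling, reflecting the idea that a few receptors drive the label). All function names and data are hypothetical.

```python
def kmers(sequence, k=3):
    """All overlapping substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def fit_kmer_weights(bags, labels, k=3):
    """Weight = fraction of positive bags containing the k-mer
    minus fraction of negative bags containing it."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    weights = {}
    for bag, label in zip(bags, labels):
        # Only bag-level labels are used; instances are never labelled.
        bag_kmers = set().union(*(kmers(seq, k) for seq in bag))
        for kmer in bag_kmers:
            delta = 1 / n_pos if label else -1 / n_neg
            weights[kmer] = weights.get(kmer, 0.0) + delta
    return weights


def instance_score(sequence, weights, k=3):
    """Score one receptor by its most disease-associated k-mer."""
    return max((weights.get(km, 0.0) for km in kmers(sequence, k)), default=0.0)


def bag_score(bag, weights, k=3):
    """Max pooling: a repertoire is as suspicious as its top receptor."""
    return max(instance_score(seq, weights, k) for seq in bag)


# Toy training repertoires (invented); 1 = diseased, 0 = healthy.
train_bags = [
    ["CASSLGF", "CASRWDKF"],
    ["CAQQYEF", "CSVWDKTF"],
    ["CASSLGF", "CAQQYEF"],
    ["CASTTEF", "CASSLGF"],
]
train_labels = [1, 1, 0, 0]
weights = fit_kmer_weights(train_bags, train_labels)

# Two unseen toy repertoires.
sick_bag = ["CASSLGF", "CATWDKEF"]
healthy_bag = ["CASSLGF", "CAQQYEF"]
driver = max(sick_bag, key=lambda seq: instance_score(seq, weights))
```

Here `bag_score(sick_bag, weights)` comes out higher than `bag_score(healthy_bag, weights)`, and `driver` points at the receptor responsible, which hints at how such models can also be interpreted. Real repertoires contain far more receptors per bag and much weaker signals, which is what makes the problem hard.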
Why do we consider domain-tailored software platforms for machine learning to be so important?
Machine learning has lately received high interest in both research and industry. Modern machine learning libraries like TensorFlow and PyTorch have substantially lowered the barrier to applying advanced machine learning methods to large datasets. Nonetheless, the successful application of such general methods in almost any given domain still relies on extensive tailoring to the particularities of the problem of interest and the data that are available. Thus, a successful machine learning analysis typically requires considerable implementation effort. Furthermore, successful machine learning is seldom a one-shot effort - typically the result of one analysis leads to increased understanding, which in turn inspires new approaches, new methods and the application to new data. There is a large potential to accelerate work in both research and industry by reusing code along such a development trajectory - by continually reusing, modifying, extending and accumulating functionality for data processing, data exploration and domain-tailored machine learning analyses.
At the University of Oslo, we have for more than three years been developing a platform that allows this form of reuse and accumulation of functionality for machine learning analyses within a particular domain. Our interest is in the development of machine learning methodologies to decipher the workings of the adaptive immune system. The immune system stores past and ongoing immune responses as genetic sequence information in a myriad of immune cells, known as immunogenomic memory. This information may serve as a foundation for next-generation diagnostics and therapeutics for a variety of diseases (infections, autoimmune diseases and cancer) and is thus an area of active research. The codebase we have developed streamlines machine learning analyses in this domain by collecting a broad range of functionality for processing immune cell data, offering tailored machine learning models for immune-based diagnostics and therapeutics, and providing a custom specification language for such analyses on immune data.
Our platform is still under active development, and we are facing a range of software challenges related to its further improvement. We thus have a range of more specific programming and software engineering tasks connected to the platform. These include, among others: how we can streamline and improve the efficiency of parallelization; how analyses can be automatically distributed to available GPU resources; how the language for expressing machine learning analyses can be made more expressive and easier to use; and how the platform can be extended with new options for data inspection, visualization or machine learning model interpretability. If you are interested in a task connected to our research environment, we will go into the details of specific tasks according to your background and interests.
What do we expect from you as a master student?
Master students working on our tasks are all expected to be part of our research team - working on problems at the forefront of the research field, and collaborating with researchers and other master students in our team. The tasks are thus challenging, but with hard work and a structured approach we believe they represent a good learning opportunity and can be very rewarding.