Develop machine learning methodology/software to tackle distributional shifts between training and deployment domains
Background and project description
Machine learning (ML) methods hold promise for learning, in a supervised fashion from large labeled datasets, the signal patterns that explain diseases, and could become a useful tool in clinical diagnostics. However, the datasets on which ML models are trained can differ, to varying degrees, in their feature distributions from the data the models encounter after deployment (e.g. in clinical diagnostics). In the ML literature, this phenomenon is often referred to as domain shift or distributional shift. Distributional shift can degrade the performance of ML models on unseen data. Using a particular biological domain and its datasets as a use case, the student will explore, understand, and develop approaches to tackle distributional shift in biological data.
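As a rough illustration of what a difference in feature distributions between training and deployment data can look like, the sketch below compares one simulated feature across cohorts using the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical cumulative distribution functions). The data, the cohort names, and the helper `ks_statistic` are hypothetical and chosen only for illustration. This is one simple diagnostic under those assumptions, not the method the project prescribes.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The largest gap between two step functions occurs at one of the sample points.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


rng = random.Random(0)

# Hypothetical data: one feature measured in a training cohort and in two
# deployment cohorts, one drawn from the same distribution and one shifted.
train = [rng.gauss(0.0, 1.0) for _ in range(2000)]
deploy_same = [rng.gauss(0.0, 1.0) for _ in range(2000)]
deploy_shifted = [rng.gauss(1.0, 1.0) for _ in range(2000)]

ks_same = ks_statistic(train, deploy_same)
ks_shifted = ks_statistic(train, deploy_shifted)

print(f"KS(train, same distribution):    {ks_same:.3f}")
print(f"KS(train, shifted distribution): {ks_shifted:.3f}")
```

A statistic close to 0 suggests the feature is distributed similarly in both cohorts, while a clearly larger value flags a feature whose distribution has moved. In practice such a check would be run per feature on real data, and a per-feature test of this kind can miss shifts that only appear in the joint distribution of several features.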
How will this task be useful in future jobs
In real-world applied ML jobs, keeping an eye out for distributional shift and model decay is an essential and common practice. This task hones transferable skills, in terms of both thought process and knowledge, that are useful when applying ML methods to real-world data.
- Study programs: Data Science/Computational Science/Statistics/Informatics/Bioinformatics
- Skills: A good grasp of statistics and machine learning is assumed, as are decent programming skills in Python or R. No biology knowledge is required.