Clustering cancer omics data with NMF


As next-generation sequencing technology has made it routine to profile multiple molecular data types for an individual, such as the whole genome, gene expression, and DNA methylation, personalised medicine is the present and future of healthcare. By analysing these multiple layers of molecular data available for the same patient, we have an unprecedented opportunity to uncover the causes of cancer, better stratify patients for cancer treatments, and reveal new biomarkers and therapeutic candidates.

Non-negative matrix factorization (NMF) is a machine learning approach for dimensionality reduction and unsupervised clustering. Specifically, the algorithm approximately factorizes a non-negative matrix M into two matrices containing only non-negative elements. Through this process, the columns of M are intrinsically clustered in an unsupervised manner. This makes the approach very appealing for better characterizing cancer subtypes from the breadth of data available publicly (from The Cancer Genome Atlas, TCGA) and in-house at the Oslo University Hospital. The purpose of this Master's thesis project is to integrate the multiple layers of molecular data available for cancer patients using the NMF approach, in order to cluster patients and identify the underlying molecular basis of the defined clusters. The selected candidate will develop a computational tool dedicated to this task and will apply it to publicly available data from TCGA as well as to in-house data. A specific focus will be given to breast cancer, for which in-house data are available.
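
To make the factorization idea concrete, here is a minimal, dependency-free sketch in Python. It is an illustration only, not the tool to be developed in the project. It assumes that rows of M are molecular features and columns are patients, and it uses multiplicative updates for the squared-error objective, a common NMF solver chosen here purely for illustration since the project description does not name one. The function names (`nmf`, `cluster_columns`) and the toy matrix are hypothetical.

```python
import random


def matmul(A, B):
    """Plain-Python product of two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(M, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate M (features x samples) by W * H with W, H non-negative.

    Assumption: multiplicative updates for the squared error ||M - WH||^2;
    the solver is an illustrative choice, not one prescribed by the project.
    """
    rng = random.Random(seed)
    n, m = len(M), len(M[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T M) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, M), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (M H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(M, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H


def cluster_columns(W, H):
    """Assign each column of M (each sample) to its dominant component.

    Each row of H is rescaled by the column sum of W so that the arbitrary
    scaling between W and H does not influence the assignment.
    """
    k, m = len(H), len(H[0])
    scale = [sum(row[c] for row in W) for c in range(k)]
    return [max(range(k), key=lambda c: H[c][j] * scale[c]) for j in range(m)]


# Toy matrix (hypothetical values): 4 features x 6 samples, two sample groups.
M = [
    [5.0, 4.0, 6.0, 0.1, 0.2, 0.1],
    [4.0, 5.0, 5.0, 0.2, 0.1, 0.3],
    [0.1, 0.2, 0.1, 6.0, 5.0, 4.0],
    [0.3, 0.1, 0.2, 5.0, 6.0, 5.0],
]
W, H = nmf(M, k=2)
labels = cluster_columns(W, H)
print(labels)
```

In this sketch, the columns of W can be read as feature signatures of each cluster and the columns of H as the weight of each signature in each sample. One simple way to combine several data layers measured on the same patients, again only as an assumption, would be to stack their feature rows into a single matrix M before factorizing.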

We have expertise in the in-depth analysis of high-throughput ‘omics data from cancer patients as well as in the use of the NMF approach, which provides an optimal learning environment for the selected student. During the course of the project, the student will be exposed to machine learning (NMF) and to computational approaches for the management, analysis, and interpretation of large-scale, next-generation sequencing data. We seek a highly motivated individual, preferably with programming skills and knowledge of computational tool development. Knowledge of statistical methods and/or a biological background is a plus.

We are looking for applicants excited about combining life sciences and computation. The candidate will be co-supervised by Dr. Ole-Christian Lingjærde and Dr. Anthony Mathelier. The supervisors have strong expertise in computer science and biology and are affiliated with the University of Oslo, the Oslo University Hospital, and the Centre for Molecular Medicine Norway. The student will collaborate with researchers at the Oslo University Hospital.


Keywords: Cancer, Clustering, Omics, Machine learning
Published 24 Sep. 2018 10:40 - Last modified 24 Sep. 2018 10:40

Scope (credits)