Developing benchmarks for genome analysis
Several problems in biology depend on computational methods to discover important features. Examples include predicting where genes are located in a long stretch of DNA sequence, and predicting where molecules will attach to DNA in order to control which genes are active. As tens or even hundreds of different methods have been proposed for some of these problems, it is often difficult for biologists to know which method to use on real problems. Benchmarks that allow biologists to compare the performance of alternative methods on data sets where the true answer is known play an important role in guiding the selection of appropriate methods.
A previous master's student in the group has developed a generic system for handling data sets and user interfaces to benchmarks, and another previous master's student has looked into evaluation measures for benchmarking.
The goal of the task is to construct a suite of benchmarks that can be used to evaluate the performance of alternative methods for a selection of problems in DNA analysis. The aim is for the collection to serve as a preferred resource for biologists who want to evaluate candidate methods for their particular problem. It could also be useful for developers of new methods who want to show the strengths of their method in relation to previously available methods.
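To make the idea of a benchmark concrete, the following is a minimal sketch of how a single prediction could be scored against a data set where the true answer is known. It assumes that both the predictions and the true features (for instance genes) are given as half-open intervals of sequence positions, and it scores at the level of individual positions. The function names and the choice of sensitivity and precision as measures are illustrative assumptions, not the design of the existing benchmark system or the evaluation measures studied in the group.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumption: a feature is a half-open interval [start, end) of sequence positions.
Interval = Tuple[int, int]


def covered_positions(intervals: Iterable[Interval]) -> Set[int]:
    """Return the set of all positions covered by at least one interval."""
    positions: Set[int] = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def evaluate(predicted: Iterable[Interval], truth: Iterable[Interval]) -> Dict[str, float]:
    """Score predicted intervals against known true intervals, position by position."""
    pred = covered_positions(predicted)
    true = covered_positions(truth)
    tp = len(pred & true)   # positions correctly predicted as part of a feature
    fp = len(pred - true)   # positions predicted but not part of any true feature
    fn = len(true - pred)   # true feature positions that the method missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}


if __name__ == "__main__":
    # Hypothetical toy data: one true feature and the output of two methods.
    truth = [(10, 20)]
    method_a = [(12, 22)]
    method_b = [(10, 15)]
    for name, prediction in [("A", method_a), ("B", method_b)]:
        print(name, evaluate(prediction, truth))
```

A benchmark suite would run such a comparison for every candidate method over many data sets and report the scores side by side. Which measure is most informative depends on the biological problem, which is why the choice of evaluation measure is treated as a question in its own right.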
The student could have a background in either computer science or biology. The task requires a basic knowledge of biology (such as a single introductory course), or at least an interest in learning about specific problems in biology. Similarly, it requires a basic knowledge of programming, or at least an interest in learning to program during the task. The focus of the task can be adjusted depending on whether the student has a background in computer science or biology, so that the student's main competence is put to good use.