Creating a highly dynamic benchmarking system
For complex problems, there are often several alternative algorithms, each trying a different approach and thus giving different results. To assess which algorithms work best for a specific purpose, their prediction performance can be benchmarked. This consists of running each algorithm on the same input data set, comparing its results against a known answer, and then rating the algorithms by a measure of similarity between their predictions and the answer. A benchmark is thus essentially defined by an input data set with a corresponding answer, and it should closely reflect a realistic scenario in order to provide a relevant evaluation of the algorithms. This also means that many alternative benchmarks are proposed, each matching a different usage scenario.
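The benchmarking procedure described above (same input for every algorithm, comparison against a known answer, ranking by a similarity measure) can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the names `Benchmark`, `run_benchmark` and `f1_score` are hypothetical, predictions and answers are assumed to be sets of genomic intervals such as protein binding sites, and an overlap-based F1 score is used as one possible similarity measure, not one prescribed by the project.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# Assumption: a prediction or answer is a set of (chromosome, start, end) intervals.
Interval = Tuple[str, int, int]


@dataclass
class Benchmark:
    """Hypothetical benchmark: an input data set paired with its known answer."""
    name: str
    input_data: str        # assumed here to be a single DNA sequence
    answer: Set[Interval]  # known binding sites


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share any position."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def f1_score(predicted: Set[Interval], answer: Set[Interval]) -> float:
    """Assumed similarity measure: overlap-based F1 between predictions and answer."""
    if not predicted or not answer:
        return 0.0
    precision = sum(any(overlaps(p, a) for a in answer) for p in predicted) / len(predicted)
    recall = sum(any(overlaps(a, p) for p in predicted) for a in answer) / len(answer)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def run_benchmark(
    benchmark: Benchmark,
    algorithms: Dict[str, Callable[[str], Set[Interval]]],
    measure: Callable[[Set[Interval], Set[Interval]], float] = f1_score,
) -> List[Tuple[str, float]]:
    """Run every algorithm on the same input and rank them by similarity to the answer."""
    scores = {
        name: measure(algorithm(benchmark.input_data), benchmark.answer)
        for name, algorithm in algorithms.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy usage: two hypothetical "algorithms" evaluated on one toy benchmark.
def find_motif(seq: str, motif: str = "ACGT") -> Set[Interval]:
    """Report every exact occurrence of a fixed motif (a naive pattern search)."""
    k = len(motif)
    return {("chr1", i, i + k) for i in range(len(seq) - k + 1) if seq[i:i + k] == motif}


def whole_prefix(seq: str) -> Set[Interval]:
    """Always predict the first five positions as a binding site."""
    return {("chr1", 0, 5)}


toy = Benchmark(name="toy", input_data="ACGTTTACGT", answer={("chr1", 0, 4)})
ranking = run_benchmark(toy, {"find_motif": find_motif, "whole_prefix": whole_prefix})
print(ranking)
```

Here `find_motif` reports two sites, only one of which overlaps the known answer, so it scores lower than `whole_prefix`. Passing the similarity measure as a parameter reflects the idea that different benchmarks, matching different usage scenarios, may call for different ways of comparing predictions with the answer.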
The task is to develop a highly dynamic benchmarking system for genomics that streamlines the process of creating concrete benchmarks. This could not only simplify the creation of public benchmarks by which new algorithms are assessed, but could also allow biologists to create their own benchmarks that closely reflect their usage needs. Such a system could be used for several different problems in genomics, such as finding locations in DNA where specific proteins bind (either through pattern discovery or peak calling). The aim should be a simple yet powerful web-based interface, which could see international use for both defining and applying benchmarks across several fields of genomics.
Students should be skilled in programming. No prior knowledge of biology is needed.