Contributing to large open-source software libraries
The goal of this task is to contribute a major new section on genome analysis to BioPython, a large open-source software library of biology-related tools for the Python programming language.
The BioPython library contains a diverse collection of functionality, but its handling of genomic data is currently limited. At the same time, recent developments in DNA sequencing technology have made the generation and analysis of genome annotation tracks a major focus in both biology and medicine. Our group has strong competence in both Python and genome analysis, and we believe a master's student working in the group would be well placed to contribute a major section on genome annotation analysis to the BioPython library.
A first aspect of the task is to determine what functionality should be included and how the code should be structured. What is the trade-off between simplicity and flexibility? Should the tool be geared closely towards typical use, or should it be more open in what it supports? What is the trade-off between ease of installation and efficiency? Should one rely on specific packages, versions and file formats to ensure efficiency, or should one avoid such dependencies?
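As a concrete illustration of the last question, the sketch below computes how many bases a set of annotation intervals covers. It provides a dependency-free implementation and an optional NumPy-based one that is used only when NumPy happens to be installed. All function names here are hypothetical and chosen for illustration; none of this is existing BioPython code, and the half-open (start, end) coordinate convention is an assumption.

```python
# Hypothetical sketch: one operation, with and without an optional dependency.
# Intervals are assumed to be half-open (start, end) integer pairs.
try:
    import numpy as np  # optional: only used if installed
except ImportError:
    np = None


def covered_bases_pure(intervals):
    """Count positions covered by at least one interval (pure Python)."""
    total = 0
    reach = None  # rightmost position counted so far
    for start, end in sorted(intervals):
        if reach is None or start >= reach:
            # No overlap with what has been counted: add the whole interval.
            total += end - start
            reach = end
        elif end > reach:
            # Partial overlap: add only the part extending past `reach`.
            total += end - reach
            reach = end
    return total


def covered_bases_numpy(intervals):
    """Same computation, vectorised with NumPy (requires NumPy)."""
    intervals = sorted(intervals)
    if not intervals:
        return 0
    arr = np.asarray(intervals)
    starts, ends = arr[:, 0], arr[:, 1]
    # Running maximum of interval ends, shifted by one position, gives the
    # rightmost position already covered before each interval is considered.
    reach = np.maximum.accumulate(ends)
    prev_reach = np.concatenate(([starts[0]], reach[:-1]))
    new_bases = np.clip(ends - np.maximum(starts, prev_reach), 0, None)
    return int(new_bases.sum())


def covered_bases(intervals):
    """Use the NumPy version when available, else fall back to pure Python."""
    if np is not None:
        return covered_bases_numpy(intervals)
    return covered_bases_pure(intervals)


track = [(0, 10), (5, 15), (20, 30)]
result = covered_bases(track)
```

A design like this keeps installation simple, since nothing beyond the standard library is required, at the cost of maintaining two code paths that must be kept in agreement.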
A second aspect concerns contributing to distributed open-source projects. How does one integrate a new section with the rest of a large software library, and how does one persuade the library maintainers that the section is useful and its implementation adequate?
The third and main aspect is then to collect and develop the necessary code. This code could be partly extracted from a locally developed system called the Genomic Hyperbrowser, partly collected from existing Python packages (e.g. one called BedTools), and partly developed from scratch.
Good programming skills are necessary, and an interest in open-source development is a plus. No prior knowledge of biology is needed.