Extraction of biomedical knowledge from the ClinicalTrials database
A major challenge for biomedical researchers is to stay updated with the
latest findings and discoveries within their field of interest.
Typically, one is interested in the biological functions of a specific
gene, the drugs used to treat a given disease, or the genetic
variation associated with a given disease. Much of this information
exists as unstructured or raw text in the scientific literature. There
is thus a need to develop technologies and algorithms that perform
automatic extraction of meaningful biomedical associations (such as
those exemplified above) from text.
This project will explore the content of the ClinicalTrials database
(clinicaltrials.gov), a database describing more than 100,000 trials,
each studying the effect of treatments or drugs on a group of people
with a particular disease. Specifically,
the project will use algorithms that find known biomedical entities
(e.g., drugs, diseases, and genes) in the ClinicalTrials text and
measure the strength of the relationships between them. The focus
will be on drug-disease relationships, and the results could be compared
with those obtained by analyzing MEDLINE, a global database of the
biomedical literature.
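As a rough illustration of this kind of analysis, the Python sketch below finds drug and disease names in trial descriptions by simple dictionary matching, then scores each drug-disease pair by pointwise mutual information (PMI) over document-level co-occurrence. This is an assumed, generic approach, not PubGene's actual method: the vocabularies DRUGS and DISEASES, the functions find_entities and cooccurrence_pmi, and the example trial texts are all hypothetical, and PMI is only one of several possible association measures.

```python
import math
import re
from collections import Counter
from itertools import product

# Hypothetical toy vocabularies; a real system would draw on curated
# drug and disease dictionaries that include synonyms.
DRUGS = {"metformin", "insulin", "imatinib"}
DISEASES = {"type 2 diabetes", "chronic myeloid leukemia"}


def find_entities(text, vocabulary):
    """Return the vocabulary terms found in text as whole words (case-insensitive)."""
    lowered = text.lower()
    return {
        term
        for term in vocabulary
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }


def cooccurrence_pmi(documents):
    """Score each co-occurring drug-disease pair by PMI.

    PMI(drug, disease) = log2(P(drug, disease) / (P(drug) * P(disease))),
    where each probability is the fraction of documents mentioning the term(s).
    """
    n_docs = len(documents)
    drug_counts, disease_counts, pair_counts = Counter(), Counter(), Counter()
    for doc in documents:
        drugs = find_entities(doc, DRUGS)
        diseases = find_entities(doc, DISEASES)
        drug_counts.update(drugs)
        disease_counts.update(diseases)
        # Count every drug-disease pair mentioned together in this document.
        pair_counts.update(product(drugs, diseases))
    scores = {}
    for (drug, disease), n_pair in pair_counts.items():
        p_pair = n_pair / n_docs
        p_drug = drug_counts[drug] / n_docs
        p_disease = disease_counts[disease] / n_docs
        scores[(drug, disease)] = math.log2(p_pair / (p_drug * p_disease))
    return scores


if __name__ == "__main__":
    # Hypothetical trial descriptions, not real ClinicalTrials records.
    trials = [
        "A trial of metformin in adults with type 2 diabetes.",
        "Insulin versus metformin for type 2 diabetes.",
        "Imatinib as first-line therapy in chronic myeloid leukemia.",
        "Long-term follow-up of imatinib in chronic myeloid leukemia.",
        "Metformin for polycystic ovary syndrome.",
    ]
    for (drug, disease), score in sorted(cooccurrence_pmi(trials).items()):
        print(f"{drug} / {disease}: PMI = {score:.2f}")
```

PMI tends to favour rare pairs, so a real analysis would usually require a minimum co-occurrence count before ranking relationships, and might compare several measures against the same pairs mined from MEDLINE.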
The project requires basic programming skills and experience with
high-level scripting languages such as Python or Perl. An interest in
information retrieval, text mining and/or biomedicine would be beneficial.
The student will collaborate with developers in PubGene (pubgene.com), a
small Oslo-based company that has pioneered biomedical literature mining.
Supervisors: Sigve Nakken (OUS, PubGene), Tor-Kristian Jenssen (PubGene)
and Eivind Hovig (UiO, OUS)