LFG-Based Universal Dependencies for Norwegian

In recent years, several projects have developed syntactic treebanks for Norwegian. This project will seek to better understand the similarities and differences between the linguistic frameworks underlying these initiatives, viz. Lexical Functional Grammar (LFG) and Universal Dependencies (UD).

Universal Dependencies (UD) is a recent community-driven project to create cross-linguistically consistent syntactic annotation. Efforts are currently being made to adapt a number of existing dependency treebanks to this emerging standard. The latest release covers as many as 33 different languages, and the treebanks involved in this effort represent a diverse range of languages such as English, German, Swedish, Spanish, Italian, Persian, and Japanese. The Norwegian Dependency Treebank (NDT) was recently converted to the UD scheme and is included in that release.

The INESS Treebank provides manually validated LFG analyses for some 500,000 tokens of running Norwegian text. These analyses provide two interrelated layers of representation, viz. c(onstituent)-structure and f(unctional)-structure. Both layers could in principle provide a useful starting point for conversion into bi-lexical dependencies, although one might expect the f-structure layer to be conceptually closer to UD.
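To make the idea of bi-lexical dependencies concrete, the following is a minimal Python sketch of how head-dependent arcs could be read off an f-structure. It rests on strong simplifying assumptions: the f-structure is a toy nested dictionary (not the actual INESS export format), every sub-structure carries a hypothetical TOKEN entry naming the word that contributes its PRED, and phenomena such as structure sharing between f-structures are ignored. The function name and the lowercased labels are illustrative only.

```python
from typing import Any, Dict, List, Tuple

# (head token index, dependent token index, label); index 0 is an artificial root
Arc = Tuple[int, int, str]


def fstructure_to_arcs(fs: Dict[str, Any], head: int = 0, label: str = "root") -> List[Arc]:
    """Read bi-lexical arcs off a toy f-structure.

    Assumption (hypothetical encoding): each (sub-)f-structure is a dict with
    a 'TOKEN' entry holding the 1-based index of the word that contributes
    its PRED. Every dict-valued attribute is treated as a grammatical
    function whose token depends on the token of the enclosing f-structure.
    """
    token = fs["TOKEN"]
    arcs: List[Arc] = [(head, token, label)]
    for attribute, value in fs.items():
        if isinstance(value, dict):
            # Recurse into the embedded f-structure, using the attribute
            # name (e.g. SUBJ, OBJ) as the dependency label.
            arcs.extend(fstructure_to_arcs(value, token, attribute.lower()))
    # Order arcs by dependent position, i.e. by surface order of the tokens.
    return sorted(arcs, key=lambda arc: arc[1])


# Toy analysis of "Hunden spiser maten" ('The dog eats the food').
example = {
    "PRED": "spise<SUBJ,OBJ>",
    "TOKEN": 2,
    "TENSE": "pres",
    "SUBJ": {"PRED": "hund", "TOKEN": 1},
    "OBJ": {"PRED": "mat", "TOKEN": 3},
}

for arc in fstructure_to_arcs(example):
    print(arc)
```

In this sketch the labels are simply the LFG grammatical functions; mapping them onto UD relations, and deciding which word should head each f-structure, is where the conversion would call for real linguistic decisions.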

The project can be sub-divided into three sub-tasks and is thus, in principle, suitable for group work (of no more than two students working and submitting together). The first sub-task is to develop an automated conversion procedure from INESS c- and f-structures into bi-lexical dependencies; this sub-task could be evaluated in isolation through parsing experiments. The second sub-task is somewhat more linguistic in nature and would seek to adapt the conversion procedure towards the UD framework; for this work, the existing Norwegian UD Treebank would provide an important point of reference. To reflect on UD design choices for Norwegian in more depth, the third sub-task would seek to construct fresh, high-quality UD annotations of new test data; such annotation is expected to provide a methodology for discovering challenging, previously unaddressed linguistic phenomena.
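One conceivable way to quantify how close converted analyses come to a reference annotation is to compute unlabeled and labeled attachment scores (UAS and LAS), the standard metrics in dependency parsing. The Python sketch below is an assumption about how such a comparison might be set up, not a procedure prescribed by the project; the function name and the per-token (head, label) encoding are hypothetical, and the UD labels in the example are made up for illustration.

```python
from typing import List, Tuple

# One entry per token: (index of the head token, dependency label)
Analysis = List[Tuple[int, str]]


def attachment_scores(reference: Analysis, converted: Analysis) -> Tuple[float, float]:
    """Return (UAS, LAS) for two analyses of the same token sequence.

    UAS is the share of tokens whose head matches the reference;
    LAS additionally requires the dependency label to match.
    """
    if len(reference) != len(converted):
        raise ValueError("analyses must cover the same number of tokens")
    if not reference:
        raise ValueError("cannot score an empty analysis")
    total = len(reference)
    correct_heads = sum(ref[0] == conv[0] for ref, conv in zip(reference, converted))
    correct_arcs = sum(ref == conv for ref, conv in zip(reference, converted))
    return correct_heads / total, correct_arcs / total


# Toy comparison for a three-token sentence: all heads agree,
# but the label of the last token differs.
reference = [(2, "nsubj"), (0, "root"), (2, "obj")]
converted = [(2, "nsubj"), (0, "root"), (2, "obl")]

uas, las = attachment_scores(reference, converted)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")
```

Scores of this kind only measure agreement with one particular reference; systematic disagreements may equally point to defensible differences between the LFG-derived and the existing UD analyses.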

In the academic year 2017–18, the supervisors for this project will be fellows at the Oslo Center for Advanced Studies (CAS), together with, among others, the main designers of the INESS and UD analyses (e.g. Helge Dyvik and Joakim Nivre, respectively). This setting offers a rare opportunity to conduct an MSc project with direct access to a group of leading international researchers and co-developers of the underlying frameworks. At the same time, the project presupposes a good balance of technical and syntactic expertise and the ability to collaborate scientifically. As part of the associated coursework, we recommend the course SPR4106: Syntax and Semantics in Formal Terms (offered at the Faculty of Humanities).

Please contact the supervisors to discuss further details, adaptation of the project to individual background, and possibilities for group work.


Keywords: language technology
