This project is no longer available

Data-driven dependency parsing of Norwegian

Recently there has been a surge of interest in dependency-based
approaches to syntactic and semantic parsing of natural
language. Data-driven approaches to dependency parsing have been
employed in a number of important NLP tasks. One argument in favor of
parsing with dependency representations is that dependency relations
are much closer to the semantic relations that hold between the words
of a sentence. Gold-standard data sets (so-called "treebanks") exist
for a range of languages, such as German, Spanish, Czech, and
Chinese. Norwegian has in this respect been an understudied language,
as no syntactic treebank has been available for it. This is, however,
about to change: a dependency treebank for Norwegian is currently
under construction at the National Library, and a beta version has
already been released.

The aim of this project is to develop the best possible data-driven
dependency parser for Norwegian. To this end, the performance of
existing parser toolkits that can be trained on the treebank (such as
MaltParser and MSTParser) will be compared. In particular, the project
will involve an in-depth study of how different linguistic features,
such as person, gender and tense, influence parsing performance for
Norwegian, and how these features can best be exploited during parsing.
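Comparisons between parser toolkits are commonly reported as attachment scores: the unlabeled attachment score (UAS) is the share of tokens assigned the correct head, and the labeled attachment score (LAS) the share assigned both the correct head and the correct dependency label. The minimal sketch below computes both from a gold file and a parser's output, and shows one simple way to probe the contribution of individual linguistic features: rewriting the FEATS column to keep only a chosen subset before retraining. It assumes both files use the tab-separated CoNLL-X format; the helper names (`read_conllx`, `attachment_scores`, `keep_features`, `ablate_line`) and the example sentence with its tags, features and labels are hypothetical illustrations, not taken from the Norwegian treebank.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass
class Token:
    form: str
    feats: str   # FEATS column: "|"-separated features, or "_" if none
    head: int    # index of the head token (0 = artificial root)
    deprel: str  # dependency label


def read_conllx(lines: Iterable[str]) -> List[List[Token]]:
    """Group CoNLL-X token lines into sentences; blank lines end a sentence."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        # Columns: ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL
        cols = line.split("\t")
        current.append(Token(cols[1], cols[5], int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


def attachment_scores(gold: List[List[Token]],
                      pred: List[List[Token]]) -> Tuple[float, float]:
    """Return (UAS, LAS) over all tokens, punctuation included."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted files differ in sentence count")
    total = head_hits = labeled_hits = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("sentence lengths differ")
        for g, p in zip(g_sent, p_sent):
            total += 1
            if g.head == p.head:
                head_hits += 1
                if g.deprel == p.deprel:
                    labeled_hits += 1
    if total == 0:
        return 0.0, 0.0
    return head_hits / total, labeled_hits / total


def keep_features(feats: str, allowed: Set[str]) -> str:
    """Drop every feature not in `allowed`; "_" marks an empty FEATS value."""
    kept = [f for f in feats.split("|") if f != "_" and f in allowed]
    return "|".join(kept) if kept else "_"


def ablate_line(line: str, allowed: Set[str]) -> str:
    """Rewrite the FEATS column of one token line; blank lines pass through."""
    if not line.strip():
        return line
    cols = line.rstrip("\n").split("\t")
    cols[5] = keep_features(cols[5], allowed)
    return "\t".join(cols)


# Hypothetical example: tags, features and labels are illustrative only.
gold_lines = [
    "1\tPer\tPer\tsubst\tsubst\tprop\t2\tSUBJ\t_\t_",
    "2\tsover\tsove\tverb\tverb\tpres\t0\tFINV\t_\t_",
    "3\t.\t$.\tclb\tclb\t_\t2\tIP\t_\t_",
]
# Same heads as gold, but the parser mislabels the final token.
pred_lines = gold_lines[:2] + ["3\t.\t$.\tclb\tclb\t_\t2\tADV\t_\t_"]

uas, las = attachment_scores(read_conllx(gold_lines), read_conllx(pred_lines))
print(f"UAS={uas:.3f} LAS={las:.3f}")  # UAS=1.000 LAS=0.667
print(ablate_line(gold_lines[1], allowed=set()))  # FEATS column becomes "_"
```

Retraining a toolkit on files rewritten with `ablate_line` under different `allowed` sets, and comparing the scores against a run that keeps all features, is one straightforward way to estimate what a given feature contributes. Published evaluations often exclude punctuation tokens from the scores; this sketch counts every token.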

Prerequisites for this project are some programming skills and an
interest in linguistic (syntactic) modeling and in the use of machine
learning software. Details and further specification of the project
can be discussed with Lilja Øvrelid or Arne Skjærholt.

Published 12 Sep. 2013 15:42 - Last modified 25 Jan. 2016 12:50

Supervisor(s)

Lilja Øvrelid Universitetet i Oslo
