Universal Dependencies for a Norwegian Learner Corpus

Universal Dependencies (UD) is a recent community-driven project to create cross-linguistically consistent syntactic annotation. Work is under way to convert a number of existing dependency treebanks to this emerging standard. The latest release contains more than 60 treebanks representing a diverse range of languages, including English, German, Swedish, Spanish, Italian, Persian, and Japanese. The Norwegian Dependency Treebank (NDT) was recently converted to the UD scheme and is included in that release.

This project involves developing a Universal Dependencies treebank for learner Norwegian based on the ASK corpus. The ASK corpus (Norsk andrespråkskorpus, the Norwegian second-language corpus) consists of essays written by learners of Norwegian from ten different native-language backgrounds. It contains both original and error-corrected texts.

The project will adapt the UD development methodology used for the English Learner Corpus to the Norwegian ASK data, and will investigate how the UD labels used for standard Norwegian can be applied to ungrammatical learner constructions. Dependency parsing models will then be trained on the resulting training data and evaluated on held-out test data. The outcome of the project will be a Norwegian learner treebank together with an analysis of how grammatical errors can be handled in UD.
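The description does not say which evaluation metric will be used. Dependency parsers are commonly scored with unlabeled and labeled attachment score (UAS and LAS) over data in the CoNLL-U format in which UD treebanks are distributed. The following is a minimal sketch under stated assumptions: the helpers `read_conllu`, `attachment_scores` and `conllu_line` and the toy sentence are hypothetical, invented for illustration; the sketch compares full DEPREL labels including subtypes, and it assumes gold and predicted data share the same tokenization.

```python
from typing import List, Tuple

# A word reduced to what attachment scores need: (HEAD, DEPREL),
# i.e. columns 7 and 8 of a CoNLL-U word line.
Token = Tuple[int, str]


def read_conllu(text: str) -> List[Token]:
    """Collect (HEAD, DEPREL) for every syntactic word in a CoNLL-U string.

    Comment lines, blank sentence separators, multiword-token ranges
    (IDs such as "1-2") and empty nodes (IDs such as "3.1") are skipped.
    """
    tokens: List[Token] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        tokens.append((int(cols[6]), cols[7]))
    return tokens


def attachment_scores(gold: List[Token], pred: List[Token]) -> Tuple[float, float]:
    """Return (UAS, LAS): the share of words with the correct head, and the
    share with both the correct head and the correct full label."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted data must contain the same words")
    if not gold:
        return 0.0, 0.0
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    both_ok = sum(g == p for g, p in zip(gold, pred))
    return head_ok / len(gold), both_ok / len(gold)


def conllu_line(*cols: str) -> str:
    """Hypothetical helper: join the ten CoNLL-U columns with tabs."""
    return "\t".join(cols)


# Hypothetical toy sentence, not taken from ASK or NDT.
GOLD = "\n".join([
    "# text = Jeg liker Norge.",
    conllu_line("1", "Jeg", "jeg", "PRON", "_", "_", "2", "nsubj", "_", "_"),
    conllu_line("2", "liker", "like", "VERB", "_", "_", "0", "root", "_", "_"),
    conllu_line("3", "Norge", "Norge", "PROPN", "_", "_", "2", "obj", "_", "SpaceAfter=No"),
    conllu_line("4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"),
])

# Hypothetical parser output: word 3 gets the wrong label, word 4 the wrong head.
PRED = "\n".join([
    "# text = Jeg liker Norge.",
    conllu_line("1", "Jeg", "jeg", "PRON", "_", "_", "2", "nsubj", "_", "_"),
    conllu_line("2", "liker", "like", "VERB", "_", "_", "0", "root", "_", "_"),
    conllu_line("3", "Norge", "Norge", "PROPN", "_", "_", "2", "nsubj", "_", "SpaceAfter=No"),
    conllu_line("4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"),
])

if __name__ == "__main__":
    uas, las = attachment_scores(read_conllu(GOLD), read_conllu(PRED))
    print(f"UAS = {uas:.2f}, LAS = {las:.2f}")  # UAS = 0.75, LAS = 0.50
```

Three of the four words receive the correct head, but only two also receive the correct label, so LAS is lower than UAS. For learner data the interesting cases are those where the gold annotation itself must decide how an ungrammatical construction is labeled, which this sketch does not address.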


Some knowledge of dependency parsing is expected. Prior familiarity with the Universal Dependencies formalism is an advantage.

Keywords: universal dependencies, dependency grammar, learner language, Norwegian, dependency parsing, treebanks
Published 3 Oct. 2017 14:00 - Last modified 4 Oct. 2017 11:14

Supervisor(s)

Scope (ECTS credits)

60