This thesis topic is no longer available

Dependency parsing of Norwegian tweets

The newly created Norwegian Dependency Treebank (NDT) contains manually produced morphosyntactic analyses of Norwegian Bokmål and Nynorsk sentences, and it has recently been used to train syntactic parsers for Norwegian. A common problem with statistical parsers is that their performance drops when they are applied to text from a domain other than that of the training data. The problem is even worse for the noisy, fragmented text found on micro-blogging platforms like Twitter, with its heavy use of slang, emoticons, abbreviations, hashtags, etc. This thesis seeks to adapt a Norwegian dependency parser to Twitter through pre-processing and text normalization, and/or by treebanking tweets to serve as training data.

Published 14 Sep. 2014 22:23 - Last modified 12 Oct. 2016 15:21

Supervisor(s)

Lilja Øvrelid, Universitetet i Oslo
Erik Velldal, Universitetet i Oslo
