The new Norwegian Dependency Treebank (NDT) contains manually created morphosyntactic analyses of sentences in Norwegian Bokmål and Nynorsk, and it has recently been used to train syntactic parsers for Norwegian. A common problem with statistical parsers is that their performance drops when they are applied to text from a domain other than that of the training data. The problem is even worse for the noisy, fragmented text found on micro-blogging platforms like Twitter, with its heavy use of slang, emoticons, abbreviations, hashtags, etc. This thesis seeks to adapt a Norwegian dependency parser to Twitter through pre-processing and text normalization, and/or by treebanking tweets to use as training data.
Dependency parsing of Norwegian tweets
Published 14 Sep. 2014 22:23
- Last modified 12 Oct. 2016 15:21