Improving a rule-based tagger using neural methods

In this project you will compare a hybrid and a fully neural approach to morphosyntactic analysis of Norwegian. The project will be jointly supervised by the Language Technology Group (LTG) and the Text Laboratory at the Department of Linguistics and Scandinavian Studies.

The starting point of this project is the existing hybrid tagger called the Oslo-Bergen tagger (OBT), a tool for morphosyntactic analysis and lemmatization of Norwegian (bokmål and nynorsk). It consists of three components:

  1. a multitagger that assigns every possible analysis to each input word form, based on the full-form lexicon "Norsk ordbank" and a module for compound analysis,
  2. constraint grammar rules that use the context to rule out impossible analyses, disambiguating each word fully or partially, and
  3. a statistical component that applies where the constraint grammar rules have not fully disambiguated a word.
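A minimal Python sketch can show how the three components divide the work. Everything in it is hypothetical and does not come from OBT. The toy `LEXICON` stands in for Norsk ordbank. The single rule in `apply_rules` mimics a constraint-grammar REMOVE rule. The invented counts in `TAG_COUNTS` stand in for the statistical component. The tags are simplified placeholders, not OBT's actual tagset.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    lemma: str
    tag: str


# Hypothetical toy full-form lexicon standing in for "Norsk ordbank":
# each word form maps to every analysis it can have.
LEXICON = {
    "han": [Reading("han", "pron")],
    "kaster": [Reading("kaste", "verb"), Reading("kaste", "noun")],
    "ballen": [Reading("ball", "noun")],
}

# Hypothetical tag frequencies standing in for a trained statistical model.
TAG_COUNTS = Counter({"verb": 120, "noun": 300, "pron": 50})


def multitag(tokens):
    """Stage 1: list every possible reading of each token.

    Words missing from the toy lexicon get a single 'unknown' reading.
    """
    return [list(LEXICON.get(tok.lower(), [Reading(tok, "unknown")])) for tok in tokens]


def apply_rules(cohorts):
    """Stage 2: one constraint-grammar-style rule, REMOVE noun IF (-1 pron).

    As in constraint grammar, a word's last remaining reading is never removed.
    """
    for i in range(1, len(cohorts)):
        prev_is_pron = all(r.tag == "pron" for r in cohorts[i - 1])
        kept = [r for r in cohorts[i] if r.tag != "noun"]
        if prev_is_pron and kept:
            cohorts[i] = kept
    return cohorts


def statistical_fallback(cohorts):
    """Stage 3: wherever ambiguity remains, pick the most frequent tag."""
    return [max(cohort, key=lambda r: TAG_COUNTS[r.tag]) for cohort in cohorts]


def tag(tokens):
    return statistical_fallback(apply_rules(multitag(tokens)))


if __name__ == "__main__":
    for reading in tag("Han kaster ballen".split()):
        print(reading.lemma, reading.tag)
```

In "Han kaster ballen" the rule removes the noun reading of "kaster", so the verb reading survives even though the toy counts favour nouns. For "kaster" on its own the rule cannot fire, so the frequency fallback decides and picks the noun reading. In this sketch, the project's first subgoal corresponds to replacing `statistical_fallback` with a neural model that scores the remaining readings in context.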

The statistical component is dated and can likely be improved with neural techniques; doing so is the first subgoal of the project. The second subgoal is to systematically compare the performance of OBT with fully neural taggers, such as Stanza trained on the Norwegian Universal Dependencies treebank.

 

Keywords: Natural Language Processing, NLP, tagging, neural methods, rule-based methods, hybrid methods, language technology
Published 12 Oct. 2021 11:42 - Last modified 12 Oct. 2021 11:49

Scope (ECTS credits)

60