This thesis topic is no longer available

Grammatical error correction for Norwegian learner texts


Grammatical error correction (GEC) is the task of automatically correcting grammatical errors in texts written by non-native speakers. GEC is a well-studied problem for English, where shared tasks have invited NLP systems to compete on the same training data and evaluation metrics. However, the task remains much less explored for a number of other languages, including Norwegian.

The aim of this project is to explore grammatical error detection and correction for Norwegian using traditional machine learning and/or deep learning methods. A possible source material to use is the ASK corpus, a dataset containing essays written by non-native speakers of Norwegian. The corpus contains annotated and corrected learner errors of different types, together with metadata information about learners including their native language and proficiency level. The use of additional corpora could also be investigated within the project.
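As a rough illustration of what working with pairs of original and corrected learner text might involve, the sketch below derives token-level edits from a sentence pair using Python's standard difflib module. The function name token_edits and the example sentence are invented for illustration; the actual format and annotation scheme of the ASK corpus are not shown here and may differ considerably.

```python
from difflib import SequenceMatcher


def token_edits(original, corrected):
    """Return (operation, original_tokens, corrected_tokens) for each differing span.

    Hypothetical helper: whitespace tokenization is a simplifying assumption.
    """
    src = original.split()
    tgt = corrected.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    edits = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only the spans where the two token sequences differ.
        if op != "equal":
            edits.append((op, src[i1:i2], tgt[j1:j2]))
    return edits


# Invented learner sentence with a verb inflection error, and its correction.
learner = "han gå hjem"
target = "han går hjem"
for edit in token_edits(learner, target):
    print(edit)
```

Edits extracted this way could serve either as detection labels (which tokens are wrong) or as correction targets (what to replace them with), depending on how the project frames the task.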

Automatic error correction systems help non-native speakers with text revisions and can lead to improved writing skills. Normalizing learner errors is also a useful pre-processing step to improve the performance of NLP tools trained on standard corpora when processing more error-prone texts. Moreover, since learner errors are indicative of one’s native language(s), GEC can also be seen as a means of protecting this demographic information about the writer in the context of text anonymization.

The practical thesis supervision will be provided by Pierre Lison during the first semester (spring 2022), and by Ildikó Pilán from September 2022 up to the submission deadline in 2023. 

Prerequisites

Good programming skills in Python and an interest in topics related to computer-assisted language learning. The student taking this topic must be enrolled in the M.Sc. in Informatics: Language Technology.

References

Christopher Bryant, Mariano Felice, Øistein E. Andersen and Ted Briscoe. 2019. The BEA-2019 Shared Task on Grammatical Error Correction. In Proceedings of the 14th Workshop on Innovative Use of NLP for Building Educational Applications (BEA-2019), pp. 52–75, Florence, Italy. Association for Computational Linguistics.

Kari Tenfjord, Paul Meurer and Knut Hofland. 2006. The ASK Corpus - a Language Learner Corpus of Norwegian as a Second Language. In Proceedings of LREC 2006, pp. 1821–1824.

Keywords: NLP, deep learning, learner language, error correction
Published Oct. 13, 2021 14:52 - Last modified Dec. 7, 2022 14:54

Supervisor(s)

Scope (credits)

60