Modern Natural Language Processing (NLP) relies on various sorts of language resources. These include large text corpora, corpora annotated for various purposes, and hand-crafted resources, e.g., lexica or word-nets. Such resources are abundant for English. Building similar resources for Norwegian is vital for bringing NLP for Norwegian to the same level as NLP for English.
Test suites, also known as challenge sets, are collections of hand-crafted examples of linguistic phenomena used for evaluating NLP systems: “can this system handle that phenomenon?” Test suites were more common some years ago, and for a period they were considered less useful than annotated corpora, which can be used for both training and testing. Lately, test suites have been reintroduced as an evaluation tool as part of GLUE, a larger benchmark containing several different ways of testing end-to-end systems for English. The GLUE Diagnostic Dataset contains 550 examples annotated with explanations.
Source: Wang et al. 2018
The first goal of the current project is to build a similar test suite for Norwegian.
The second goal is more open: to use the test suite for evaluating various systems.
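As a rough illustration of what evaluating a system against such a test suite could look like, the sketch below computes accuracy per phenomenon. Everything in it is an assumption made for illustration: the item format (premise, hypothesis, label, phenomenon) is modelled loosely on natural language inference, the names `SuiteItem`, `accuracy_by_phenomenon` and `always_entailment` are hypothetical, and the two Norwegian items are invented examples, not part of any existing dataset.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SuiteItem:
    """One hand-crafted test item (hypothetical format)."""
    premise: str
    hypothesis: str
    label: str        # gold label, e.g. "entailment" or "contradiction"
    phenomenon: str   # the phenomenon this item is meant to probe


def accuracy_by_phenomenon(
    items: List[SuiteItem],
    predict: Callable[[str, str], str],
) -> Dict[str, float]:
    """Return the fraction of correctly labelled items for each phenomenon."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for item in items:
        total[item.phenomenon] += 1
        if predict(item.premise, item.hypothesis) == item.label:
            correct[item.phenomenon] += 1
    return {name: correct[name] / total[name] for name in total}


# Invented items, for illustration only.
items = [
    SuiteItem("Katten sover.", "Katten sover ikke.", "contradiction", "negation"),
    SuiteItem("Alle hundene bjeffer.", "Noen hunder bjeffer.", "entailment", "quantifiers"),
]


def always_entailment(premise: str, hypothesis: str) -> str:
    """Trivial stand-in for a real system: predicts the same label every time."""
    return "entailment"


scores = accuracy_by_phenomenon(items, always_entailment)
for name, score in sorted(scores.items()):
    print(f"{name}: {score:.2f}")
```

Reporting one score per phenomenon, rather than a single overall number, is what lets a test suite answer the question of which phenomena a given system handles. A real system would replace the trivial baseline with its own prediction function.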
- GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP.
- GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, ICLR 2019.
- GLUE Diagnostic Dataset