The previously released NoReCfine dataset annotates fine-grained sentiment information for Norwegian, including target expressions, polar expressions, intensity, and holders. For some applications, however, it may be more convenient to predict sentence-level polarity instead. Although it is a greatly simplified task, sentence-level polarity prediction is widely used in NLP, in particular for quick model benchmarking, for example on well-known English datasets like SST.
In the newly released dataset NoReCsentence, we have aggregated the fine-grained annotations to the sentence level in two ways:
- Binary: includes the subset of sentences that contained sentiment annotations of either positive or negative polarity (but not both).
- Three-way: additionally includes sentences annotated as having no sentiment at all (neutral).
Note that for both binary and three-way, sentences that contained mixed polarity are excluded.
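The aggregation rules above can be illustrated with a minimal Python sketch. It is not the script used to build the dataset: the input format (a sentence paired with the list of polarities of its annotated polar expressions), the label strings `"positive"`, `"negative"` and `"neutral"`, and the function names `aggregate_polarity` and `build_sentence_dataset` are all assumptions made for illustration. Intensity is ignored here.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical label names; the released files may encode labels differently.
POLAR_LABELS = {"positive", "negative"}


def aggregate_polarity(polarities: Iterable[str]) -> Optional[str]:
    """Collapse the polarities of one sentence into a single label.

    Returns "neutral" if the sentence has no sentiment annotations,
    "positive" or "negative" if all annotations agree,
    and None if the sentence has mixed polarity (to be excluded).
    """
    found = set(polarities)
    unknown = found - POLAR_LABELS
    if unknown:
        raise ValueError(f"unexpected polarity labels: {sorted(unknown)}")
    if not found:
        return "neutral"
    if len(found) == 1:
        return next(iter(found))
    return None  # both positive and negative present: mixed


def build_sentence_dataset(
    annotated: Iterable[Tuple[str, List[str]]],
    three_way: bool = False,
) -> List[Tuple[str, str]]:
    """Build the binary (default) or three-way sentence-level variant."""
    dataset = []
    for text, polarities in annotated:
        label = aggregate_polarity(polarities)
        if label is None:
            continue  # mixed polarity is dropped in both variants
        if label == "neutral" and not three_way:
            continue  # neutral sentences only appear in the three-way variant
        dataset.append((text, label))
    return dataset


# Toy input, invented for this example (not taken from the dataset).
examples = [
    ("Great sound.", ["positive"]),
    ("Poor battery and a weak screen.", ["negative", "negative"]),
    ("Nice design but far too expensive.", ["positive", "negative"]),
    ("The phone was released in March.", []),
]

binary = build_sentence_dataset(examples)
three_way = build_sentence_dataset(examples, three_way=True)
```

With this toy input, the binary variant keeps the first two sentences, while the three-way variant also keeps the last one as neutral; the mixed third sentence is excluded from both.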
The dataset has already been used for benchmarking large-scale contextualized language models for Norwegian like NorBERT and NoTraM, as documented in forthcoming NoDaLiDa publications.