Dataset for sentence-level polarity
We have just released a new dataset for modeling sentence-level polarity for Norwegian: NoReCsentence.
The previously released NoReCfine dataset annotates fine-grained sentiment information for Norwegian, including target expressions, polar expressions, intensity and holders. For some applications, however, it may be more convenient to predict sentence-level polarity instead. Although it is a greatly simplified task, sentence-level polarity prediction is widely used within NLP, in particular for quick model benchmarking, for example on well-known English datasets like SST.
In the newly released dataset NoReCsentence, we have aggregated the fine-grained annotations to the sentence level in two ways:
- Binary: includes the subset of sentences that contained sentiment annotations of either positive or negative polarity (but not both).
- Three-way: additionally includes sentences annotated as having no sentiment at all (neutral).
Note that in both the binary and the three-way version, sentences that contained mixed polarity are excluded.
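The aggregation rule described above can be illustrated with a minimal sketch. This is not the script used to build the dataset. The function name `sentence_label` and the label strings `"positive"`, `"negative"` and `"neutral"` are assumptions made for illustration, and the input is assumed to be the list of polarities of the fine-grained sentiment annotations found in one sentence.

```python
from typing import Iterable, Optional


def sentence_label(polarities: Iterable[str], three_way: bool = False) -> Optional[str]:
    """Aggregate fine-grained polarities into one sentence-level label.

    Hypothetical helper: label names are illustrative assumptions.
    Returns None when the sentence is excluded from the chosen subset.
    """
    found = set(polarities)

    # Only positive annotations: the sentence is positive.
    if found == {"positive"}:
        return "positive"

    # Only negative annotations: the sentence is negative.
    if found == {"negative"}:
        return "negative"

    # No sentiment annotations at all: kept only in the three-way version.
    if not found and three_way:
        return "neutral"

    # Mixed polarity (or a neutral sentence in the binary version): excluded.
    return None
```

For example, a sentence with two positive annotations is labeled positive in both versions, a sentence with one positive and one negative annotation is dropped from both, and a sentence without any annotations is kept only when `three_way=True`.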