NoReC release

We're happy to announce the public release of the Norwegian Review Corpus (NoReC), a dataset created for training and evaluating models for document-level sentiment analysis.

This first release of the corpus comprises 35,194 full-text reviews extracted from eight major Norwegian news sources: Dagbladet, VG, Aftenposten, Bergens Tidende, Fædrelandsvennen, Stavanger Aftenblad, DinSide.no and P3.no. The reviews cover a range of domains, including literature, movies, video games, restaurants, music and theater, as well as product reviews across a range of categories. Each review is labeled with a manually assigned score of 1–6, taken from the rating given by the original author of the review. For more information and access to the data, please see the following git repository:

https://github.com/ltgoslo/norec
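
As a rough illustration of how the 1–6 ratings could feed a document-level sentiment classifier, the sketch below collapses ratings into three coarse polarity classes. The thresholds, the function name `rating_to_polarity`, and the sample records are assumptions made for this example only; they do not reflect the corpus's actual file format or any recommended label scheme, both of which are documented in the repository.

```python
from collections import Counter


def rating_to_polarity(rating: int) -> str:
    """Map a 1-6 review rating to a coarse polarity label.

    The cut-offs (1-2 negative, 3-4 neutral, 5-6 positive) are an
    assumption for illustration, not a scheme defined by the corpus.
    """
    if not 1 <= rating <= 6:
        raise ValueError(f"rating must be between 1 and 6, got {rating}")
    if rating <= 2:
        return "negative"
    if rating <= 4:
        return "neutral"
    return "positive"


# Hypothetical in-memory records; the real data layout may differ.
sample_reviews = [
    {"text": "placeholder review A", "rating": 1},
    {"text": "placeholder review B", "rating": 4},
    {"text": "placeholder review C", "rating": 6},
    {"text": "placeholder review D", "rating": 5},
]

# Count how many sample reviews fall into each polarity class.
label_counts = Counter(rating_to_polarity(r["rating"]) for r in sample_reviews)
```

Whether to keep all six classes, merge them, or drop the middle ratings depends on the task; the full 1–6 scale remains available for finer-grained experiments.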

The corpus is also described in more detail in the following arXiv preprint:

NoReC: The Norwegian Review Corpus
Erik Velldal, Lilja Øvrelid, Eivind Alexander Bergem, Cathrine Stadsnes, Samia Touileb, and Fredrik Jørgensen
2017
https://arxiv.org/abs/1710.05370

By erikve
Published Oct. 23, 2017 10:28 AM - Last modified Dec. 5, 2017 10:25 AM