A sentiment lexicon for Norwegian
The task of sentiment analysis (SA) concerns identifying subjective textual content with a positive or negative orientation. Systems for performing SA, such as the Semantic Orientation CALculator (SO-CAL), often draw on large sentiment lexicons that encode the polarity (positive or negative) and strength of words. Several such resources already exist for English, including SentiWordNet, WordNet-Affect, SenticNet and the various MPQA lexicons.

The aim of this project is to semi-automatically create a broad-coverage sentiment lexicon for Norwegian, for which no such off-the-shelf resource is currently available. The approach will start from a small set of seed words with known polarity. These seed words could be drawn from the Norwegian WordNet (Norsk Ordvev), which already encodes positive or negative polarity for a few hundred word senses (so-called synsets). Once an initial seed set has been encoded, a distributional, semi-supervised approach will be applied to expand the lexicon automatically by classifying the polarity of previously unseen words.

The resulting lexicon(s) will be evaluated by using them to classify reviews in the newly released Norwegian Review Corpus (NoReC). This data set comprises more than 35,000 reviews from different domains, each with a rating on a scale of 1–6.
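The expansion step could be realised in many ways. As one illustration, the sketch below uses label propagation over a k-nearest-neighbour graph of word vectors. Seed words keep a fixed polarity (+1 or -1), and each unknown word repeatedly takes the similarity-weighted average of its neighbours' scores. This is a minimal sketch under stated assumptions, not the project's method. The two-dimensional vectors are toy stand-ins for distributional representations learned from Norwegian text. The seed words are invented for illustration rather than taken from Norsk Ordvev. The function names `propagate_polarity` and `classify_review` are hypothetical. The last function shows a simple lexicon-based baseline that classifies a tokenized review by summing word scores.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def propagate_polarity(vectors, seeds, k=2, iterations=20):
    """Hypothetical helper: expand a seed lexicon by label propagation.

    vectors: dict mapping word -> list of floats (distributional vector)
    seeds:   dict mapping word -> +1.0 (positive) or -1.0 (negative)
    Returns a dict mapping every word to a polarity score in [-1, 1].
    """
    words = list(vectors)
    # Build a k-nearest-neighbour graph, keeping only positive similarities.
    neighbours = {}
    for w in words:
        sims = sorted(
            ((cosine(vectors[w], vectors[o]), o) for o in words if o != w),
            reverse=True,
        )
        neighbours[w] = [(o, s) for s, o in sims[:k] if s > 0]

    # Seeds start at their known polarity; unknown words start neutral (0).
    scores = {w: seeds.get(w, 0.0) for w in words}
    for _ in range(iterations):
        updated = {}
        for w in words:
            if w in seeds:
                updated[w] = seeds[w]  # clamp seed words to their known polarity
                continue
            total = sum(s for _, s in neighbours[w])
            weighted = sum(s * scores[o] for o, s in neighbours[w])
            updated[w] = weighted / total if total else 0.0
        scores = updated
    return scores


def classify_review(tokens, lexicon, threshold=0.0):
    """Baseline: sum the lexicon scores of the tokens (unknown words count as 0)."""
    total = sum(lexicon.get(t, 0.0) for t in tokens)
    return "positive" if total > threshold else "negative"


if __name__ == "__main__":
    # Toy 2-D vectors standing in for embeddings trained on Norwegian text.
    vectors = {
        "god": [1.0, 0.1],
        "flott": [0.9, 0.2],
        "fantastisk": [0.95, 0.05],
        "dårlig": [0.1, 1.0],
        "elendig": [0.05, 0.9],
        "kjedelig": [0.2, 0.8],
    }
    seeds = {"god": 1.0, "dårlig": -1.0}  # illustrative seeds, not from Norsk Ordvev
    lexicon = propagate_polarity(vectors, seeds)
    for word, score in sorted(lexicon.items(), key=lambda item: -item[1]):
        print(f"{word:12s} {score:+.3f}")
    print(classify_review(["filmen", "var", "fantastisk"], lexicon))  # positive
    print(classify_review(["kjedelig", "og", "elendig"], lexicon))  # negative
```

Clamping the seeds keeps the known polarities fixed while scores flow outward to unlabelled words. Restricting each word to its k most similar neighbours keeps weakly related words from diluting the signal. An actual evaluation on NoReC would also require deciding how the 1–6 ratings map to target classes, which this sketch leaves open.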