Sentiment Analysis for Norwegian Text
The SANT project aims to create training data and machine-learned models for Sentiment Analysis for Norwegian Text. While coordinated by the Language Technology Group at IFI/UiO, collaborating partners include NRK, Schibsted and Aller Media.
Sentiment Analysis (SA)
One of the applications of Language Technology (LT) that has gained the most widespread use in recent years is so-called opinion mining or sentiment analysis (SA). In broad terms, SA is the task of automatically identifying the opinions, attitudes or emotions that are expressed by subjective information in text.
The goal of SANT is to create open resources for sentiment analysis for Norwegian. The project is a collaboration between the Language Technology Group (LTG) at the Department of Informatics at the University of Oslo and three of Norway's largest media groups: the public broadcaster NRK/P3 and the privately held Schibsted Media Group and Aller Media. The media partners provide data in the form of reviews, collected across a range of domains: music, literature, restaurants, home electronics, and more. As reviews are by definition packed with subjective opinions and evaluations, they are ideally suited for sentiment analysis.
Document-level SA: Reviews as training data
The SANT project takes advantage of a peculiarity of how reviews and critiques are typically summarized in Norwegian arts and consumer journalism, namely by an explicit rating on a scale from 1 to 6, represented as the throw of a die. We propose to use this feature to semi-automatically compile a polarity-labeled text collection, which can then be used to train and evaluate machine-learned models for document-level sentiment analysis.
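As an illustration of how dice ratings could be turned into polarity labels, here is a minimal sketch in Python. The thresholds (1 and 2 as negative, 5 and 6 as positive, 3 and 4 left unlabeled) and the function names are assumptions made for this example only; the project description does not specify the actual mapping.

```python
from typing import Iterable, List, Optional, Tuple


def rating_to_polarity(rating: int) -> Optional[str]:
    """Map a dice rating (1 to 6) to a coarse polarity label.

    Assumed thresholds, chosen only for illustration:
    1-2 -> "negative", 5-6 -> "positive", 3-4 -> None (no clear polarity).
    """
    if not 1 <= rating <= 6:
        raise ValueError(f"dice rating must be between 1 and 6, got {rating}")
    if rating <= 2:
        return "negative"
    if rating >= 5:
        return "positive"
    return None


def build_polarity_corpus(
    reviews: Iterable[Tuple[str, int]],
) -> List[Tuple[str, str]]:
    """Keep only the reviews whose rating maps to a clear polarity label."""
    labeled = []
    for text, rating in reviews:
        label = rating_to_polarity(rating)
        if label is not None:
            labeled.append((text, label))
    return labeled
```

Given a list of `(review_text, rating)` pairs, `build_polarity_corpus` returns `(review_text, label)` pairs ready for training a document-level classifier. Dropping the middle ratings is one possible design choice; keeping all six ratings as classes would be an equally valid alternative.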
For some applications, however, it is desirable to have models that make more granular predictions at the sentence level and additionally identify the targets and holders of the opinions. To enable such models, a subset of the review corpus will be manually annotated with fine-grained, in-sentence polarity information.
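To make the notions of target and holder concrete, the following Python sketch shows one way a single fine-grained annotation could be represented. The `Opinion` class, its field names, the character-offset convention, and the English example sentence are all hypothetical and invented for illustration; they are not the project's annotation scheme.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# A span is a (start, end) pair of character offsets, end exclusive.
Span = Tuple[int, int]


@dataclass(frozen=True)
class Opinion:
    """One in-sentence opinion (hypothetical schema)."""
    polarity: str                  # e.g. "positive" or "negative"
    expression: Span               # the words that carry the sentiment
    target: Optional[Span] = None  # what the opinion is about
    holder: Optional[Span] = None  # who holds the opinion


def find_span(sentence: str, phrase: str) -> Span:
    """Return the character span of the first occurrence of phrase."""
    start = sentence.find(phrase)
    if start < 0:
        raise ValueError(f"{phrase!r} not found in sentence")
    return (start, start + len(phrase))


def span_text(sentence: str, span: Optional[Span]) -> Optional[str]:
    """Return the text covered by span, or None if the span is absent."""
    if span is None:
        return None
    start, end = span
    return sentence[start:end]


# Invented example sentence, not taken from the review corpus.
sentence = "The critic loved the new album."
opinion = Opinion(
    polarity="positive",
    expression=find_span(sentence, "loved"),
    target=find_span(sentence, "the new album"),
    holder=find_span(sentence, "The critic"),
)
```

Here the holder is "The critic", the target is "the new album", and "loved" is the expression carrying positive polarity. Storing offsets rather than strings keeps each annotation tied to an exact position in the sentence.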
In AI in general, and LT in particular, many-layered artificial neural networks (so-called deep learning) have recently seen a great revival, with many successful applications, including sentiment analysis. The classifiers developed in this project will seek to push the state of the art in large-scale sentiment analysis using deep neural architectures.
The project is funded through the IKTPLUSS initiative of the Research Council of Norway (RCN) until fall 2022.