New data set and paper on gender and sentiment

SANT has a new paper out, documenting gender effects in reviews.

In a new data set, the 4.5K book reviews that form part of the Norwegian Review Corpus have been annotated with information about the gender of both the critic and the author of the book being reviewed.

The data set is described in a paper by S. Touileb, L. Øvrelid and E. Velldal titled "Gender and sentiment, critics and authors: a dataset of Norwegian book reviews", accepted for the 2nd Workshop on Gender Bias in NLP at COLING 2020.

The experiments described in the paper demonstrate that machine-learned classifiers can detect the gender of both authors and critics (!) with a considerable degree of accuracy. This in turn indicates that there are indeed gender-dependent differences in the language used by men and women, and about men and women, when judging someone's creative work.

By training and analysing gender-specific sentiment classifiers, we also show differences in which evaluative terms are used for different gender constellations. Quantitative analysis of the data set itself also reveals interesting patterns, for example that female critics tend to be stricter in evaluating the works of female authors, relative to other gender pairs in the data (e.g. men reviewing the works of women or other men).
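For readers who want to try this kind of gender-pair comparison themselves, here is a minimal sketch in Python. It is not the code used in the paper. It assumes each review can be reduced to a critic gender, an author gender and a numeric rating; the record layout, the function name `mean_rating_by_pair` and the toy values are all made up for illustration and are not taken from the data set.

```python
from collections import defaultdict
from statistics import mean

# Toy records, invented purely for illustration (NOT real data).
# Each tuple is (critic_gender, author_gender, rating).
reviews = [
    ("F", "F", 4),
    ("F", "M", 5),
    ("M", "F", 5),
    ("M", "M", 4),
    ("F", "F", 3),
    ("M", "M", 5),
]


def mean_rating_by_pair(records):
    """Return a dict mapping (critic_gender, author_gender) to the mean rating."""
    buckets = defaultdict(list)
    for critic, author, rating in records:
        buckets[(critic, author)].append(rating)
    return {pair: mean(ratings) for pair, ratings in buckets.items()}


pair_means = mean_rating_by_pair(reviews)

# Print one line per gender pair, sorted for stable output.
for pair, avg in sorted(pair_means.items()):
    print(pair, avg)
```

With the real data set, the same grouping would be applied to the annotated reviews instead of the toy list. Any difference between pairs should then be checked against the number of reviews in each group before drawing conclusions.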

The data set is freely available from GitHub:


Published Dec. 11, 2020 3:50 PM - Last modified Dec. 11, 2020 3:51 PM