Gender and sentiment in book reviews
Incorporating demographic metadata in text classifiers trained on top of pre-trained transformer language models can help achieve higher accuracy and/or reveal bias inherent in the data. In previous work, we investigated this with respect to gender using NoReCgender, an existing data set of Norwegian book reviews rated by professional critics, which also records the gender of both critics and authors.
Training a document-level sentiment classifier on top of a Norwegian BERT model (NorBert), we showed that gender-informed models obtain substantially higher accuracy on sentiment classification, and that polarity-informed models obtain higher accuracy when classifying the gender of book authors.
However, we believe important work remains to be done in this direction. To shed more light on these issues, this Master’s project will explore research questions such as (but not restricted to) the following:
- Can a model still predict the gender of authors if we first “gender neutralize” the texts (e.g., by masking given names and gendered pronouns)? How could such neutralizing pre-processing best be carried out?
- What is the effect of supplying knowledge of the gender of the author as a variable when attempting to predict the gender of the critic, and vice versa?
- What is the effect of supplying knowledge of gender when predicting more fine-grained polarity values (e.g., using the full scale of 1–6, rather than just binary positive/negative)?
- Is it possible to use methodology from XAI (eXplainable Artificial Intelligence) to shed more light on what information is used by the models when predicting gender and/or polarity?
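As an illustration of the first question above, a naive “gender neutralizing” step could be sketched as follows. This is only a minimal sketch under stated assumptions, not the method used in the previous work: the pronoun list is a small, non-exhaustive sample of Norwegian Bokmål forms, the function name `neutralize`, the `given_names` argument and the `[MASK]` placeholder are hypothetical, and a real pipeline would likely need a proper name lexicon or named-entity recognition.

```python
import re

# Assumption: a small, non-exhaustive sample of gendered pronouns in Bokmål.
GENDERED_PRONOUNS = {"han", "hun", "ham", "henne", "hans", "hennes"}


def neutralize(text, given_names, mask="[MASK]"):
    """Replace gendered pronouns and the supplied given names with a mask.

    Matching is done on whole word tokens and is case-insensitive, so a
    surname such as "Hansen" is left alone even though it contains "hans".
    """
    targets = GENDERED_PRONOUNS | {name.lower() for name in given_names}

    def replace(match):
        token = match.group(0)
        return mask if token.lower() in targets else token

    # \w+ is Unicode-aware for str patterns, so æ, ø and å are handled.
    return re.sub(r"\w+", replace, text)


example = neutralize("Hun skriver godt, og Anna vet det.", {"Anna"})
print(example)
```

Token-level masking of this kind is deliberately crude: it cannot tell the pronoun “hans” from the given name “Hans”, and it leaves other gender cues (titles, kinship terms, topic) untouched, which is part of what the research question asks about.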
Supervision will be a collaboration between IFI/LTG and the MediaFutures Research Centre for Responsible Media Technology & Innovation, with supervisors at both institutions. The precise details and scope of the thesis will be decided in agreement between the supervisors and the candidate.
The project presupposes a good balance of technical and linguistic expertise. Good programming skills, experience with machine learning and a solid background in NLP are relevant qualifications. Please contact the supervisors to discuss further details.