Bootstrapping semantic lexicons for low-resource languages
For many tasks, such as emotion detection or abstractness prediction, rich features derived from semantic lexicons are often more informative than features extracted from raw text with deep learning. However, compiling these lexicons by hand requires a large amount of work, and most of the world's languages do not currently have access to such resources.
This thesis will explore bootstrapping methods for automatically generating semantic lexicons for low-resource languages. Specifically, it would be of interest to compare hand annotation, machine translation, and cross-lingual embedding approaches. The precise details and scope of the thesis will be decided in agreement between the supervisors and the candidate.
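As a rough illustration of what the cross-lingual embedding approach mentioned above could look like, the sketch below projects lexicon scores from a resource-rich language onto words of a target language through a shared embedding space. It is one possible baseline under stated assumptions, not the method the thesis prescribes: the function name `project_lexicon`, the similarity-weighted nearest-neighbour averaging, and all words, vectors and scores are hypothetical toy values chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def project_lexicon(source_lexicon, source_vectors, target_vectors, k=2):
    """Assign each target word the similarity-weighted mean score of its
    k nearest source-lexicon words in a shared embedding space.

    Assumes the source and target vectors are already aligned into one
    cross-lingual space; how that alignment is obtained is out of scope here.
    """
    projected = {}
    for target_word, target_vec in target_vectors.items():
        # Rank all scored source words by similarity to the target word.
        neighbours = sorted(
            (
                (cosine(target_vec, source_vectors[word]), word)
                for word in source_lexicon
                if word in source_vectors
            ),
            reverse=True,
        )[:k]
        # Keep only neighbours with positive similarity so weights stay valid.
        positive = [(sim, word) for sim, word in neighbours if sim > 0.0]
        if not positive:
            continue  # no usable evidence for this target word
        total = sum(sim for sim, _ in positive)
        projected[target_word] = (
            sum(sim * source_lexicon[word] for sim, word in positive) / total
        )
    return projected


# Toy data (made up): abstractness scores in [0, 1] for four English words.
source_lexicon = {"stone": 0.1, "rock": 0.2, "idea": 0.9, "thought": 0.8}
source_vectors = {
    "stone": [1.0, 0.0],
    "rock": [0.9, 0.1],
    "idea": [0.0, 1.0],
    "thought": [0.1, 0.9],
}
# Toy target-language vectors, assumed to live in the same space.
target_vectors = {"stein": [0.95, 0.05], "tanke": [0.05, 0.95]}

projected = project_lexicon(source_lexicon, source_vectors, target_vectors, k=2)
for word, score in sorted(projected.items()):
    print(f"{word}: {score:.2f}")
```

In this toy setup the concrete target word inherits a low abstractness score and the abstract one a high score. A real comparison would evaluate such projected scores against hand-annotated or machine-translated lexicons for the same language.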
The project presupposes a good balance of technical and linguistic expertise. Good programming skills, experience with machine learning and a solid background in NLP are relevant qualifications. Knowledge of several foreign languages would also be an advantage. Please contact the supervisors to discuss further details.