Dialogue Modelling for Statistical Machine Translation
The project sets out to enhance the quality of machine translation technology in conversational domains (e.g. film subtitles) through a better account of the translation context.
About the project
The project investigates how to improve machine translation technology in dialogue domains. Machine translation, known to the general public through applications such as Google Translate, is the automatic translation of text from one language to another by a computer algorithm, for instance from Japanese to Norwegian or vice versa. Although great progress has been made over the last decade, machine translation technology often remains poor at adapting its translations to the relevant context. To translate a dialogue (say, film subtitles from English to Norwegian), current translation systems typically operate one utterance at a time and ignore the global coherence and structure of the conversation.
The project aims to make machine translation systems more "context-aware". It will develop new translation methods that can dynamically adapt their outputs to the surrounding dialogue context. More specifically, the project will demonstrate how to automatically extract contextual factors from dialogues and integrate these factors into a state-of-the-art statistical machine translation system. The main goal is to show that this context-rich approach produces translations of higher quality than standard methods. In particular, the project will examine how these new translation methods can be employed in practice to produce high-quality translations of film subtitles. Although the project will only conduct experiments with a limited set of languages (such as English <=> Norwegian), the translation techniques it develops are meant to be language-independent and could in principle be applied to any language pair. In the longer term, speech-to-speech interpretation (the task of automatically translating speech from one language to another in real time) is another possible application of the project.
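To make the idea of "contextual factors" more concrete, the following minimal Python sketch shows one way a dialogue could be annotated with context before translation. It is an illustration only, not the project's actual method: the class `ContextualUtterance`, the function `extract_context`, the `window` parameter and the "follows a question" factor are all hypothetical names and choices introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ContextualUtterance:
    """An utterance paired with hypothetical contextual factors."""
    text: str
    previous: Tuple[str, ...]   # preceding utterances, oldest first
    follows_question: bool      # assumed factor: previous utterance ends with "?"


def extract_context(dialogue: List[str], window: int = 2) -> List[ContextualUtterance]:
    """Attach the preceding `window` utterances to each utterance.

    A system working one utterance at a time would see only `text`;
    a context-aware system could also condition on the other fields.
    """
    annotated = []
    for i, text in enumerate(dialogue):
        previous = tuple(dialogue[max(0, i - window):i])
        follows_question = i > 0 and dialogue[i - 1].rstrip().endswith("?")
        annotated.append(ContextualUtterance(text, previous, follows_question))
    return annotated


# Example: three consecutive subtitle lines (invented for illustration).
subtitles = ["Where are you going?", "Home.", "I'll come with you."]
annotated = extract_context(subtitles)
```

In this sketch, the short reply "Home." carries the preceding question as context, which is the kind of information an utterance-by-utterance system would discard. How such factors would actually be integrated into a statistical translation model is left open here.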
The project is funded by a three-year postdoctoral research grant from the Norwegian Research Council.
- The project officially started on 1 March 2014. You can read the full project description and plan in the project proposal.