Visual Reference Resolution for Human-Robot Interaction
By its very definition, human-robot interaction (HRI) always takes place in a physically situated environment. To accomplish its tasks, the robotic agent must be able to understand verbal references to physical entities -- for instance, map the expression "the big mug in front of you" to an actual physical entity perceived by the robot.
This task is known as visual reference resolution. It is usually difficult because it combines several problems:
- Visual processing tasks such as object recognition and localization are particularly error-prone. The robot's awareness of the physical situation is therefore always going to be partial and uncertain. Probability distributions are typically used to capture this uncertain knowledge.
- Verbal references may rely on complex relations between objects. For instance, understanding the expression "the big mug in front of you" requires the robot to detect the mugs in the current scene, be aware of its own position and orientation, and understand what it means for an object X to be "in front of" some other entity Y.
- Finally, the environment of most HRI domains is not static. Objects and people may change location, new entities may appear and existing ones may disappear from the scene, etc. Furthermore, the robot's actions can have real-world effects that modify the surrounding environment (e.g. if the robot grasps an object on the table, the location of the object changes from being on the table to being in the hands of the robot).
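To make the first two challenges concrete, here is a minimal sketch in Python of how an expression such as "the big mug in front of you" could be resolved over uncertain detections. It is only an illustration under stated assumptions, not the OpenDial formalism or the method the thesis is expected to produce: the Detection class, the soft predicates p_big and p_in_front, their parameters, and the example scene are all hypothetical, and the three constraints are treated as independent for simplicity.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    """One perceived object. All fields are hypothetical illustrations."""
    obj_id: str
    p_mug: float      # probability that the object really is a mug
    height_cm: float  # estimated height of the object
    x: float          # robot frame, metres: x points forward
    y: float          # robot frame, metres: y points to the robot's left


def p_big(d, scale_cm=10.0, slope=0.5):
    """Soft 'big' predicate: logistic function of the estimated height.
    The threshold and slope are arbitrary assumptions."""
    return 1.0 / (1.0 + math.exp(-slope * (d.height_cm - scale_cm)))


def p_in_front(d):
    """Soft 'in front of the robot' predicate based on the bearing angle.
    Objects behind the robot get 0; objects straight ahead get 1."""
    if d.x <= 0:
        return 0.0
    angle = math.atan2(abs(d.y), d.x)  # 0 means straight ahead
    return math.cos(angle)


def resolve(detections):
    """Return a normalised distribution over candidate referents,
    assuming the three constraints are independent."""
    scores = {d.obj_id: d.p_mug * p_big(d) * p_in_front(d) for d in detections}
    total = sum(scores.values())
    if total == 0:
        return {obj_id: 0.0 for obj_id in scores}
    return {obj_id: s / total for obj_id, s in scores.items()}


# Hypothetical scene: two likely mugs in front, one unlikely mug behind.
scene = [
    Detection("mug_a", p_mug=0.9, height_cm=14.0, x=0.5, y=0.0),
    Detection("mug_b", p_mug=0.8, height_cm=8.0, x=0.5, y=0.1),
    Detection("bowl_c", p_mug=0.2, height_cm=12.0, x=-0.3, y=0.2),
]

posterior = resolve(scene)
best = max(posterior, key=posterior.get)
print(best, posterior)
```

The output is a distribution rather than a single answer, so a dialogue manager could ask a clarification question when no candidate clearly dominates. Handling the third challenge would additionally require updating the detections over time, for example after the robot grasps an object.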
The goal of this master's thesis is to investigate how to address these challenges in a principled manner. In particular, the student will develop simple models for visual reference resolution using the formalism of probabilistic rules and the related OpenDial toolkit (http://www.opendial-toolkit.net). To evaluate the viability of the approach, the student will conduct experiments with the Nao robot available in our research group.
Requirements: interest in human-robot interaction and dialogue modelling, programming skills in Python and/or Java.