Learning to Act by Predicting the Future
Deep Learning techniques have in recent years allowed the automated learning of prediction abilities, such as predicting future observations in traffic with a dashboard camera, or imagining future scenarios in video games. Typically, these algorithms work by seeing many examples of videos and learn to predict how new videos would continue. Unlike these systems, when we humans make and use predictions, we don't imagine a "video" of future images, but rather make high-level guesses or estimates about the state of our surroundings. For instance, I may predict that my coffee cup would break if I throw it in the floor, but I would not imagine a full "picture" of the broken cup, including predicting exactly how many pieces it breaks into.
This intuition is the idea behind a recent successful approach for learning to act by predicting the future. If we can learn high-level consequences of our actions (such as, if I throw a cup in the floor, it breaks), we can use those to guide our decisions (for instance, I choose not to throw this cup in the floor, as that typically makes it break). The approach was demonstrated in the computer game Doom, beating other reinforcement learning approaches in a Doom competition.
Many interesting open questions remain related to the idea of guiding actions with predictions about the future. Below follow a few suggested topics.
- Other problems. For what other kinds of problems could this technique be useful? For instance, could it be useful for robotic tasks (real or simulated)? Potentially, a robot could also benefit from letting predictions about outcomes guide its actions. Could it be useful for a self-driving car (relying on predicting things like distance to other cars, speed, etc)?
- Active learning. Can this technique be combined with active learning? Active learning is the idea of actively investigating objects or situations we are unsure of how to predict. For instance, if I feel confident a cup would break if thrown in the floor, but I'm less sure about a glass bottle, I should focus my investigations on the glass bottle, to improve my model of the world.
- Generalizability and robustness. How generalizable and robust is this technique? Current implementations rely on having exact measurements of the predicted quantities. Such measurements are simple in a computer game or simulation (e.g. a robot learning to predict its distance to humans with exact measurements from a simulator). However, do predictions learned in a simulator transfer well to reality? Could we use a similar technique to learn real-world predictions, by estimating real-world values of measurements?
- Prediction horizon. When guiding our actions with predictions about the future, we need to decide how far into the future to predict - and this will depend on the situation we are in. For instance, when riding my bicycle in traffic, predictions about the next few seconds are most important, whereas when deciding which apartment to buy, we make predictions about the next several years. Can we automatically learn the best prediction horizon for different kinds of decisions?
- Visualizing predictions. To better understand models that take decisions by predicting the future, a visualization or explanation of the predictions they make will be very valuable. An interesting task is therefore to explore good ways to visualize the kinds of predictions that are learned by AI-models that act by predicting the future.
- Adaptation. Does a predictive model of the world help agents adapt to new environments? In standard (model-free) reinforcement learning, adapting to new environments is challenging, partly because agents have learned how to act, but do not have an internal model of the world (which could also be relevant in the new, changed environment). Does such a predictive model help adapt to new environments?
- Automatically defined measurements. A limitation to the method described above is that the user needs to manually define which measurements we wish to learn to predict (e.g. for driving a car, the user may decide to predict the distance to other cars and the car's speed, as those are relevant to performance). The method would be more flexible if it could automatically discover which measurements to predict, by realizing that those measurements are the most relevant ones to evaluate success on the current task. We did some work in this direction, where an evolutionary algorithm (EA) optimized weightings between different manually selected measurements. There is much more potential here for finding more automatic ways: For instance, perhaps the EA could select from a large range of sensors which ones are most relevant for evaluating performance on a task?