Recent years have seen impressive, rapid progress in the ability of computers to perform tasks that have challenged AI researchers for decades. With deep learning, computers have mastered complex tasks from a wide range of domains, including image recognition, language modeling, game playing and predicting future events.
However, many open challenges still limit the ability of deep learning algorithms to solve certain kinds of problems, and limit our ability to understand and trust the decisions made by deep neural networks. One direction that can potentially address some of these challenges is population-based search, including neuroevolution, where a population of neural networks solving a problem is maintained, allowing a more diverse search for candidate solutions.
Despite rapid and impressive progress in deep learning over the last few years, many open challenges remain before AI systems can reach the flexible problem-solving abilities displayed by humans and animals. One way to address some of these challenges is by taking inspiration from evolution, which has produced the most impressive problem-solving systems we are familiar with: the brains of humans and animals. These projects will build on recent advances in population-based search and training of neural networks, investigating how these may help overcome open challenges in Artificial Intelligence research. Several different themes could be relevant, and the specific project should be worked out between the student and supervisor(s) according to their interests. Here are some example themes:
- Solving problems with sparse rewards. Typical reinforcement learning algorithms struggle when problems have sparse rewards, that is, when many actions have to be taken to reach a goal, but those actions do not have an associated reward. A population of agents can help, if those agents are encouraged to explore different types of solutions/behaviors. Encouraging exploration, or curiosity, can help reach solutions even without any associated reward, by rewarding the discovery of anything that is new. A powerful recent algorithm building on such ideas is Go-Explore, which became the first algorithm to solve the difficult Atari game Montezuma's Revenge. Although this is a very promising first demonstration of Go-Explore, many open questions remain, such as: What kinds of problems can Go-Explore solve? Can it be adapted to real-world tasks? How does its exploration strategy compare to more traditional curiosity-based reinforcement learning algorithms?
- Combining Neuroevolution and Deep Learning. Backpropagation-based Deep Learning (DL) and Neuroevolution (NE) have different strengths and weaknesses. DL is good at extracting structure from large amounts of data, forming meaningful, compressed internal representations from high-dimensional inputs. For instance, a deep neural network can learn from many pictures what features are characteristic of a cup, or of a dog. However, as noted above, DL is typically not good at solving problems with sparse rewards. It is also not good at exploring many different strategies for solving a problem simultaneously. NE has the potential to overcome these limitations. A promising way to combine NE and DL is therefore to let deep learning do the "heavy lifting", for instance learning to make predictions or recognize objects based on a large number of examples, and to train a small action-selection component using NE with the pre-trained deep neural network as a back-end. A few papers have explored this recently, and there is much room for exploring creative ways to combine these two techniques.
- Coevolution of brains and bodies. Evolutionary algorithms are able to optimize both robot bodies and the brains (e.g. neural networks) controlling them. By optimizing brains and bodies together, efficient solutions may be found that actively use the robot's body to simplify the job of the brain. For instance, to cross difficult obstacles, a simple solution could be to just build a bigger robot, instead of developing sophisticated climbing abilities. An exciting possibility is to extend this coevolution by also evolving the robot's environment. Evolving environments and robots together can potentially help gradually improve the abilities of evolving robots by giving them harder and harder environmental challenges.
- Weight agnostic neural networks. Weight agnostic neural networks (WANNs) are neural networks that can solve a problem even if their weights are randomized. In other words, it is the architecture of these neural networks that solves a problem, rather than their specific weights. Weight agnostic neural networks are a recent discovery, and many open issues remain, such as: What types of tasks can be solved by WANNs? Can WANNs generalize better to new tasks because they rely less on specific connection weights? Can WANNs learn to encode many different tasks at the same time?
- Guiding neuroevolution with additional objectives. A powerful technique to improve the search for neural networks is to add extra objectives that steer evolution towards promising areas of the search landscape. A common idea is to guide evolution by rewarding solutions that behave differently from others. An alternative idea, which we recently proposed and tested, is to guide evolution by rewarding solutions that are structurally different from others. An open question is which of these ideas works better for which kinds of problems. For instance, for problems with a very well-defined structure, the latter may work better, while the former may work better for problems where there are deceptive traps.
- Open-ended evolution. Open-ended evolution builds on the idea that, rather than searching for a single solution, one can search for anything that is "interesting", potentially reaching complex solutions beyond what could be reached by objective-based search. A recent impressive demonstration of this is the POET algorithm, which showed that open-ended search can reach robot controllers that cannot be found by directly searching for them. With more and more Reinforcement Learning environments readily available online, there are now many exciting worlds in which to test open-ended evolution ideas. One option could be the Neural MMO world, where creatures compete in a simulated world for survival. Other RL environments may also be interesting to explore with open-ended search. This paper summarizes some of the most famous ones.
- Reward tampering. Reward tampering is the problem that occurs when an algorithm finds a loophole in how we specify a reward or fitness function and solves a different problem than we intended. DeepMind recently demonstrated a new, simple setup for studying reward tampering, and showed that Reinforcement Learning algorithms easily fall into the trap of tampering with a reward function rather than actually solving the problem. Is this problem equally severe for agents trained by evolution? Could techniques such as Novelty Search or Quality Diversity algorithms help us get an overview of the ways reward tampering could happen, and thereby avoid it?
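
Several of the themes above (sparse rewards, additional objectives, reward tampering) rely on rewarding solutions that behave differently from others. As a minimal sketch of how such a novelty score might be computed, assume that each individual's behavior has already been summarized as a numeric vector (for example, a final position). The function names `novelty_score` and `rank_by_novelty`, the parameter `k`, and the choice of Euclidean distance are illustrative assumptions, not taken from any of the algorithms mentioned above.

```python
import math
from typing import List, Sequence

Behavior = Sequence[float]  # hypothetical behavior descriptor, e.g. an agent's final (x, y) position


def novelty_score(behavior: Behavior, others: Sequence[Behavior], k: int = 3) -> float:
    """Mean Euclidean distance from `behavior` to its k nearest neighbors in `others`."""
    if not others:
        return 0.0  # nothing to compare against, so no evidence of novelty
    distances = sorted(math.dist(behavior, other) for other in others)
    nearest = distances[:k]  # uses fewer than k neighbors if fewer are available
    return sum(nearest) / len(nearest)


def rank_by_novelty(behaviors: Sequence[Behavior],
                    archive: Sequence[Behavior] = (),
                    k: int = 3) -> List[int]:
    """Return population indices ordered from most novel to least novel.

    Each individual is compared against the rest of the population plus an
    optional archive of previously seen behaviors.
    """
    scores = []
    for i, behavior in enumerate(behaviors):
        others = [b for j, b in enumerate(behaviors) if j != i] + list(archive)
        scores.append(novelty_score(behavior, others, k))
    return sorted(range(len(behaviors)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Three agents end up close together; one ends up somewhere quite different.
    population = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
    print(rank_by_novelty(population, k=2))
```

In an evolutionary loop, a ranking like this could replace or complement the fitness ranking when selecting parents, so that individuals are rewarded for reaching behaviors the population has not yet seen, even when the task reward itself is zero.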