Learning new control strategies for (possibly unknown) dynamical systems is a challenging task. Reinforcement learning algorithms typically require 'fresh' data regularly, but obtaining data safely and in sufficient quantities is a challenge on real systems. Thus, it is no surprise that most recent successes have been in domains where massive amounts of data can easily be generated in simulation (e.g., games such as Atari and Go).
In this talk, I will focus on an intermediate scenario, where an exact simulator is not available but we have access to one or more imperfect simulators. These can be used in different ways: we can train robust policies that work across different simulators, we might wish to detect where simulators do not match reality, or we might want to learn modular policies that can easily be reused in various conditions. I will discuss recent and ongoing work on learning such modular policies.
Biography: Herke van Hoof is an assistant professor at the University of Amsterdam.
His research focuses on reinforcement learning, particularly on developing techniques that can be applied to physical systems. Before that, he did research on robots learning from data they gather by themselves, as a postdoc at McGill University in Montreal and as a PhD student at TU Darmstadt. He obtained his bachelor's and master's degrees from the University of Groningen.