For the grant overview

Imaginary World Models Based on Neural Stochastic Processes to Overcome Reinforcement Learning Barriers

Semper Ardens: Accelerate


Intelligent autonomous agents that interact with their environment, such as drones, self-driving cars, and robot arms, are shaping our technological future. Reinforcement learning, the state-of-the-art artificial intelligence framework for training autonomous agents on tasks, can surpass human performance in board games. Yet it has long puzzled the machine learning community why the same framework cannot find robust solutions to even modest optimal control problems, such as balancing a double pendulum, when excessively many failed trials are unaffordable. The goal of my project is to identify the reasons behind the high data hunger of the reinforcement learning setup and to develop algorithms that overcome it.


Reinforcement learning can deliver successful control policies, functions that decide which action an autonomous agent takes in a given situation, only after the agent has interacted with its environment many times. Learning from exhaustive trial and error is acceptable in synthetic environments such as game playing, the prime use case of reinforcement learning thus far. The same assumption, however, places a hard bottleneck on the applicability of reinforcement learning algorithms to many real-world use cases where trials are expensive and errors are dangerous, such as robotics, autonomous driving, and human-machine interaction. Understanding the reasons behind the data hunger of reinforcement learning would provide new machine learning tools to help overcome this bottleneck.


The biggest barrier to data-efficient reinforcement learning is the lack of an effective mechanism for the autonomous agent to build a mental map of its environment. Such a map would allow the agent to simulate possible futures and rule out actions that may lead to task failure. The recently invented neural stochastic processes can encode the behaviour of complex and random dynamical environments with unprecedented precision by incorporating neural networks into stochastic differential equation systems. I will develop reinforcement learning algorithms powered by neural stochastic processes that enable autonomous agents to learn accurate mental maps from far fewer interactions with the environment than present methods require.
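To illustrate the core idea, the sketch below shows a toy neural stochastic differential equation used as a world model: the drift and diffusion terms of the SDE dx = f(x) dt + g(x) dW are small neural networks, and the agent "imagines" futures by simulating trajectories with the Euler-Maruyama scheme. All names, network sizes, and parameters here are illustrative assumptions, not the project's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, HIDDEN = 2, 16

def make_mlp(in_dim, out_dim):
    """One-hidden-layer network with tanh activation (randomly initialised;
    in practice these weights would be fitted to observed transitions)."""
    W1 = rng.normal(scale=0.5, size=(in_dim, HIDDEN))
    W2 = rng.normal(scale=0.5, size=(HIDDEN, out_dim))
    return lambda x: np.tanh(x @ W1) @ W2

drift = make_mlp(STATE_DIM, STATE_DIM)      # f: deterministic part of the dynamics
diffusion = make_mlp(STATE_DIM, STATE_DIM)  # g: state-dependent noise scale

def imagine(x0, steps=50, dt=0.01):
    """Roll out one imagined trajectory with the Euler-Maruyama scheme."""
    x, traj = np.asarray(x0, dtype=float), []
    for _ in range(steps):
        dW = rng.normal(scale=np.sqrt(dt), size=STATE_DIM)  # Brownian increment
        x = x + drift(x) * dt + diffusion(x) * dW
        traj.append(x)
    return np.stack(traj)

# Simulating many futures from the same state lets the agent estimate the
# risk of an action before trying it in the real environment.
futures = np.stack([imagine([1.0, 0.0]) for _ in range(8)])
print(futures.shape)  # (8, 50, 2): 8 imagined trajectories of 50 steps each
```

Because the diffusion term is state-dependent, the spread of the imagined trajectories reflects where the model is uncertain, which is exactly the signal a data-efficient agent can exploit to avoid risky actions.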


Improving the interaction efficiency of reinforcement learning algorithms has the potential to significantly expand the range of tasks autonomous agents can perform. It may lay the foundation for the affordable development of drones that deliver parcels, self-driving cars that navigate safely through crowded urban areas, and robots that outperform humans at high-precision surgery, all with far less cost and risk than today. Increased automation would bring more efficient use of resources and less human involvement in dangerous tasks, contributing significantly to the goal of building a sustainable and environmentally friendly economy that improves the well-being of our society.