While deep reinforcement learning (RL) has fueled multiple high-profile successes in machine learning (ML), it is held back from more widespread adoption by its often poor data efficiency and the limited generality of the policies it produces. A promising approach for alleviating these limitations is to cast the development of better RL algorithms as an ML problem itself, a process called meta-RL. Meta-RL is a family of ML methods that learn to reinforcement learn. That is, meta-RL methods use sample-inefficient ML to learn sample-efficient RL algorithms, or components thereof. Meta-RL is most commonly studied in a problem setting where, given a distribution of tasks, the goal is to learn a policy that can adapt to any new task from the task distribution with as little data as possible.
This monograph describes the meta-RL problem setting and its major variations in detail. At a high level, it discusses how meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task. Using these clusters, it surveys meta-RL algorithms and applications. The monograph concludes by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.