Benchmarks for unsupervised reinforcement learning and preference-based reinforcement learning

Seminar announcement


2022-02-08


Kimin Lee, Researcher at Google Brain / 2022.02.09

Title: Benchmarks for unsupervised reinforcement learning and preference-based reinforcement learning

Speaker: Kimin Lee, Researcher at Google Brain

Date: Wednesday, Feb. 9, 2022, 1 PM

Abstract: Deep reinforcement learning (RL) has been successful in a range of challenging domains, such as board games, video games, and robotic control tasks. Scaling RL to many more applications, however, is still hindered by a number of challenges. One such challenge lies in designing a suitable reward function: one that is informative enough for learning yet easy enough to provide. Preference-based RL methods instead allow practitioners to teach agents interactively through tailored feedback; however, it is difficult to quantify progress in preference-based RL because there is no commonly adopted benchmark. In this talk, I will introduce B-Pref, a benchmark designed specifically for preference-based RL. A key challenge in building such a benchmark is making it possible to evaluate candidate algorithms quickly, which makes relying on real human input for evaluation prohibitive. At the same time, simulating human input as perfect preferences according to the ground-truth reward function is unrealistic. B-Pref alleviates this by simulating teachers with a wide array of irrationalities, and it proposes metrics that measure not only performance but also robustness to these potential irrationalities.
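The idea of replacing a real human with a scripted teacher whose irrationalities can be dialed in can be illustrated with a short sketch. This is not B-Pref's code or API: it assumes each trajectory segment is summarized by its ground-truth per-step rewards, models noisy choices with a Bradley-Terry (logistic) preference model, and uses hypothetical parameter names (`beta`, `gamma`, `eps_mistake`, `skip_threshold`, `equal_threshold`) for a few illustrative kinds of irrationality: noisy choices, myopia, careless mistakes, skipped queries, and ties.

```python
import math
import random
from typing import Optional, Sequence


def simulated_teacher(
    rewards_0: Sequence[float],
    rewards_1: Sequence[float],
    beta: float = math.inf,
    gamma: float = 1.0,
    eps_mistake: float = 0.0,
    skip_threshold: float = -math.inf,
    equal_threshold: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Optional[float]:
    """Hypothetical scripted teacher comparing two trajectory segments.

    Returns 1.0 if segment 1 is preferred, 0.0 if segment 0 is preferred,
    0.5 for "equally good", or None if the query is skipped.
    beta: rationality (inf = always choose the higher-return segment).
    gamma: myopia (< 1 down-weights earlier steps of each segment).
    eps_mistake: probability of flipping the final label.
    skip_threshold: skip the query if neither segment's return reaches it.
    equal_threshold: report a tie if the returns differ by less than this.
    """
    rng = rng or random.Random()

    def myopic_return(rewards: Sequence[float]) -> float:
        h = len(rewards)
        # The last step has weight 1; step t has weight gamma ** (h - 1 - t).
        return sum(gamma ** (h - 1 - t) * r for t, r in enumerate(rewards))

    r0 = myopic_return(rewards_0)
    r1 = myopic_return(rewards_1)

    if max(r0, r1) < skip_threshold:
        return None
    if abs(r0 - r1) < equal_threshold:
        return 0.5

    if math.isinf(beta):
        # Perfectly rational choice; exact ties go to segment 0.
        label = 1.0 if r1 > r0 else 0.0
    else:
        # Bradley-Terry: P(segment 1 preferred) = sigmoid(beta * (r1 - r0)),
        # computed in a numerically stable way.
        x = beta * (r1 - r0)
        if x >= 0:
            p1 = 1.0 / (1.0 + math.exp(-x))
        else:
            e = math.exp(x)
            p1 = e / (1.0 + e)
        label = 1.0 if rng.random() < p1 else 0.0

    # Careless mistake: flip the label with probability eps_mistake.
    if rng.random() < eps_mistake:
        label = 1.0 - label
    return label


if __name__ == "__main__":
    early, late = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]
    print(simulated_teacher(late, early, equal_threshold=0.1))  # 0.5 (same return)
    print(simulated_teacher(late, early, gamma=0.5))  # 0.0 (myopic: favors the late reward)
    print(simulated_teacher(late, early, beta=1.0, rng=random.Random(0)))  # random label
```

With every parameter at its default, this reduces to the "perfect" teacher that the abstract calls unrealistic; each parameter then adds one kind of irrationality independently, which is the sort of controlled variation over which robustness could be measured.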

Another outstanding challenge is training generalist agents that can quickly adapt to new tasks. Recent advances in unsupervised RL have shown that pre-training RL agents with self-supervised intrinsic rewards can lead to efficient adaptation. However, these algorithms have been hard to compare and develop because there is no unified benchmark. To this end, I will introduce the Unsupervised Reinforcement Learning Benchmark (URLB). Using this benchmark, I found that the implemented baselines make progress but are not able to solve URLB, and I will propose directions for future research.

Bio: Kimin Lee is a research scientist on the Google Brain team. He is interested in directions that enable scaling deep reinforcement learning to diverse and challenging domains: human-in-the-loop reinforcement learning, unsupervised reinforcement learning, and self-supervised learning. He completed his postdoctoral training at UC Berkeley (advised by Prof. Pieter Abbeel) and received his Ph.D. from KAIST (advised by Prof. Jinwoo Shin). During his Ph.D., he also interned and collaborated closely with Honglak Lee at the University of Michigan.

Zoom link: https://zoom.us/j/6906724188?pwd=QVI3Tkp0M3FnWHhodXhKY2NZSUx5Zz09
Workshop website: https://sites.google.com/view/pair-ml-winter-seminar-2022/home