Multi-agent zero-shot coordination using transformer-based world models

Seminar Announcement

2023-09-07

Prof. Jooyeon Kim (UNIST) / 2023.10.04

[Abstract]

How can we infuse artificial agents with cooperative behaviors? Previous approaches have relied on centralized training, i.e., co-situating agents throughout multiple episodes, or on imitation learning, i.e., relying on human demonstrations. In this work, we demonstrate how world models facilitate the emergence of a cooperative policy with minimal real-world interaction and no human supervision. First, an agent learns a world model from its own egocentric observations and the random actions of itself and its partner agents. Then, the agent realizes cooperation inside its imagination by controlling not only itself but also the other agents. In experiments, we show that the cooperative policy learned inside the world model transfers seamlessly to the real environment.

[Biography]

Jooyeon Kim is an assistant professor at UNIST. His research in machine learning (ML) focuses on building and promoting interactive systems in which humans and machines (AI agents) collaborate, cooperate, and coordinate through verbal and non-verbal communicative signals. The multi-modal nature of such interactive systems leads his research to span natural language processing (NLP), data mining, human-computer interaction (HCI), and optimization. He earned his Ph.D. and M.S. from the Korea Advanced Institute of Science and Technology (KAIST) and his B.E. from the University of Tokyo (東京大学). Since his graduate studies, he has been involved in the startup thingsflow, where he developed chatbot systems, virtual humans, image recognition, and large language models (LLMs). In 2020-2021, he was a researcher at Microsoft Research Cambridge (MSRC). From 2021 to 2023, he was a postdoctoral researcher at RIKEN (理化学研究所). During his graduate studies, he interned at the Max Planck Institute for Software Systems (MPI-SWS) and Microsoft Research Cambridge (MSRC).