Recent advances in denoising diffusion-based generative models

2023-02-20

Dr. Saehoon Kim (Kakao Brain) / 2023.03.29

[Abstract]

This presentation will delve into denoising diffusion-based generative models, which are integral to various text-to-image generation models, such as DALL-E 2, Imagen, and Stable Diffusion. The presentation is structured into three parts. The first part will provide an overview of diffusion models and their associated network architectures, including papers like DDPM, Improved DDPM, DDIM, and DiT. The second part will discuss the extension of the diffusion framework to handle high-resolution images, covering the core concepts of Cascaded Diffusion Models (CDM) and Latent Diffusion Models (LDM). It will also explain how to integrate conditioning information to develop conditional generative models, using techniques like cross-attention, modulation, and classifier-free guidance, and will highlight two recent architectures, ControlNet and Composer, that can take numerous conditions as inputs for diverse applications. The third part will conclude by outlining our contributions to the research community in this field, namely the COYO and Karlo projects. COYO is a large-scale dataset with 747M image-text pairs and additional meta-attributes to enhance model training. Karlo is a cascaded diffusion model with a mixture of experts, designed for efficiency. Its super-resolution module is trained with a hybrid objective that combines likelihood-based and adversarial losses, so that the model can produce high-quality outputs while maintaining a fast inference time.

[Biography]

Saehoon Kim is a research scientist at Kakao Brain, where he focuses on developing large-scale conditional image generation systems. He received his B.S. and Ph.D. degrees from the Department of Computer Science and Engineering at POSTECH in 2009 and 2018, respectively. During his Ph.D., he studied binary embedding for approximate nearest neighbor search and completed research internships at MSRA and MSR, where he worked on projects related to large-scale near-duplicate image discovery and machine learning approaches to reduce tail latency in web search engines. After completing his Ph.D., he worked as a senior research scientist at AITRICS, an AI healthcare startup. His research interests include generative models and self-supervised learning. His recognitions include the best paper runner-up award at WSDM’15 and service as a senior program committee member for AAAI-2021 and IJCAI-2021.