Seminar Announcement
Multimodal Generative and Interactive AI
- Posted: 2026.04.22

Seminar date: 2026.04.29 (Wed)

Speaker: Prof. Taehwan Kim (UNIST)
[Abstract]
Multimodal learning aims to use and learn from multiple modalities of data, such as vision, text, and audio. In this talk, I will introduce my recent work on multimodal generative and interactive AI. In the first part of the talk, I will introduce one line of my work on developing multimodal generative AI for communication with humans. As humans, we use multiple modalities to communicate with others, since a single modality may not convey all the information necessary for effective social interaction. However, prior work has underexplored the use of multiple modalities for effective communication and has offered limited expressiveness. Therefore, we propose to use multimodal information to enable more expressive and effective communication by defining novel tasks and training either a multimodal large language model or a cross-modal transfer module from speech to the visual modality. In the second part of the talk, I will focus on developing vision-language models for challenging real-world applications such as handwritten math expression evaluation, which requires both strong mathematical reasoning and visual understanding. We develop the first open model for this task and achieve state-of-the-art performance using two-stage training with reinforcement learning and visual prompting. I will also describe how we improve an existing vision-language model for embodied agents, so that it can interact with the world given user instructions, by injecting the skills needed to understand the environment.
[Biography]
Taehwan Kim is currently an associate professor in the Graduate School of Artificial Intelligence and the Department of Computer Science and Engineering at Ulsan National Institute of Science and Technology (UNIST). Previously, he was an applied scientist at Amazon Alexa AI and a lead research scientist at the start-up company ObEN. Before that, he was a postdoctoral scholar in the Computing and Mathematical Sciences department at the California Institute of Technology, working with Prof. Yisong Yue. He completed his PhD in 2016 at the Toyota Technological Institute at Chicago, a philanthropically endowed academic computer science institute located on the University of Chicago campus, where his advisor was Prof. Karen Livescu. He received his master's degree in Computer Science from USC and his bachelor's degree in Computer Science & Engineering and Mathematics from POSTECH.
His main research interests span problems in machine learning and its applications to computer vision and language processing. Specifically, he is interested in multimodal learning, generative models, and interactive AI.
- Attachment
- 세미나포스터_0429김태환.jpg



