Unlock the potential of Generative AI by Model Compression


With the emergence of ChatGPT, there is growing interest in large-scale generative AI models. Deep learning models have been growing steadily since 2012, and models with over 1 billion parameters are now common. Larger AI models require more hardware resources to run, which raises the cost of AI-based services and restricts the environments in which they can be deployed. AI model compression addresses these problems by reducing model size or speeding up inference while maintaining performance, enabling AI services to run at lower cost or with faster response times. Compression techniques include quantization, pruning, knowledge distillation, and others. In this talk, several practical approaches to compressing large-scale generative AI models such as Stable Diffusion and LLMs will be discussed.
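As a flavor of the techniques mentioned, the sketch below shows post-training symmetric int8 quantization of a weight tensor, one common form of quantization. This is a minimal illustration, not the speaker's method; the function names and the per-tensor scaling scheme are illustrative assumptions.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127]."""
    # Scale so the largest-magnitude weight maps to 127 (guard against all-zero tensors).
    scale = max(np.abs(weights).max() / 127.0, 1e-8)
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from the int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)

# int8 storage is 4x smaller than float32, and the per-element
# reconstruction error is bounded by half the quantization step.
assert q.dtype == np.int8
assert np.max(np.abs(w - w_hat)) <= s / 2 + 1e-6
```

In practice, libraries quantize per-channel rather than per-tensor and may calibrate scales on activation statistics, but the size/accuracy trade-off works the same way: 8-bit storage cuts memory 4x versus float32 at the cost of a bounded rounding error.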

Hyungjun Kim received his bachelor’s and PhD degrees from Pohang University of Science and Technology (POSTECH). He worked at the Holst Centre in the Netherlands as a research intern from January to September 2015 and also spent the summer of 2018 at IBM T.J. Watson Research Center. After receiving his PhD, he worked as a researcher at the POSTECH Future IT Innovation Laboratory from 2021 to 2022. His research over the last 10 years has focused on hardware-algorithm co-design for efficient deep learning systems. Based on his research achievements, he founded SqueezeBits Inc., a startup building efficient AI models and systems, where he currently serves as CEO. He was recently selected for the Forbes Korea 30 Under 30 in the Deep/Enterprise Tech category.