[특별세미나] A Validated Semantics for LLVM IR
Juneyoung Lee received his Ph.D. in Computer Science and Engineering from Seoul National University in 2021, and B.S. in Computer Science and Engineering from POSTECH in 2014. His research interests include defining the formal semantics of real-world programming languages and devising techniques for practical compiler verification. He is a recipient of the Distinguished Paper Award of PLDI’21.He worked as a research intern for Microsoft Research (Cambridge, UK) and an intern for Apple (Cupertino, US). He is currently working for CryptoLab as a research engineer to serve the mandatory military service. CryptoLab is a company developing a fast homomorphic encryption framework and post-quantum cryptography library.
He actively worked as a contributor for the LLVM project which is an open-source compiler infrastructure. Industrial compilers — Clang (Apple’s C/C++ compiler), NVIDIA’s CUDA compiler, Intel’s ICC, Swift, Rust — rely on LLVM to perform compiler optimizations and perform code generation for various targets. He was a keynote speaker of 2020 LLVM Developers’ Meeting, which is an official conference for the LLVM developers.
Also, he actively contributed to Alive2, an automatic validation framework for compiler ptimizations. Given a compiler transformation, Alive2 mathematically proves that it is correct or finds a counter-example. The Alive2 online is daily used in the code review process of the LLVM project.
Intermediate representation (IR) is a language that is used internally by a compiler to represent programs. Translation to an IR should preserve guarantees from the source language’s specification because they enable various optimizations. This naturally makes an IR a language that is rich with high-level information.
In LLVM, the semantics of important high-level features in IR was not rigorously defined. It caused compiler optimizations in LLVM to use different interpretations, and bad interactions between the optimizations resulted in miscompilation bugs that are hard to fix. To solve this problem, the IR’s semantics must be defined precisely. Then, optimizations that are incorrect with respect to the chosen semantics must be fixed. Both processes are challenging because LLVM is a large, fastly evolving software.
This thesis proposes (1) formal semantics of LLVM IR that resolves critical problems that we have found in the old IR semantics, making it consistent (2) a translation validation framework for LLVM’s optimizations to validate the new semantics. We show that the old semantics of undefined behavior and memory model in the IR cannot explain important optimizations in LLVM. We propose new semantics that solves this problem. Next, we present Alive2, a translation validation framework for LLVM based on the new semantics.Alive2 relies on an automatic theorem prover to validate optimizations without any hints from LLVM. It supports most of integer and float operations, memory operations, function calls, and branches. To make validation practical, resources used by the tool is bounded.
The new formal semantics of undefined behavior has been adopted by LLVM. The `freeze’ instruction that is proposed by us is officially added into LLVM 10.0, and the official document is updated to use our semantics. Also, critical problems in the old memory model we have found were shared with compiler developers, and patches have landed in LLVM to fix it.Alive2 has found more than 50 miscompilation bugs in LLVM so far and is used daily by LLVM developers.
-(오프라인) 공학2동 102호 4시~