Optimizing memory traffic processing in modern SoCs
A modern SoC consists of many IPs from various domains, including CPUs, GPUs, and domain-specific accelerators for tasks such as image processing and neural network processing. While transistor scaling delivers more functionality and higher parallel performance, memory remains a resource shared among all IPs; in short, its bandwidth is limited. This creates a memory bandwidth wall in addition to the well-known memory latency wall. This talk examines how to achieve the best memory bandwidth and latency for the various IPs on a chip. First, we discuss the memory access requirements of IPs. Each IP has unique access characteristics, called traffic patterns, and memory subsystem designers must understand these patterns to support each IP's computational capability. Second, we consider ways to achieve the best memory bandwidth and latency. Traffic from multiple IPs is hard to process efficiently because their distinct patterns become blended together. We discuss how to preserve those distinct properties all the way to the end of the memory subsystem, and how to achieve higher bandwidth while keeping latency requirements in mind. We apply a range of techniques from computer architecture, including Network-on-Chip design, caches, TLBs, and DRAM architecture.
Dr. Wooil Kim is a principal engineer at Samsung Electronics. His research focuses on providing maximum bandwidth and the best latency for SoC IPs, including chip-to-chip interconnects such as CXL and UCIe. He is responsible for delivering a wide range of System IPs for the Exynos SoC series, leading a team of 60+ engineers. He holds a PhD in Computer Science from the University of Illinois at Urbana-Champaign, and Master's and Bachelor of Science degrees in Computer Science and Engineering from Seoul National University.