In this episode of Tech Threads: Weaving the Intelligent Future, Baya Systems’ Nandan Nayampally sits down with Charlie Cheng, founder and CEO of TC Lab, for an in-depth conversation on the memory wall and why it has become one of the defining bottlenecks in AI infrastructure. While memory constraints have existed for decades, AI inference is bringing the issue into sharper focus by turning memory bandwidth into a direct driver of user experience, system performance, and data center economics.
Charlie shares his perspective on the industry’s shift toward alternative AI architectures, from high-bandwidth memory and SRAM-based approaches to emerging 3D memory technologies and hybrid-bonded architectures that bring memory much closer to compute. He explains why inference workloads, especially token generation and KV cache access, can quickly become bandwidth-bound, and why solving that challenge requires rethinking the relationship between compute, memory, packaging, and on-chip data movement.
The discussion also explores what happens when memory bottlenecks are reduced or removed. As more bandwidth becomes available to AI accelerators, the pressure shifts to the rest of the system, including networks-on-chip, chiplet fabrics, and data movement architectures. For companies building next-generation AI chips, hyperscale infrastructure, autonomous systems, and edge inference platforms, this shift is both a challenge and an opportunity: it calls for more flexible, scalable, and software-defined approaches to moving data efficiently across increasingly complex systems.
Tune in for an expert look at why the future of AI performance depends as much on memory innovation and data movement as it does on compute, and how new architectures could help unlock faster, more efficient, and more scalable AI systems.