Episode 11: Breaking the Memory Wall: How New Memory Architectures are Reshaping AI Inference

About Episode 11

In this episode of Tech Threads: Weaving the Intelligent Future, Baya Systems’ Nandan Nayampally sits down with Charlie Cheng, founder and CEO of TC Lab, for an in-depth conversation on the memory wall and why it has become one of the defining bottlenecks in AI infrastructure. While memory constraints have existed for decades, AI inference is bringing the issue into sharper focus by turning memory bandwidth into a direct driver of user experience, system performance, and data center economics.

Charlie shares his perspective on the industry’s shift toward alternative AI architectures, from high-bandwidth memory and SRAM-based approaches to emerging 3D memory technologies and hybrid-bonded architectures that bring memory much closer to compute. He explains why inference workloads, especially token generation and KV cache access, can quickly become bandwidth-bound, and why solving that challenge requires rethinking the relationship between compute, memory, packaging, and on-chip data movement.

The discussion also explores what happens when memory bottlenecks are reduced or removed. As more bandwidth becomes available to AI accelerators, the pressure shifts to the rest of the system, including networks-on-chip, chiplet fabrics, and data movement architectures. For companies building next-generation AI chips, hyperscale infrastructure, autonomous systems, and edge inference platforms, this creates both a challenge and an opportunity: the need for more flexible, scalable, and software-defined approaches to moving data efficiently across increasingly complex systems.

Tune in for an expert look at why the future of AI performance depends as much on memory innovation and data movement as it does on compute, and how new architectures could help unlock faster, more efficient, and more scalable AI systems.

April 28, 2026

Episode 10: Beyond the CPU vs GPU War: Rethinking AI Compute at the System Level

March 27, 2026

Episode 9: Inside the AI Bottleneck: Data Movement, Chiplets, and System Scaling

January 20, 2026

Episode 8: From Arduino to AI Infrastructure: Scaling the Next Wave of Computing

October 29, 2025

Redefining AI Infrastructure: A Conversation with Cambrian-AI and Tirias Research

October 14, 2025

Other Podcasts

Episode 7: The Architecture of “Open” Intelligence
