NeuraScale™: Advanced Non-Blocking Switch Fabric Solutions

WeaveIP fabric providing non-blocking switching between a large number of ports for the emerging scale-up and scale-out systems

The Challenge

AI, especially GenAI, is driving the need for much greater compute acceleration and data movement for both training and inference. Scale-up (vertical scaling) increases a server’s performance, while scale-out (horizontal scaling) increases the number of servers that process workloads in parallel. The industry has focused mostly on compute, but the real challenge now is the data movement bottleneck.

While scale-out has high growth requirements, scale-up has even more extreme performance demands, with larger sets of tightly coupled, latency-sensitive compute elements. NVIDIA has turbocharged scale-up with its NVLink™ interconnect and NVSwitch™ switching, driving rapid market growth. However, market needs are outstripping projected roadmaps, and the arrival of new interconnect standards such as Ultra Accelerator Link™ (UALink™) has drawn innovation to this space as well.

Legacy crossbar-based methods are inadequate to provide this level of scaling for the upcoming generations.

The Solution

Baya Systems’ NeuraScale Scalable Switch Fabric is a WeaveIP™ advanced system IP solution designed from the ground up to provide non-blocking switching between a large number of UALink™, Ultra Ethernet, or AMBA ports for the emerging scale-up and scale-out systems.

Extreme port density is achieved while maintaining near the lowest theoretical latency, a tight latency-bandwidth curve, and simple physical design using design tiles. The NeuraScale fabric is optimal for single SoCs. With emerging 3D chiplet technology, its unique chiplet-ready design, the resulting silicon footprint advantage, and its ease of physical implementation unlock substantially greater scale than traditional crossbars.

The NeuraScale fabric is highly configurable and delivers the benefits of a non-blocking crossbar without its downsides. Those downsides are the extremely intensive implementation needed at higher port counts and the inherent limitations crossbar switches face when scaling to high port counts.
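A full N×N crossbar needs one crosspoint for every input/output pair, so its switching resources grow with N². The sketch below illustrates why crossbars become hard to implement at high port counts. It is a simple counting model of a generic crossbar and does not describe NeuraScale's internal structure. The 512-bit datapath width is a hypothetical value, chosen only to represent "wide buses".

```python
def crossbar_crosspoints(ports: int) -> int:
    """Number of input/output crosspoints in a full ports x ports crossbar."""
    if ports < 1:
        raise ValueError("ports must be positive")
    return ports * ports


def crossbar_crosspoint_bits(ports: int, width_bits: int) -> int:
    """Crosspoints times datapath width: one bit-level switch per crosspoint per bit."""
    if width_bits < 1:
        raise ValueError("width_bits must be positive")
    return crossbar_crosspoints(ports) * width_bits


if __name__ == "__main__":
    # Hypothetical 512-bit datapath, used only to illustrate wide buses.
    WIDTH_BITS = 512
    for n in (16, 64, 256):
        print(f"{n:>4} ports: {crossbar_crosspoints(n):>7} crosspoints, "
              f"{crossbar_crosspoint_bits(n, WIDTH_BITS):>10} crosspoint-bits")
```

In this model, growing from 64 to 256 ports (4× the ports) multiplies the crosspoint count by 16. That quadratic growth is what makes large crossbars so implementation-intensive.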

NeuraScale’s distributed approach does not compromise latency. It maintains extremely low latency even with wide buses and large port counts. Its implementation flexibility also allows easier access to all edges of the chiplet for I/O and PHYs, enabling greater scale through chiplets.

Key Benefits

Low Latency and Peak Bandwidth

Provides near-perfect, non-blocking KPIs with ultra-low latency
Delivers crossbar performance without the explosion in wires, gates and wiring congestion that crossbars suffer at larger port counts
Full throughput across random and organized traffic

Massive Scale

Can support 256 ports per SoC or chiplet, each at 1 Tb/s
Up to 256 chiplets, each with 16 TB/s D2D bandwidth
32 TB/s switch port bandwidth per chiplet
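The per-chiplet figures above are consistent with each other. If the 1 Tb/s rate applies to each port, 256 ports give 256 Tb/s, and at 8 bits per byte that is 32 TB/s. The quick check below assumes decimal units and all ports running at full rate at the same time. The constant and function names are illustrative only.

```python
PORTS_PER_CHIPLET = 256
PORT_RATE_TBITS = 1.0  # assumed per-port rate, in terabits per second
BITS_PER_BYTE = 8


def aggregate_port_bandwidth_tbytes(ports: int, rate_tbits: float) -> float:
    """Aggregate switch-port bandwidth in TB/s for `ports` ports at `rate_tbits` Tb/s each."""
    return ports * rate_tbits / BITS_PER_BYTE


if __name__ == "__main__":
    total = aggregate_port_bandwidth_tbytes(PORTS_PER_CHIPLET, PORT_RATE_TBITS)
    # Prints: 256 x 1.0 Tb/s = 32.0 TB/s
    print(f"{PORTS_PER_CHIPLET} x {PORT_RATE_TBITS} Tb/s = {total} TB/s")
```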

Rapid Design, Integration, Faster Time to Market

Software-driven configuration, analysis and design
Simplified integration into switch and rest of SoC
End-to-end readiness and post-silicon tuning

Compelling Flexibility

Software-driven flexible design
Customization of topology, port count and feature set
Transport architecture shared with other Baya fabrics

Ease of Implementation

Chiplet-ready fabric design with a fully digital implementation
Small footprint allows much greater flexibility for large-capacity switching
Modularity eases implementation and leaves space for I/O

Advanced Features, Service Management

Multicasting capability
Configurable management of oversubscribed ports
Reliability, Availability and Serviceability (RAS)

End-to-end control

Like other WeaveIP™ fabrics, the NeuraScale fabric can be analyzed, configured, designed and implemented using the WeaverPro™ FabricStudio™ software platform. The platform gives users very granular control from concept through implementation, including post-silicon tuning with a wide range of programmability.