WeaveIP fabric providing non-blocking switching between a large number of ports for the emerging scale-up and scale-out systems

The Challenge

AI, especially GenAI, is driving the need for much greater compute acceleration and data movement for both training and inference. Scale-up (vertical scaling) increases a server’s performance, while scale-out computing (horizontal scaling) increases the number of servers to process workloads in parallel. While the industry has focused more on compute, the real challenge now is the data movement bottleneck.

While scale-out has high growth requirements, scale-up has even more extreme performance demands, with larger sets of tightly coupled, latency-sensitive compute elements. NVIDIA has turbocharged scale-up with the NVLink™ interconnect and NVSwitch™ switching, driving rapid market growth. However, market needs are outstripping projected roadmaps. The arrival of new interconnect standards such as Ultra Accelerator Link™ (UALink™) has drawn innovation to this space as well.

Scaling      Metrics          Current       Future
Scale-Out    Compute Nodes    Millions      10x millions
             Node Bandwidth   100 GB/s      Multi TB/s
             Latency          10 ms         10-100 ms
Scale-Up     Compute Nodes    10s of XPUs   1000s of XPUs
             Node Bandwidth   100 GB/s      10 TB/s
             Latency          100 ns        100 ns

Legacy crossbar-based methods are inadequate to provide this level of scaling for the upcoming generations.

The Solution

Baya Systems’ NeuraScale Scalable Switch Fabric is an advanced WeaveIP™ system IP solution designed from the ground up to provide non-blocking switching between a large number of UALink™, Ultra Ethernet, or AMBA ports for emerging scale-up and scale-out systems.

Extreme port density is achieved while maintaining latency near the theoretical minimum, a tight latency-bandwidth curve, and a simple physical design built from design tiles. The NeuraScale fabric is optimal for single SoCs, and with emerging 3D chiplet technology, its chiplet-ready design, silicon footprint advantage, and ease of physical implementation unlock substantially greater scale than traditional crossbars.

The NeuraScale fabric is highly configurable and delivers the benefits of a non-blocking crossbar without the extremely intensive implementation effort that higher port counts require, and without the inherent limitations of crossbar switches when scaling to high port counts.
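
As a rough illustration of why a monolithic crossbar becomes hard to implement at high port counts, the sketch below uses the textbook model in which an N-port crossbar needs one crosspoint per input/output pair, so N × N in total. This model is an assumption for illustration only: real gate, wire, and congestion costs depend on bus width and implementation, which this page does not specify, and the function names are hypothetical.

```python
def crossbar_crosspoints(ports: int) -> int:
    """Crosspoints in a textbook N x N crossbar (one per input/output pair)."""
    if ports < 1:
        raise ValueError("ports must be a positive integer")
    return ports * ports


def growth_table(port_counts):
    """Return (ports, crosspoints, ratio vs. the first entry) for each port count."""
    base = crossbar_crosspoints(port_counts[0])
    return [
        (n, crossbar_crosspoints(n), crossbar_crosspoints(n) / base)
        for n in port_counts
    ]


# Port counts chosen for illustration; 256 matches the per-chiplet figure quoted on this page.
for ports, xpoints, ratio in growth_table([16, 64, 256]):
    print(f"{ports:>4} ports: {xpoints:>6} crosspoints ({ratio:.0f}x the 16-port case)")
```

Under this simplified model, going from 16 to 256 ports multiplies the port count by 16 but the crosspoint count by 256, which is the quadratic growth that a distributed fabric aims to avoid.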

NeuraScale’s distributed approach does not compromise latency: it maintains extremely low latency even with wide buses and large port counts. The implementation flexibility allows easier access to all edges of the chiplet for I/O and PHYs, so that larger systems can be built from multiple chiplets.

Key Benefits

Low Latency and Peak Bandwidth
  • Provides near-perfect, non-blocking KPIs with ultra-low latency
  • Delivers crossbar performance without an explosion in wires, gates, and wiring congestion at larger port counts
  • Full throughput across random and organized traffic
  • Can support 256 ports per SoC or chiplet at 1 Tb/s
  • Up to 256 chiplets, each with 16 TB/s D2D bandwidth
  • 32 TB/s switch port bandwidth per chiplet

Rapid design, integration, faster time to market
  • Software-driven configuration, analysis and design
  • Simplified integration into switch and rest of SoC
  • End-to-end readiness and post-silicon tuning
  • Software-driven flexible design
  • Topological, port count, feature set customization
  • Transport architecture shared with other Baya fabrics
  • Chiplet-ready fabric design, fully digital implementation
  • Small footprint, allows much greater flexibility for large capacity switching
  • Modularity allows for easy implementation, space for I/O
  • Multicasting capability
  • Configurable management of oversubscribed ports
  • Reliability, Availability and Serviceability (RAS)
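
The per-port and per-chiplet bandwidth figures quoted in the list above can be cross-checked with simple arithmetic. The sketch below assumes decimal units and 8 bits per byte, and that the per-chiplet figure is the sum of all port rates; the function name is hypothetical and not part of any Baya tool.

```python
BITS_PER_BYTE = 8


def aggregate_port_bandwidth_TBps(ports: int, per_port_Tbps: float) -> float:
    """Sum of all port rates, converted from terabits/s to terabytes/s."""
    return ports * per_port_Tbps / BITS_PER_BYTE


# 256 ports at 1 Tb/s each, as quoted in the list above.
print(aggregate_port_bandwidth_TBps(256, 1.0))  # 32.0
```

Under these assumptions, 256 ports at 1 Tb/s add up to 256 Tb/s, or 32 TB/s, which agrees with the quoted switch port bandwidth per chiplet.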

End-to-end control

Like other WeaveIP™ fabrics, the NeuraScale fabric can be analyzed, configured, designed, and implemented using the WeaverPro™ FabricStudio™ software platform. This gives the user granular control from concept through implementation, including post-silicon tuning with a wide range of programmability.