
WeaveIP fabric providing non-blocking switching between a large number of ports for emerging scale-up and scale-out systems
The Challenge
AI, especially GenAI, is driving the need for much greater compute acceleration and data movement for both training and inference. Scale-up (vertical scaling) increases a server’s performance, while scale-out computing (horizontal scaling) increases the number of servers processing workloads in parallel. While the industry has focused more on compute, the real challenge now is the data movement bottleneck.
While scale-out has high growth requirements, scale-up has even more extreme performance demands, with larger sets of tightly coupled, latency-sensitive compute elements. NVIDIA has turbocharged scale-up with NVLink™ interconnect and NVSwitch™ switching to drive rapid market growth. However, market needs are outstripping projected roadmaps, and the arrival of new interconnect standards such as Ultra Accelerator Link™ (UALink™) has drawn innovation to this space as well.

| | Metric | Current | Future |
|---|---|---|---|
| Scale-Out | Compute Nodes | Millions | 10x millions |
| | Node Bandwidth | 100 GB/s | Multi TB/s |
| | Latency | 10 ms | 10-100 ms |
| Scale-Up | Compute Nodes | 10s of XPUs | 1000s of XPUs |
| | Node Bandwidth | 100 GB/s | 10 TB/s |
| | Latency | 100 ns | 100 ns |

Legacy crossbar-based switching cannot provide this level of scaling for the upcoming generations.
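To see why, note that the crosspoint count of a monolithic crossbar grows with the square of the port count. The short sketch below is a generic illustration of that scaling, not a model of any specific product.

```python
# Generic illustration (not a model of NeuraScale): a full N x N
# crossbar needs one crosspoint per input/output pair, so switching
# resources and wiring grow quadratically with port count.

def crossbar_crosspoints(ports: int) -> int:
    """Crosspoints required by a monolithic N x N crossbar."""
    return ports * ports

for ports in (64, 128, 256, 512, 1024):
    print(f"{ports:>5} ports -> {crossbar_crosspoints(ports):>9,} crosspoints")

# 256 ports already implies 65,536 crosspoints; 1,024 ports implies
# more than a million, with wiring congestion growing alongside.
```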
The Solution
Baya Systems’ NeuraScale Scalable Switch Fabric is a WeaveIP™ advanced system IP solution designed from the ground up to provide non-blocking switching between a large number of UALink™, Ultra Ethernet, or AMBA ports for the emerging scale-up and scale-out systems.
The fabric achieves extreme port density while maintaining near-lowest theoretical latency, a tight latency-bandwidth curve, and simple physical design with design tiles. While the NeuraScale fabric is optimal for single SoCs, with emerging 3D chiplet technology its unique chiplet-ready design, the resulting silicon footprint advantage, and ease of physical implementation unlock substantially greater scale than traditional crossbars.

The NeuraScale fabric is highly configurable and delivers the benefits of a non-blocking crossbar without the downsides: the extremely intensive implementation effort required at higher port counts and the inherent limitations of crossbar switches in scaling to high port counts.
NeuraScale’s distributed approach does not compromise latency; it maintains extremely low latency even with wide buses and large port counts. Its implementation flexibility allows easier access to all edges of the chiplet for I/O and PHYs, enabling larger scale through chiplets.
Key Benefits
Low Latency and Peak Bandwidth
- Provides near-perfect, non-blocking KPIs with ultra-low latency
- Delivers crossbar performance without explosion in wires, gates and wiring congestion at larger port counts
- Full throughput across random and organized traffic
Massive Scale
- Can support 256 ports per SoC or chiplet at 1 Tb/s
- Up to 256 chiplets each with 16 TB/s D2D bandwidth
- 32 TB/s switch port bandwidth per chiplet
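As a quick sanity check on the numbers above: 256 ports at 1 Tb/s each is 256 Tb/s, which is 32 TB/s of switch port bandwidth per chiplet. The snippet below is illustrative arithmetic only, not a sizing tool.

```python
# Illustrative arithmetic only: checking the per-chiplet headline numbers.
PORTS_PER_CHIPLET = 256    # switch ports per SoC or chiplet
PORT_RATE_TBPS = 1.0       # Tb/s per port
D2D_BW_TBS = 16            # TB/s die-to-die bandwidth per chiplet (as quoted)

aggregate_tbps = PORTS_PER_CHIPLET * PORT_RATE_TBPS   # 256 Tb/s
aggregate_TBs = aggregate_tbps / 8                     # 32 TB/s per chiplet

print(f"Switch port bandwidth per chiplet: {aggregate_TBs:.0f} TB/s")
print(f"Quoted D2D bandwidth per chiplet:  {D2D_BW_TBS} TB/s")
```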
Rapid Design, Integration, Faster Time to Market
- Software-driven configuration, analysis and design
- Simplified integration into switch and rest of SoC
- End-to-end readiness and post-silicon tuning
Compelling Flexibility
- Software-driven flexible design
- Topological, port count, feature set customization
- Transport architecture shared with other Baya fabrics
Ease of Implementation
- Chiplet-ready fabric design, fully digital implementation
- Small footprint allows much greater flexibility for large-capacity switching
- Modularity allows for easy implementation and space for I/O
Advanced Features, Service Management
- Multicasting capability
- Configurable management of oversubscribed ports
- Reliability, Availability and Serviceability (RAS)
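To make the software-driven configuration and service-management features above concrete, here is a purely hypothetical sketch of what a fabric configuration with multicast, oversubscription, and RAS options could look like. The parameter names are invented for illustration and are not Baya Systems’ actual tooling or API.

```python
# Hypothetical configuration sketch; names are illustrative only and
# do not represent Baya Systems' actual software or API.
fabric_config = {
    "ports": 256,                       # switch ports on this chiplet
    "port_rate_tbps": 1.0,              # per-port line rate
    "protocol": "UALink",               # e.g. UALink, Ultra Ethernet, or AMBA
    "multicast": {"enabled": True, "max_groups": 64},
    "oversubscription": {
        "policy": "weighted",           # how contended ports share bandwidth
        "drop_on_overflow": False,
    },
    "ras": {"ecc": True, "link_retry": True},
}

def validate(cfg: dict) -> None:
    """Basic sanity checks on the illustrative configuration."""
    assert cfg["ports"] > 0 and cfg["port_rate_tbps"] > 0
    assert cfg["protocol"] in {"UALink", "Ultra Ethernet", "AMBA"}

validate(fabric_config)
```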
