With 100GbE RoCE integrated natively on the Gaudi training processor, customers avoid the performance and throughput bottlenecks inherent in off-chip implementations of RoCE, which require a separate NIC for each processor.
In HLS-1, 7 of the 10 ports are used for all-to-all connections within the server, and the remaining 3 are used to scale out beyond the server. Scale-out ports in one server can connect directly to scale-out ports in another server. GPU-based systems, by contrast, require separate NICs attached through PCIe, which can create additional performance bottlenecks.
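The port split above can be turned into a rough bandwidth budget. The sketch below is illustrative only: it assumes the 10 ports are per processor, that each of the 7 intra-server ports links to exactly one peer (which implies 8 processors per server), and that every port runs at the 100GbE line rate. The function and key names are hypothetical, and the figures are raw link rates, not measured throughput.

```python
# Figures taken from the text above.
PORTS_TOTAL = 10          # ports counted in the 7 + 3 split
INTRA_SERVER_PORTS = 7    # all-to-all connections within the server
SCALE_OUT_PORTS = 3       # connections leaving the server
PORT_SPEED_GBPS = 100     # 100GbE line rate per port


def port_budget():
    """Return a raw link-rate budget for one server (hypothetical helper)."""
    assert INTRA_SERVER_PORTS + SCALE_OUT_PORTS == PORTS_TOTAL

    # Assumption: one intra-server port per peer, so 7 peers plus self.
    processors = INTRA_SERVER_PORTS + 1

    # Each all-to-all link is shared by two processors, so count it once.
    intra_links = processors * INTRA_SERVER_PORTS // 2

    scale_out_per_processor = SCALE_OUT_PORTS * PORT_SPEED_GBPS
    return {
        "processors_per_server": processors,
        "intra_server_links": intra_links,
        "scale_out_gbps_per_processor": scale_out_per_processor,
        "scale_out_gbps_per_server": scale_out_per_processor * processors,
    }


if __name__ == "__main__":
    for name, value in port_budget().items():
        print(f"{name}: {value}")
```

Under these assumptions, each processor has 300 Gb/s of raw scale-out capacity that does not pass through a PCIe-attached NIC.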
See the Habana website for more information on our RoCE implementation.