Workflow of RoCE v2 in Gaudi-3

Hi
I'd like to clarify how RoCE v2 works in Gaudi-3.
According to the Gaudi-3 white paper, Gaudi-3 scales up with 21x200GbE RoCE and scales out with 3x200GbE RoCE v2. The white paper also says: "The NIC provides the compute engine with RDMA featuring high bandwidth and low latency over reliable connection without any software intervention".
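As a quick check on the figures quoted from the white paper, the port counts translate into the following aggregate Ethernet line rates per Gaudi-3, in each direction (nominal rates, before protocol overhead):

```latex
\begin{aligned}
\text{scale-up:}  &\quad 21 \times 200~\text{Gb/s} = 4200~\text{Gb/s} = 4.2~\text{Tb/s} \\
\text{scale-out:} &\quad 3 \times 200~\text{Gb/s} = 600~\text{Gb/s} \\
\text{total:}     &\quad (21 + 3) \times 200~\text{Gb/s} = 4800~\text{Gb/s} = 4.8~\text{Tb/s}
\end{aligned}
```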

I'm confused by the phrase "without any software intervention".
For scale-up, eight Gaudi-3 devices are interconnected over 200GbE RoCE v2. Does this mean there is no host CPU intervention for RDMA between Gaudi-3 OAMs?
In conventional RDMA, there is an initialization stage in which the host CPU allocates queue pairs (QPs) and other resources. Is Gaudi-3's RDMA different from conventional RDMA in this respect?

Thanks.

What we mean by the statement "The NIC provides the compute engine with RDMA featuring high bandwidth and low latency over reliable connection without any software intervention" is that the user does not need to implement any additional driver or other software configuration. It is all handled directly by the Intel Gaudi hardware and software working together.