RDMA Process using RoCE v2

Hi guys.
I would like some information about RoCE v2 on Gaudi-3.
According to the white paper, Gaudi-3 has 24 NICs, and the NIC provides the compute engine with RDMA featuring high bandwidth and low latency over reliable connection “without any software intervention.”

I’m confused by “without any software intervention”. Does it mean that RDMA is handled entirely by the Gaudi-3 hardware?

An RDMA flow normally has an initialization phase, such as creating queue pairs and registering memory regions, and I think CPU involvement is essential for that phase. Once setup is done, the CPU is not needed for the data transfer itself.
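
To make the split I am describing concrete, here is a toy Python model. It is only a sketch of the general RDMA idea, not the Gaudi-3 software stack and not a real verbs API: `ToyNic`, `create_queue_pair`, `register_memory`, `post_write`, and `hardware_process` are all hypothetical names. In standard libibverbs the corresponding real calls would be along the lines of `ibv_create_qp`, `ibv_reg_mr`, and `ibv_post_send`. The model records every call made by host software, so you can see that setup and posting the request involve the CPU while the copy itself does not.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNic:
    """Toy stand-in for an RDMA-capable NIC. All names are hypothetical."""
    cpu_calls: list = field(default_factory=list)  # calls issued by host software
    regions: dict = field(default_factory=dict)    # key -> registered buffer
    next_key: int = 1

    # Control path: host software (CPU) is involved.
    def create_queue_pair(self):
        self.cpu_calls.append("create_queue_pair")
        return {"send_queue": [], "completions": []}

    def register_memory(self, buf: bytearray) -> int:
        self.cpu_calls.append("register_memory")
        key = self.next_key
        self.next_key += 1
        self.regions[key] = buf
        return key

    def post_write(self, qp, src_key, dst_nic, dst_key, length):
        # The CPU only enqueues a work request; it does not copy any data.
        self.cpu_calls.append("post_write")
        qp["send_queue"].append((src_key, dst_nic, dst_key, length))

    # Data path: modeled as NIC hardware work, so no host call is recorded.
    def hardware_process(self, qp):
        while qp["send_queue"]:
            src_key, dst_nic, dst_key, length = qp["send_queue"].pop(0)
            dst_nic.regions[dst_key][:length] = self.regions[src_key][:length]
            qp["completions"].append(length)


def demo():
    local, remote = ToyNic(), ToyNic()
    qp = local.create_queue_pair()
    src = local.register_memory(bytearray(b"hello rdma"))
    dst = remote.register_memory(bytearray(10))
    local.post_write(qp, src, remote, dst, 10)
    local.hardware_process(qp)  # stands in for the NIC moving the bytes
    return bytes(remote.regions[dst]), local.cpu_calls, remote.cpu_calls


if __name__ == "__main__":
    data, local_calls, remote_calls = demo()
    print(data)
    print("local host calls:", local_calls)
    print("remote host calls:", remote_calls)
```

In this model the remote side's host software only registers its buffer and never touches the transfer, which is the one-sided behavior I would expect from an RDMA write.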

How does this work on Gaudi-3? Is it the same? I would like to understand the RoCE v2 flow on Gaudi-3.

What we mean when we state “The NIC provides the compute engine with RDMA featuring high bandwidth and low latency over reliable connection without any software intervention” is that there is no additional driver or other software configuration that the user needs to implement. It is all handled directly by Intel Gaudi hardware and software working together.