I wrote some code to measure the FLOPS of the TPC.
My code is as follows (unnecessary parts removed):
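```python
import torch
import habana_frameworks.torch.core as htcore

# Example values (assumed; omitted in the original): 16 MB float32 tensors
# -> 16 MiB / 4 bytes = 4 Mi elements each, and n loop iterations.
dtype = torch.float32
num = 16 * 1024 * 1024 // 4
n = 60

A_0 = torch.ones(num, dtype=dtype)
A_0_hpu = A_0.to("hpu")
A_1 = torch.ones(num, dtype=dtype)
A_1_hpu = A_1.to("hpu")
htcore.mark_step()  # flush the host-to-device copies as their own graph
for j in range(n):
    if j == 0:
        B_hpu = torch.add(A_0_hpu, A_1_hpu, alpha=2)
    else:
        B_hpu = torch.add(B_hpu, A_0_hpu, alpha=2)
htcore.mark_step()  # execute all n accumulated adds as one graph
```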
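To turn the profiled time into a FLOPS figure, the operations performed by the loop have to be counted. The following is a minimal sketch, assuming each `torch.add(x, y, alpha=2)` counts as 2 FLOPs per element (one multiply by `alpha`, one add) and that the elapsed device time is read off the profiler trace by hand. `add_flops` and `tflops` are hypothetical helper names, not part of any Habana API.

```python
def add_flops(num_elements: int, n_iters: int) -> int:
    """FLOPs for n_iters calls of torch.add(x, y, alpha=2) over num_elements.

    Assumption: out = x + alpha * y is counted as 2 FLOPs per element
    (one multiply by alpha, one add).
    """
    return 2 * num_elements * n_iters


def tflops(num_elements: int, n_iters: int, elapsed_s: float) -> float:
    """Achieved throughput in TFLOPS for a measured device time in seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return add_flops(num_elements, n_iters) / elapsed_s / 1e12


# Hypothetical example: 16 MiB float32 tensors (4 Mi elements), 50 adds,
# and a made-up elapsed time of 1 ms taken from the profiler trace.
num = 16 * 1024 * 1024 // 4
print(add_flops(num, 50))  # 419430400
print(tflops(num, 50, 1e-3))
```

Only the device time of the fused add node should go into `elapsed_s`; including the host-to-device copies before the first `mark_step()` would understate the TPC throughput.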
I profiled the code above with `HABANA_PROFILE=1` and observed the error below when `n` is large (in my case, `n >= 60` for 16 MB tensors `A_0_hpu`, `A_1_hpu`, and `B_hpu`). Can you figure out the reason for the error?
```
Internal Error: Received signal - Segmentation fault
Segmentation fault (core dumped)
```
FYI, I put in two `mark_step()` calls because they give cleaner “Analyzed Nodes” results in the profiler: with them, I get only a single node (e.g., “fusedTPCNode_0_0”), although I am not fully confident this description is accurate. Also, without `mark_step()`, the profiler does not seem to capture `torch.add()` when the resulting tensor is not used.