Reason for segmentation fault

juntaek0425 · September 5, 2024, 9:34pm

I wrote some code to measure the FLOPS of the TPC.

My code is below (unnecessary parts removed):

A_0 = torch.ones(num, dtype=dtype)
A_0_hpu = A_0.to("hpu")

A_1 = torch.ones(num, dtype=dtype)
A_1_hpu = A_1.to("hpu")

htcore.mark_step()
for j in range(n):
    if j == 0:
        B_hpu = torch.add(A_0_hpu, A_1_hpu, alpha=2)
    else:
        B_hpu = torch.add(B_hpu, A_0_hpu, alpha=2)
htcore.mark_step()

I profiled the above code with the flag HABANA_PROFILE=1 and observed the error below when n is fairly large (in my case, n >= 60 with 16 MB tensors for A_0_hpu, A_1_hpu, and B_hpu). Can you figure out the reason for the error?

Internal Error: Received signal - Segmentation fault
Segmentation fault (core dumped)

FYI, I added the two mark_step() calls because the “Analyzed Nodes” profiling results are cleaner with them: I get only one node (e.g., “fusedTPCNode_0_0”). I am not confident about this description; it might be wrong. Also, without mark_step(), the profiler does not seem to capture the torch.add() if the resulting tensor is not used.
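For reference, the arithmetic the loop performs can be sketched in plain Python, with no HPU or PyTorch required. torch.add(x, y, alpha=2) computes x + 2 * y elementwise. Counting that as 2 floating-point operations per element (one multiply, one add) is an assumption made here, since the hardware may fuse them. The helper names `expected_value` and `total_flops` are hypothetical and only for illustration.

```python
# Pure-Python sketch of what the benchmark loop computes.
# Assumption: each element costs 2 floating-point operations per torch.add
# call with alpha=2 (one multiply by alpha, one add).

def expected_value(n: int) -> int:
    """Value of every element of B after n iterations, for all-ones inputs."""
    b = 1 + 2 * 1              # j == 0: B = A_0 + 2 * A_1
    for _ in range(1, n):      # j >= 1: B = B + 2 * A_0
        b = b + 2 * 1
    return b


def total_flops(n: int, num: int, flops_per_element: int = 2) -> int:
    """Total floating-point operations issued by n chained adds over num elements."""
    return n * num * flops_per_element


if __name__ == "__main__":
    n = 60               # iteration count at which the crash was reported
    num = 4 * 1024**2    # assumed element count: a 16 MiB float32 tensor
    print(expected_value(n))
    print(total_flops(n, num))
```

Comparing `expected_value(n)` against an element of B computed on the CPU is one way to confirm the loop itself is correct before looking at the profiler. Dividing `total_flops` by the measured kernel time gives the FLOPS figure.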

Sayantan_S · September 5, 2024, 9:36pm

What are “dtype” and “num”?

And are you on 1.17?

juntaek0425 · September 24, 2024, 10:59pm

dtype is a PyTorch data type such as torch.float.
num is the number of elements in each tensor. I set this value so that each tensor is 16 MB.
Yes, I used 1.17.
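As an illustration of how num might be derived from the target size, here is a minimal sketch. It assumes "16 MB" means 16 MiB (16 * 1024**2 bytes). The thread does not say which convention was used. The table `BYTES_PER_ELEMENT` and the helper `num_elements` are hypothetical names. torch.float is 32-bit, so it takes 4 bytes per element.

```python
# Sketch: element count needed for a tensor of a given size in bytes.
# Assumption: "16 MB" is interpreted as 16 MiB = 16 * 1024**2 bytes.

BYTES_PER_ELEMENT = {
    "float32": 4,   # torch.float / torch.float32
    "bfloat16": 2,  # torch.bfloat16
    "float16": 2,   # torch.float16
}


def num_elements(size_bytes: int, dtype_name: str) -> int:
    """Number of elements of the given dtype that fit in size_bytes."""
    return size_bytes // BYTES_PER_ELEMENT[dtype_name]


if __name__ == "__main__":
    target = 16 * 1024**2
    print(num_elements(target, "float32"))
    print(num_elements(target, "bfloat16"))
```

With three such tensors live at once (A_0_hpu, A_1_hpu, B_hpu), the steady-state footprint under this assumption is 48 MiB, not counting any intermediates the graph compiler may keep.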
