We encounter the following error when conducting GCN training:
RuntimeError: [Rank:0] FATAL ERROR :: MODULE:PT_BRIDGE Exception in Lowering thread...
[Rank:0] FATAL ERROR :: MODULE:PT_EAGER HabanaLaunchOpPT Run returned exception....
synNodeCreateWithId failed for node: concat with synStatus 1 [Invalid argument]. .
[Rank:0] Habana exception raised from add_node at graph.cpp:507
[Rank:0] Habana exception raised from LaunchRecipe at graph_exec.cpp:558
- I use the docker image vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest and then run pip install torch_geometric
- Specifically, torch-geometric==2.6.1
See the error messages here and see the code here.
Workaround: We are able to run the training by removing the line model = torch.compile(model, backend="hpu_backend").
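To make it easy to switch between the failing and the working configuration while debugging, the compile step can be put behind a flag. The following is only a minimal sketch: the helper names and the USE_HPU_COMPILE environment variable are hypothetical and not part of PyTorch or the Habana stack, and the compile function is passed in as an argument so the snippet does not depend on any particular installation.

```python
import os


def compile_enabled(env=os.environ):
    """Return False only when the (hypothetical) USE_HPU_COMPILE variable is "0"."""
    return env.get("USE_HPU_COMPILE", "1") != "0"


def maybe_compile(model, compile_fn, enabled):
    """Compile the model with the hpu_backend only when enabled.

    compile_fn is expected to behave like torch.compile, i.e. accept the
    model and a `backend` keyword argument and return the wrapped model.
    """
    if enabled:
        return compile_fn(model, backend="hpu_backend")
    # Workaround path: return the model unchanged so it runs without torch.compile.
    return model
```

In the training script this would be used as model = maybe_compile(model, torch.compile, compile_enabled()), and running with USE_HPU_COMPILE=0 would then apply the workaround.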
Debugging Information
The problem seems to be caused by the function gcn_norm, since training works when we set normalize = False for GCNConv.
After removing model = torch.compile(model, backend="hpu_backend"), training works even with normalize = True.
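For context, here is a rough pure-Python sketch of the symmetric normalization that gcn_norm performs. It is not the torch_geometric implementation, which works on tensors, and the function name is made up for illustration. The self-loop step is written as a list concatenation because, as far as we understand, the tensor version also appends the loop edges to the edge index by concatenation. That might be related to the failing concat node in the error above, but we have not confirmed this.

```python
from collections import defaultdict


def gcn_norm_sketch(edges, num_nodes):
    """Return (edges_with_loops, weights) for symmetric GCN normalization.

    edges: list of (source, target) pairs, assumed to contain no self-loops.
    """
    # Step 1: append one self-loop per node. In a tensor library this is a
    # concatenation of the edge list with the list of loop edges.
    edges_with_loops = list(edges) + [(i, i) for i in range(num_nodes)]

    # Step 2: degree of each target node, counting the added self-loop.
    degree = defaultdict(float)
    for _, target in edges_with_loops:
        degree[target] += 1.0

    # Step 3: weight each edge by 1 / sqrt(deg(source) * deg(target)).
    weights = [
        1.0 / (degree[s] ** 0.5 * degree[t] ** 0.5)
        for s, t in edges_with_loops
    ]
    return edges_with_loops, weights
```

With normalize = False, GCNConv skips this computation entirely, which matches the observation that the error disappears in that setting.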