GCNConv fails with normalization

We encounter

RuntimeError: [Rank:0] FATAL ERROR :: MODULE:PT_BRIDGE Exception in Lowering thread...
[Rank:0] FATAL ERROR :: MODULE:PT_EAGER HabanaLaunchOpPT Run returned exception....
synNodeCreateWithId failed for node: concat with synStatus 1 [Invalid argument]. .
[Rank:0] Habana exception raised from add_node at graph.cpp:507
[Rank:0] Habana exception raised from LaunchRecipe at graph_exec.cpp:558

when training a GCN.

  • We use the Docker image vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest and then run pip install torch_geometric
  • The installed version is torch-geometric==2.6.1

See the error messages here and the code here.

Workaround: Training runs without the error once we remove the line model = torch.compile(model, backend="hpu_backend").
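To make the workaround easy to switch on and off while the issue is open, the compile step can be put behind a flag. This is only a minimal sketch: the helper name maybe_compile and the environment variable USE_HPU_COMPILE are hypothetical and not part of our code or of any Habana API. The only library call used is torch.compile with the same backend string as above.

```python
import os


def maybe_compile(model, enabled, backend="hpu_backend"):
    """Return the model unchanged, or wrapped by torch.compile when enabled.

    Hypothetical helper for illustration only.
    """
    if not enabled:
        # Workaround path: skip torch.compile entirely.
        return model
    # Imported lazily so the helper has no import-time dependency on torch.
    import torch
    return torch.compile(model, backend=backend)


# USE_HPU_COMPILE is an assumed variable name; compilation stays off by default.
use_compile = os.environ.get("USE_HPU_COMPILE", "0") == "1"
# model = maybe_compile(model, use_compile)
```

With the flag unset, the model is used as-is, which matches the workaround; setting USE_HPU_COMPILE=1 restores the failing configuration for reproduction.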

Debugging Information

The problem appears to be caused by the function gcn_norm, since training works when we set normalize = False for GCNConv.

After removing model = torch.compile(model, backend="hpu_backend"), training works even with normalize = True.
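For reference, the step that normalize = True enables is, as far as we understand it, the standard GCN normalization: add a self-loop to every node, compute node degrees, and scale each edge by deg(src)^-1/2 * deg(dst)^-1/2. The plain-Python sketch below illustrates that computation on an unweighted edge list. It is not the torch_geometric implementation; the function name gcn_norm_sketch is hypothetical, and duplicate edges are collapsed here for simplicity.

```python
from collections import defaultdict


def gcn_norm_sketch(edges, num_nodes):
    """Return {(src, dst): weight} after self-loops and symmetric normalization.

    Illustrative only: assumes unweighted edges and collapses duplicates.
    """
    # Add one self-loop per node; the set avoids adding a loop twice.
    edge_set = set(edges) | {(i, i) for i in range(num_nodes)}

    # Degree of a node = number of edges pointing at it (self-loop included).
    deg = defaultdict(float)
    for _, dst in edge_set:
        deg[dst] += 1.0

    # Symmetric normalization: deg(src)^-1/2 * deg(dst)^-1/2 per edge.
    # Every node has a self-loop, so no degree is zero.
    return {(s, d): deg[s] ** -0.5 * deg[d] ** -0.5 for s, d in edge_set}


# Undirected path graph 0 - 1 - 2, stored as directed edge pairs.
weights = gcn_norm_sketch([(0, 1), (1, 0), (1, 2), (2, 1)], num_nodes=3)
```

In this example nodes 0 and 2 end up with degree 2 and node 1 with degree 3, so the self-loop on node 0 gets weight 1/2 and the edge between nodes 0 and 1 gets weight 1/sqrt(6).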