Transferring kNN results from CPU to HPU breaks back propagation

vezenbu · December 3, 2024, 9:27am

In our GNN training code, we use the kNN algorithm to build graphs and then apply graph convolution on the resulting kNN graphs.

1. The kNN algorithm runs on the CPU.
2. We transfer the generated edges to the HPU.
3. We use the edges for graph convolution on the HPU.
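
For reference, the three steps above can be sketched roughly as follows. This is a minimal reconstruction, not the actual training code: it uses plain PyTorch ops in place of the torch_geometric kNN and convolution layers, the helper names `knn_edges_cpu` and `mean_aggregate` are hypothetical, and it falls back to CPU when `habana_frameworks` is not installed. It only illustrates the data flow and is not confirmed to reproduce the error.

```python
import torch

try:
    # Only available on Gaudi installs; importing it registers the "hpu" device.
    import habana_frameworks.torch.core as htcore
    device = torch.device("hpu")
except ImportError:
    htcore = None
    device = torch.device("cpu")  # fallback so the sketch runs anywhere


def knn_edges_cpu(pos: torch.Tensor, k: int) -> torch.Tensor:
    """Hypothetical helper: build a [2, N*k] edge index on the CPU.

    Row 0 holds source (neighbor) indices, row 1 holds target node indices.
    """
    with torch.no_grad():  # edges are integer indices, no gradient needed
        pos = pos.detach().cpu()
        dist = torch.cdist(pos, pos)
        dist.fill_diagonal_(float("inf"))  # exclude each node from its own neighbors
        nbr = dist.topk(k, largest=False).indices  # [N, k] nearest neighbors
        tgt = torch.arange(pos.size(0)).repeat_interleave(k)
        return torch.stack([nbr.reshape(-1), tgt])


def mean_aggregate(x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for a graph convolution: mean over incoming neighbors."""
    src, tgt = edge_index
    out = torch.zeros_like(x).index_add_(0, tgt, x[src])
    ones = torch.ones(tgt.size(0), device=x.device, dtype=x.dtype)
    deg = torch.zeros(x.size(0), device=x.device, dtype=x.dtype).index_add_(0, tgt, ones)
    return out / deg.clamp(min=1).unsqueeze(-1)


torch.manual_seed(0)
pos = torch.randn(16, 3)                       # node coordinates, kept on CPU
edge_index = knn_edges_cpu(pos, k=4)           # step 1: kNN on CPU
edge_index = edge_index.to(device)             # step 2: transfer edges to the device
x = torch.randn(16, 8, device=device)          # node features on the device
lin = torch.nn.Linear(8, 8).to(device)
out = mean_aggregate(lin(x), edge_index)       # step 3: convolution-like aggregation
loss = out.pow(2).mean()
loss.backward()                                # the step that fails in our real code
if htcore is not None:
    htcore.mark_step()                         # flush the lazy-mode graph on HPU
```

The edge index is an int64 tensor that carries no gradient, so in principle only the node features and layer weights should take part in back propagation.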

However, this seems to break back propagation and raises the following error: RuntimeError: [Rank:0] FATAL ERROR :: MODULE:PT_BRIDGE Exception in Lowering thread. See similar errors here.

See the details of the kNN algorithm here, and see the code here.

I use the Docker image vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest and then run pip install torch_geometric inside it. The installed version is torch-geometric==2.6.1.
