Training of torch.nn.Embedding fails on Gaudi: loss not decreasing

When training GNNs with Node2Vec on Gaudi, we found that the loss does not decrease, although the same code works well on CPUs.

After some debugging, we traced the problem to torch.nn.Embedding.

Using a toy example, we found that the loss does not decrease when we train a torch.nn.Embedding on Gaudi, but it decreases as expected on CPUs.

See the test code here.
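
A minimal sketch of such a toy test is shown below. It is not the original test code: the table size, target, optimizer, learning rate, and the names train_embedding and mark_step are assumptions made for illustration. It fits a small torch.nn.Embedding to a fixed random target and records the loss at every step, so the same function can be run on "cpu" and on "hpu" and the two loss curves compared.

```python
import torch


def train_embedding(device="cpu", steps=50, seed=0, mark_step=None):
    """Fit a tiny embedding table to a fixed target; return the loss per step.

    mark_step is an optional callable (hypothetical parameter) invoked after
    backward() and after optimizer.step(), e.g. for HPU lazy mode.
    """
    torch.manual_seed(seed)
    # Create everything on CPU first so every device starts from the same values.
    emb = torch.nn.Embedding(10, 4).to(device)
    target = torch.randn(10, 4).to(device)
    idx = torch.arange(10).to(device)
    opt = torch.optim.SGD(emb.parameters(), lr=0.5)

    losses = []
    for _ in range(steps):
        opt.zero_grad()
        loss = ((emb(idx) - target) ** 2).mean()
        loss.backward()
        if mark_step is not None:
            mark_step()
        opt.step()
        if mark_step is not None:
            mark_step()
        losses.append(loss.item())
    return losses


cpu_losses = train_embedding("cpu")
print("cpu first/last loss:", cpu_losses[0], cpu_losses[-1])

# On a Gaudi machine (assumption: the Habana PyTorch bridge is installed):
# import habana_frameworks.torch.core as htcore
# hpu_losses = train_embedding("hpu", mark_step=htcore.mark_step)
# print("hpu first/last loss:", hpu_losses[0], hpu_losses[-1])
```

If the HPU loss curve stays flat while the CPU curve falls from the same initial values, the difference is isolated to the device rather than to the model or data.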

Hi,

Could you try using torch.nn.Parameter instead and check whether the loss decreases properly?
Thanks

@sunson Thank you for your suggestion. We tried it and observed a similar trend: the loss on HPU still does not decrease properly. See the updated test code here.
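
For reference, the torch.nn.Parameter variant could be sketched roughly as follows. Again, this is an illustration and not the updated test code itself: the function name train_parameter, the shapes, and the optimizer settings are assumptions. The embedding module is replaced by a plain parameter matrix that is indexed directly.

```python
import torch


def train_parameter(device="cpu", steps=50, seed=0):
    """Same toy fit, but with a raw torch.nn.Parameter instead of nn.Embedding."""
    torch.manual_seed(seed)
    # A plain weight matrix; rows are looked up by integer indexing.
    weight = torch.nn.Parameter(torch.randn(10, 4).to(device))
    target = torch.randn(10, 4).to(device)
    idx = torch.arange(10).to(device)
    opt = torch.optim.SGD([weight], lr=0.5)

    losses = []
    for _ in range(steps):
        opt.zero_grad()
        loss = ((weight[idx] - target) ** 2).mean()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses


param_losses = train_parameter("cpu")
print("first/last loss:", param_losses[0], param_losses[-1])
```

Passing "hpu" as the device (with the Habana PyTorch bridge imported beforehand) would give the HPU curve to compare against.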