When training GNNs with Node2Vec on Gaudi, we found that the loss does not decrease, although the same code works correctly on CPUs.
After some debugging, we traced the problem to torch.nn.Embedding.
Using a toy example, we found that the loss does not decrease when we use torch.nn.Embedding on Gaudi, but decreases as expected on CPUs.
See the test code here.
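For readers without access to the linked test, a comparison of this kind might look roughly like the sketch below. It is not the original test code. The helper name `train_embedding`, the table size, the learning rate, and the MSE objective are all assumptions chosen for illustration, and the Gaudi branch assumes the `habana_frameworks` package is installed and that lazy mode is in use (hence the `mark_step()` calls).

```python
import torch

try:
    # Assumption: the Intel Gaudi PyTorch bridge is installed on the machine.
    import habana_frameworks.torch.core as htcore
except ImportError:
    htcore = None


def train_embedding(device, steps=50, seed=0):
    """Fit a tiny torch.nn.Embedding to random targets; return the loss per step."""
    torch.manual_seed(seed)
    # Build everything on the CPU first so every device starts from the same values.
    emb = torch.nn.Embedding(num_embeddings=10, embedding_dim=4)
    idx = torch.arange(10)
    target = torch.randn(10, 4)
    emb, idx, target = emb.to(device), idx.to(device), target.to(device)

    opt = torch.optim.SGD(emb.parameters(), lr=1.0)
    losses = []
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(emb(idx), target)
        loss.backward()
        if device == "hpu" and htcore is not None:
            htcore.mark_step()  # flush the lazy-mode graph after backward
        opt.step()
        if device == "hpu" and htcore is not None:
            htcore.mark_step()  # flush again after the optimizer update
        losses.append(loss.item())
    return losses


cpu_losses = train_embedding("cpu")
print(f"cpu: first={cpu_losses[0]:.4f} last={cpu_losses[-1]:.4f}")

if htcore is not None:
    hpu_losses = train_embedding("hpu")
    print(f"hpu: first={hpu_losses[0]:.4f} last={hpu_losses[-1]:.4f}")
```

If the embedding weights are being updated correctly, the last loss should be well below the first on both devices. A loss that stays flat only on `hpu` would point at the embedding's backward or update path on that device.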