More updates: here is a workaround, along with some debugging information.
Workaround
We can get the code to run by making two changes:
Removing model = torch.compile(model, backend="hpu_backend"), and
Moving the evaluation part to the CPU (while keeping the training part on the HPU).
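To make the two changes above concrete, here is a minimal sketch of how the split could look. It is not our actual script: the tiny nn.Linear model, the random batches, and the helper names train_one_epoch and evaluate_on_cpu are placeholders invented for illustration. It also assumes the usual habana_frameworks.torch.core import and falls back to the CPU when that package is missing, so the snippet can be run anywhere.

```python
import copy

import torch
import torch.nn as nn

try:
    # Importing this module registers the "hpu" device with PyTorch.
    import habana_frameworks.torch.core as htcore
    train_device = torch.device("hpu")
except ImportError:
    # Assumption for illustration: fall back to CPU when no Gaudi stack is installed.
    htcore = None
    train_device = torch.device("cpu")


def train_one_epoch(model, batches, optimizer, loss_fn):
    """Train on train_device. Note that torch.compile is NOT applied to the model."""
    model.train()
    for inputs, targets in batches:
        inputs, targets = inputs.to(train_device), targets.to(train_device)
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        if htcore is not None:
            htcore.mark_step()  # flush the accumulated graph in lazy mode
        optimizer.step()
        if htcore is not None:
            htcore.mark_step()


@torch.no_grad()
def evaluate_on_cpu(model, batches, loss_fn):
    """Evaluate a CPU copy of the model; the training model stays on its device."""
    cpu_model = copy.deepcopy(model).to("cpu")
    cpu_model.eval()
    total_loss, num_samples = 0.0, 0
    for inputs, targets in batches:
        loss = loss_fn(cpu_model(inputs.to("cpu")), targets.to("cpu"))
        total_loss += loss.item() * inputs.size(0)
        num_samples += inputs.size(0)
    return total_loss / num_samples


# Placeholder model and data, only to show the control flow.
torch.manual_seed(0)
model = nn.Linear(4, 1).to(train_device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(3)]

train_one_epoch(model, batches, optimizer, loss_fn)
val_loss = evaluate_on_cpu(model, batches, loss_fn)
print(f"validation loss (computed on CPU): {val_loss:.4f}")
```

Evaluating a deep copy rather than moving the model itself keeps the parameters and any optimizer state on the training device, at the cost of one extra copy of the weights per evaluation.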
Debugging Information
With torch.compile enabled, the same errors appear even when we run only the training part.
The code works well on CPU.
After removing model = torch.compile(model, backend="hpu_backend"), we encounter a different error: RuntimeError: synStatus=1 [Invalid argument] Node reshape failed.
However, with that line removed, the code does run when it includes only the training part.