PyTorch model works on CPU/CUDA but not on HPU

Hi Habana team!

I am trying to run a PyTorch model on HPU, but I get an error when encoding the text input. The same script below works fine on CPU/CUDA. Will I need to change anything else in the model to make it work? I couldn’t find anything about this in the PyTorch porting guide.

I have included the debugger file and also the hl-smi_log found in the ~/.habana_logs dir.
I am using the Deep Learning AMI Habana PyTorch 1.7.1 SynapseAI 0.15.4 (Ubuntu 18.04) 20211025 image (ami-061d5e0b81dfa2121).

Any help is very much appreciated =)

import os
import torch
import habana_frameworks.torch.core
from habana_frameworks.torch.utils.library_loader import load_habana_module
load_habana_module()

from models import loader
text_model = loader.load_model('BERT-Distil-40')

_ = text_model.to('hpu')  # works on CPU/CUDA
with torch.no_grad():
    print(text_model('hello'))

Error:

  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 756, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/.local/lib/python3.7/site-packages/transformers/models/distilbert/modeling_distilbert.py", line 550, in forward
    inputs_embeds = self.embeddings(input_ids)  # (bs, seq_length, dim)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 756, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/.local/lib/python3.7/site-packages/transformers/models/distilbert/modeling_distilbert.py", line 130, in forward
    word_embeddings = self.word_embeddings(input_ids)  # (bs, max_seq_length, dim)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 756, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/sparse.py", line 126, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 1852, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: array::at: __n (which is 18446744073709551615) >= _Nm (which is 8)

Hi @gustavozomer, thank you for posting your question. We’ll take a look at your log file, but we don’t see the hl-smi output; could you please post that as well? Also, as mentioned in the other post, you can review the PyTorch DistilBERT model on our Model-References page for additional background.

Hi @Greg_S!

Many thanks for your update; please find the logs below (I forgot to include them in the original post).

In the Model-References link you posted, the only changes to support HPU are in the training script, which I understand are necessary. But in my case I am not even training yet (it is just inference), so why isn’t this working?

The most important question is: why wouldn’t a specific model run out-of-the-box on HPU? Are there any features that are not supported? Any specific layers?

For instance, GPTx models are still not supported (HabanaAI/synapseai-roadmap/issues/30). Why is this the case? Is it only the training, or does the model itself need to change in some way? What would I need to do to make GPT models, for instance, supported?

Hi @gustavozomer, thanks for using the Snapshot tool and hl-smi. There are two things we noticed:

  1. The input tensor and the model are not on the same device. Also, values cannot be printed directly from HPU tensors; they need to be copied to the CPU before printing (see the sketch after this list).
  2. Please make sure you go through all the steps in the Migration Guide for PyTorch (4. Migration Guide — Gaudi Documentation 0.15.0 documentation).
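
For example, here is a minimal sketch of both points (illustrative only: the token IDs are placeholders rather than real tokenizer output, and text_model is your already-loaded model):

import torch

# model already loaded and moved to 'hpu' as in your script
input_ids = torch.tensor([[101, 7592, 102]])  # placeholder token IDs, not real tokenizer output
input_ids = input_ids.to('hpu')  # (1) put the input on the same device as the model
with torch.no_grad():
    res = text_model(input_ids)[0]  # assuming the first output is the hidden-state tensor
print(res.to('cpu'))  # (2) copy HPU tensors back to CPU before printing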

As far as running models that are not fully optimized or still under consideration on the roadmap: our objective is that all models should be functional on the Gaudi HPU. Models that are not fully optimized for our Graph Compiler may have more ops run on the CPU instead of Gaudi, but it is our expectation that they will run.

Thanks for the quick response, Greg! The error happens well before the print (which wasn’t the problem anyway; removing it doesn’t help either). It is also not a device mismatch, as my model was moving the input to HPU internally. I will try to run some of the example models from your Model-References repo and then go back to training my own models to see if I did something wrong in the migration.

Hi @gustavozomer

Here is a small experiment I did:


import os
import torch
import habana_frameworks.torch.core
from habana_frameworks.torch.utils.library_loader import load_habana_module
load_habana_module()
from torch import nn

class Embeddings(nn.Module):
    def __init__(self):
        super().__init__()
        self.word_embeddings = nn.Embedding(3, 5)
    def forward(self, input_ids):
        return self.word_embeddings(input_ids)

text_model = Embeddings()

_ = text_model.to('hpu')  # works on CPU/CUDA
with torch.no_grad():
    intensor = torch.tensor([1, 2])  # append .to('hpu') here to make it pass
    res = text_model(intensor)  # fails here if the input tensor is not on HPU
    print(res)

If I use intensor = torch.tensor([1, 2]).to('hpu'), I get results, but if I have just intensor = torch.tensor([1, 2]), I get an error very similar to the one you saw originally:

Traceback (most recent call last):
  File "forum_test.py", line 25, in <module>
    (text_model(intensor))
  File "/root/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "forum_test.py", line 16, in forward
    return self.word_embeddings(input_ids)
  File "/root/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/.local/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 126, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/root/.local/lib/python3.7/site-packages/torch/nn/functional.py", line 1852, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: array::at: __n (which is 18446744073709551615) >= _Nm (which is 8)

Can you please confirm that both the model and the inputs are on HPU?
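
A quick way to verify this (plain PyTorch, nothing Habana-specific; text_model and intensor are the names from the experiment above):

# the model's parameters and the input should report the same device
print(next(text_model.parameters()).device)  # expect 'hpu' (or 'hpu:0')
print(intensor.device)  # should match the model's device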
