Issue description:
On a devcloud Gaudi2 instance, the PyTorch image appears not to have Gaudi2 HPU support properly installed.
Environment details:
Docker command: docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host vault.habana.ai/gaudi-docker/1.8.0/ubuntu20.04/habanalabs/pytorch-installer-1.13.1 /bin/bash
Simple Python test:
root@devcloud:/# python3.8
Python 3.8.10 (default, Nov 14 2022, 12:59:47)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> device = torch.device("hpu")
>>> tens = torch.rand(3)
>>> tens
tensor([0.1079, 0.9648, 0.6298])
>>> tens.to(device)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: PyTorch is not linked with support for hpu devices
>>> quit()
hl-smi output inside the container:
root@devcloud:/# hl-smi
+-----------------------------------------------------------------------------+
| HL-SMI Version:                              hl-1.8.0-fw-40.0.0.2           |
| Driver Version:                                     1.7.1-68c1a21           |
|-------------------------------+----------------------+----------------------+
| AIP  Name       Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | AIP-Util  Compute M. |
|===============================+======================+======================|
|   0  HL-225              N/A  | 0000:33:00.0     N/A |                   0  |
| N/A   32C   N/A   107W / 600W |    768MiB / 98304MiB |     3%           N/A |
|-------------------------------+----------------------+----------------------+
|   1  HL-225              N/A  | 0000:34:00.0     N/A |                   0  |
| N/A   35C   N/A    96W / 600W |    768MiB / 98304MiB |     0%           N/A |
|-------------------------------+----------------------+----------------------+
|   2  HL-225              N/A  | 0000:4d:00.0     N/A |                   0  |
| N/A   36C   N/A   105W / 600W |    768MiB / 98304MiB |     4%           N/A |
|-------------------------------+----------------------+----------------------+
|   3  HL-225              N/A  | 0000:4e:00.0     N/A |                   0  |
| N/A   32C   N/A   110W / 600W |    768MiB / 98304MiB |     3%           N/A |
|-------------------------------+----------------------+----------------------+
|   4  HL-225              N/A  | 0000:b3:00.0     N/A |                   0  |
| N/A   37C   N/A   105W / 600W |    768MiB / 98304MiB |     2%           N/A |
|-------------------------------+----------------------+----------------------+
|   5  HL-225              N/A  | 0000:9b:00.0     N/A |                   0  |
| N/A   38C   N/A   100W / 600W |    768MiB / 98304MiB |     1%           N/A |
|-------------------------------+----------------------+----------------------+
|   6  HL-225              N/A  | 0000:b4:00.0     N/A |                   0  |
| N/A   33C   N/A   110W / 600W |    768MiB / 98304MiB |     3%           N/A |
|-------------------------------+----------------------+----------------------+
|   7  HL-225              N/A  | 0000:9a:00.0     N/A |                   0  |
| N/A   34C   N/A    98W / 600W |    768MiB / 98304MiB |     1%           N/A |
|-------------------------------+----------------------+----------------------+
| Compute Processes:                                               AIP Memory |
|  AIP       PID   Type   Process name                             Usage      |
|=============================================================================|
|   0        N/A   N/A    N/A                                      N/A        |
|   1        N/A   N/A    N/A                                      N/A        |
|   2        N/A   N/A    N/A                                      N/A        |
|   3        N/A   N/A    N/A                                      N/A        |
|   4        N/A   N/A    N/A                                      N/A        |
|   5        N/A   N/A    N/A                                      N/A        |
|   6        N/A   N/A    N/A                                      N/A        |
|   7        N/A   N/A    N/A                                      N/A        |
+=============================================================================+
Is there something else we need to run after docker launch? Is this not the right image for Gaudi2 PyTorch?
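One check that might narrow this down is whether the Habana PyTorch bridge is importable inside the container at all. The sketch below is only a diagnostic, not a confirmed fix. It assumes the bridge ships as the top-level Python package habana_frameworks, and that importing habana_frameworks.torch.core is what registers the "hpu" device with PyTorch. Neither assumption has been verified in this image. The helper name check_hpu_support is made up for illustration.

```python
import importlib.util


def check_hpu_support():
    """Report whether torch and the (assumed) Habana bridge package are usable."""
    report = {
        "torch": importlib.util.find_spec("torch") is not None,
        # Assumption: the bridge is the top-level package `habana_frameworks`.
        "habana_frameworks": importlib.util.find_spec("habana_frameworks") is not None,
        "hpu_tensor_ok": False,
    }
    if report["torch"] and report["habana_frameworks"]:
        import torch
        # Assumption: this import registers the "hpu" device with PyTorch.
        import habana_frameworks.torch.core  # noqa: F401

        try:
            # Same operation that failed in the interactive session.
            torch.rand(3).to(torch.device("hpu"))
            report["hpu_tensor_ok"] = True
        except RuntimeError as exc:
            report["error"] = str(exc)
    return report


if __name__ == "__main__":
    for key, value in check_hpu_support().items():
        print(f"{key}: {value}")
```

If habana_frameworks reports False, the image itself is likely missing the bridge. If it reports True and the tensor move then succeeds, the only missing step in the session above was the extra import.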