PyTorch Empty Tensor error when running Stable Diffusion on optimum-habana

I am trying to run A1111/stable-diffusion-webui on Gaudi1. I have successfully implemented the GPU Migration Kit and updated the code to push inferencing tasks onto Gaudi hardware (confirmed via hl-smi).

Intermittently, inferencing fails with an empty tensor error. I suspect “empty” here really means null/undefined, not a tensor backed by allocated-but-uninitialized storage such as the one returned by torch.empty().
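
As a quick illustration of the distinction (plain PyTorch, nothing Habana-specific):

import torch

# torch.empty() returns a *defined* tensor backed by allocated storage;
# its values are simply uninitialized (not zeros, and not null).
t = torch.empty(2, 3)
print(t.shape)  # torch.Size([2, 3])

# The bridge error appears to refer to something different: an optional tensor
# argument that holds no value at all (an undefined tensor on the C++ side),
# which in Python typically corresponds to passing None where a tensor is expected.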

Looking at the error, it seems the tensor may not exist in the underlying storage, which may be related to the Habana lazy execution strategy. As I understand it, tensors are copied from the HPU back to the CPU at certain points in the workflow, so this could be a timing issue. It may also be complicated by the way A1111 implements its image pipelines, which do not use StableDiffusionPipeline.
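
If that theory is right, the failure point is consistent with lazy execution: the backtrace below bottoms out in item()/_local_scalar_dense, a host-side read that forces the accumulated lazy graph to execute. A minimal sketch of that behavior, using the standard habana_frameworks API (the mark_step() placement here is illustrative, not from my actual code):

import torch
import habana_frameworks.torch.core as htcore

x = torch.randn(4, 4, device="hpu")

# In lazy mode this only records graph nodes; nothing has executed yet.
y = (x * 2).sum()

# A host-side read like .item() forces the graph to sync and run
# (the SyncTensorsGraph/StepMarker path visible in the trace below).
val = y.item()

# An explicit mark_step() flushes the graph at a chosen point, which can
# help localize which op leaves a tensor undefined.
htcore.mark_step()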

In this use case we are using the DDIM sampler; other samplers produce different errors. None of these errors occur when running A1111 on CUDA, so the problem does appear to be related to running on HPUs.

Python error:

RuntimeError: Empty tensor optional

Habana Framework stack trace:

[23:24:18.362864][PT_BRIDGE ][error][tid:11313] /npu-stack/pytorch-integration/habana_lazy/hpu_lazy_tensors.cpp: 962Empty tensor optionalPrepareInputStack
[23:24:18.367603][PT_BRIDGE ][error][tid:11313] backtrace (up to 30)
[23:24:18.367635][PT_BRIDGE ][error][tid:11313] /usr/lib/habanalabs/libhl_logger.so(hl_logger::v1_0::logStackTrace(std::shared_ptr<hl_logger::Logger> const&, int)+0x5c) [0x7fd38ddb73fc]
[23:24:18.367647][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(void hl_logger::v1_5_inline_fmt_compile::logStacktraceHlLogger::LoggerType(HlLogger::LoggerType, int)+0x93) [0x7fd38d0cec43]
[23:24:18.367672][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(PrepareInputStack(std::vector<habana_lazy::HbLazyTensor, std::allocator<habana_lazy::HbLazyTensor> >, std::vector<int, std::allocator >&, std::vector<habana_lazy::ir::Value, std::allocator<habana_lazy::ir::Value> >&, bool, std::vector<std::shared_ptr<habana_lazy::ir::Node>, std::allocator<std::shared_ptr<habana_lazy::ir::Node> > >, bool)+0xd3f) [0x7fd38d99315f]
[23:24:18.367721][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(habana_lazy::HbLazyTensor::SyncTensorsGraphInternal(std::vector<habana_lazy::HbLazyTensor, std::allocator<habana_lazy::HbLazyTensor> >, std::shared_ptr<habana_lazy::HbLazyFrontEndInfoToBackend>, bool, bool)+0x164e) [0x7fd38d9965ce]
[23:24:18.367750][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(habana_lazy::HbLazyTensor::SyncTensorsGraph(std::vector<habana_lazy::HbLazyTensor, std::allocator<habana_lazy::HbLazyTensor> >, std::shared_ptr<habana_lazy::HbLazyFrontEndInfoToBackend>, bool, bool)+0x41b) [0x7fd38d9986bb]
[23:24:18.367775][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(habana_lazy::HbLazyTensor::SyncLiveTensorsGraph(c10::Device const*, std::shared_ptr<habana_lazy::HbLazyFrontEndInfoToBackend>, std::vector<habana_lazy::HbLazyTensor, std::allocator<habana_lazy::HbLazyTensor> >, bool, bool, std::vector<habana_lazy::HbLazyTensor, std::allocator<habana_lazy::HbLazyTensor> >, std::set<long, std::less, std::allocator >)+0x67e) [0x7fd38d9992ae]
[23:24:18.367800][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(habana_lazy::HbLazyTensor::StepMarker(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::shared_ptr<habana_lazy::HbLazyFrontEndInfoToBackend>, std::vector<habana_lazy::HbLazyTensor, std::allocator<habana_lazy::HbLazyTensor> >, bool, bool, std::vector<habana_lazy::HbLazyTensor, std::allocator<habana_lazy::HbLazyTensor> >, std::set<long, std::less, std::allocator >)+0x94b) [0x7fd38d999f0b]
[23:24:18.367815][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(habana_lazy::local_scalar_dense_hpu(at::Tensor const&)+0x4b9) [0x7fd38d7f47f9]
[23:24:18.367830][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(habana::local_scalar_dense(at::Tensor const&)+0x2e4) [0x7fd38d37bed4]
[23:24:18.367845][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(c10::impl::wrap_kernel_functor_unboxed<c10::impl::detail::WrapFunctionIntoRuntimeFunctor<c10::Scalar ()(at::Tensor const&), c10::Scalar, c10::guts::typelist::typelist<at::Tensor const&> >, c10::Scalar (at::Tensor const&)>::call(c10::OperatorKernel, c10::DispatchKeySet, at::Tensor const&)+0x27) [0x7fd38d3c00f7]
[23:24:18.367860][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so(at::_ops::_local_scalar_dense::redispatch(c10::DispatchKeySet, at::Tensor const&)+0x88) [0x7fd64d9a8db8]
[23:24:18.367871][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so(+0x39405f3) [0x7fd64f3405f3]
[23:24:18.367883][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so(+0x39406d8) [0x7fd64f3406d8]
[23:24:18.367894][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so(at::_ops::_local_scalar_dense::call(at::Tensor const&)+0x138) [0x7fd64da411a8]
[23:24:18.367905][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so(at::native::item(at::Tensor const&)+0x94) [0x7fd64d0fa0a4]
[23:24:18.367916][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so(+0x26a4f95) [0x7fd64e0a4f95]
[23:24:18.367927][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so(at::_ops::item::call(at::Tensor const&)+0x138) [0x7fd64d8ade98]
[23:24:18.367938][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so(+0x3bd5ee) [0x7fd656fbd5ee]
[23:24:18.367953][PT_BRIDGE ][error][tid:11313] python3(+0x15d64e) [0x55f0b248264e]
[23:24:18.367964][PT_BRIDGE ][error][tid:11313] python3(_PyEval_EvalFrameDefault+0x6152) [0x55f0b24738a2]
[23:24:18.367975][PT_BRIDGE ][error][tid:11313] python3(_PyFunction_Vectorcall+0x7c) [0x55f0b248570c]
[23:24:18.367991][PT_BRIDGE ][error][tid:11313] python3(_PyEval_EvalFrameDefault+0x6152) [0x55f0b24738a2]
[23:24:18.368002][PT_BRIDGE ][error][tid:11313] python3(_PyFunction_Vectorcall+0x7c) [0x55f0b248570c]
[23:24:18.368014][PT_BRIDGE ][error][tid:11313] python3(_PyEval_EvalFrameDefault+0x1981) [0x55f0b246f0d1]
[23:24:18.368026][PT_BRIDGE ][error][tid:11313] python3(_PyFunction_Vectorcall+0x7c) [0x55f0b248570c]
[23:24:18.368036][PT_BRIDGE ][error][tid:11313] python3(_PyEval_EvalFrameDefault+0x6bd) [0x55f0b246de0d]
[23:24:18.368048][PT_BRIDGE ][error][tid:11313] python3(_PyFunction_Vectorcall+0x7c) [0x55f0b248570c]
[23:24:18.368062][PT_BRIDGE ][error][tid:11313] python3(_PyEval_EvalFrameDefault+0x6bd) [0x55f0b246de0d]
[23:24:18.368080][PT_BRIDGE ][error][tid:11313] python3(_PyFunction_Vectorcall+0x7c) [0x55f0b248570c]
[23:24:18.368089][PT_BRIDGE ][error][tid:11313] python3(_PyEval_EvalFrameDefault+0x2b71) [0x55f0b24702c1]
*** Error completing request

Hi Chris, can you say more about your setup? I’m told you are running on AWS, so which AMI and SynapseAI version are you using? To be on the latest, you can use the Habana DL base AMI with SynapseAI 1.12.0 and then run your experiment in the Habana PyTorch 1.12.0 Docker image.


The heading says “optimum-habana” but you also mention the GPU Migration Toolkit.

Does your codebase use optimum-habana?

I’m using the official Habana AMI with Ubuntu 22.04; however, I’ve used habana-installer.sh to install the latest drivers and tools, including SynapseAI 1.12.0.

I am not using the Docker image; I’m running everything on the host OS with Python 3.10 and PyTorch 2.0.1, which is supported per the 1.12.0 docs.

Is there some reason why I would need to use a Docker container or PyTorch 1.x to run this workload?

Is there something I’ve potentially missed in the Known Limitations (couldn’t include a third link in this post) which could be the cause of this issue?
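
For what it’s worth, the HPU stack itself looks healthy on this host; a basic smoke test along these lines (standard habana_frameworks API) runs cleanly:

import torch
import habana_frameworks.torch.hpu as hthpu

print(hthpu.is_available())             # expected: True
print(hthpu.device_count())             # number of visible Gaudi devices
print(torch.randn(2, 2, device="hpu"))  # simple end-to-end op on the device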

You are OK; you can run the PyTorch installer on top of the Habana DL AMI. Can you please answer Sayantan’s question: are you installing optimum[habana] in this case?

It may just be best to run pip list and send us the output.

Apologies, I do have optimum-habana installed (I was benchmarking with the Stable Diffusion example script from that repo), but in this case I am using the GPU Migration kit; there’s a sketch of how it’s wired in after the package list below:

~/habanalabs-venv$ python --version
Python 3.10.12
~/habanalabs-venv$ pip list
Package Version
------- -------
absl-py 2.0.0
accelerate 0.21.0
addict 2.4.0
aenum 3.1.15
aiofiles 23.2.1
aiohttp 3.8.6
aiosignal 1.3.1
altair 5.1.2
annotated-types 0.6.0
antlr4-python3-runtime 4.9.3
anyio 3.7.1
apex 0.1
arrow 1.3.0
async-timeout 4.0.3
attrs 23.1.0
av 9.2.0
backoff 2.2.1
basicsr 1.4.2
beautifulsoup4 4.12.2
blendmodes 2022
blessed 1.20.0
boltons 23.0.0
boto3 1.28.73
botocore 1.31.73
cachetools 5.3.2
certifi 2023.7.22
cffi 1.15.1
cfgv 3.4.0
charset-normalizer 3.3.1
clean-fid 0.1.35
click 8.1.7
clip 1.0
cmake 3.27.7
coloredlogs 15.0.1
contextlib2 21.6.0
contourpy 1.1.1
croniter 1.4.1
cycler 0.12.1
datasets 2.14.6
dateutils 0.6.12
deepdiff 6.6.1
deepspeed 0.9.4+hpu.synapse.v1.12.0
Deprecated 1.2.14
deprecation 2.1.0
diffusers 0.21.4
dill 0.3.7
distlib 0.3.7
einops 0.4.1
exceptiongroup 1.1.3
expecttest 0.1.6
facexlib 0.3.0
fastapi 0.94.0
ffmpy 0.3.1
filelock 3.13.0
filterpy 1.4.5
flatbuffers 23.5.26
fonttools 4.43.1
frozenlist 1.4.0
fsspec 2023.10.0
ftfy 6.1.1
future 0.18.3
gdown 4.7.1
gfpgan 1.3.8
gitdb 4.0.11
GitPython 3.1.32
google-auth 2.23.3
google-auth-oauthlib 1.1.0
gradio 3.41.2
gradio_client 0.5.0
grpcio 1.59.0
h11 0.12.0
habana-gpu-migration 1.12.1.10
habana-media-loader 1.12.1.10
habana-pyhlml 1.12.1.10
habana-torch-dataloader 1.12.1.10
habana-torch-plugin 1.12.1.10
hjson 3.1.0
httpcore 0.15.0
httpx 0.24.1
huggingface-hub 0.17.3
humanfriendly 10.0
identify 2.5.30
idna 3.4
imageio 2.31.6
importlib-metadata 6.8.0
importlib-resources 6.1.0
inflection 0.5.1
inquirer 3.1.3
intel-openmp 2023.2.0
itsdangerous 2.1.2
Jinja2 3.1.2
jmespath 1.0.1
joblib 1.3.2
jsonmerge 1.8.0
jsonschema 4.19.1
jsonschema-specifications 2023.7.1
kiwisolver 1.4.5
kornia 0.6.7
lark 1.1.2
lazy_loader 0.3
lightning 2.0.6
lightning-cloud 0.5.44
lightning-habana 1.0.1
lightning-utilities 0.9.0
llvmlite 0.41.1
lmdb 1.4.1
lpips 0.1.4
Markdown 3.5
markdown-it-py 3.0.0
MarkupSafe 2.1.3
matplotlib 3.8.0
mdurl 0.1.2
mkl 2023.1.0
mkl-include 2023.1.0
mpi4py 3.1.4
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.15
networkx 3.2
neural-compressor 2.3.1
ninja 1.11.1.1
nodeenv 1.8.0
numba 0.58.1
numpy 1.23.5
oauthlib 3.2.2
omegaconf 2.2.3
onnx 1.15.0
onnxruntime 1.14.1
open-clip-torch 2.20.0
opencv-python 4.8.1.78
opencv-python-headless 4.8.1.78
optimum 1.13.2
optimum-habana 1.9.0.dev0
optimum-intel 1.11.0
ordered-set 4.1.0
orjson 3.9.10
packaging 23.2
pandas 2.0.1
pathspec 0.11.2
perfetto 0.7.0
piexif 1.1.3
Pillow 9.5.0
Pillow-SIMD 7.0.0.post3
pip 23.3.1
platformdirs 3.11.0
pre-commit 3.3.3
prettytable 3.9.0
protobuf 3.20.3
psutil 5.9.5
py-cpuinfo 9.0.0
pyarrow 13.0.0
pyasn1 0.5.0
pyasn1-modules 0.3.0
pybind11 2.10.4
pycocotools 2.0.7
pycparser 2.21
pydantic 1.10.13
pydantic_core 2.3.0
pydub 0.25.1
Pygments 2.16.1
PyJWT 2.8.0
pynvml 8.0.4
pyparsing 3.1.1
PySocks 1.7.1
python-dateutil 2.8.2
python-editor 1.0.4
python-multipart 0.0.6
pytorch-lightning 1.9.4
pytz 2023.3.post1
PyWavelets 1.4.1
PyYAML 6.0
readchar 4.0.5
realesrgan 0.3.0
referencing 0.30.2
regex 2023.5.5
requests 2.31.0
requests-oauthlib 1.3.1
resize-right 0.0.2
rich 13.6.0
rpds-py 0.10.6
rsa 4.9
s3transfer 0.7.0
safetensors 0.3.1
schema 0.7.5
scikit-image 0.21.0
scikit-learn 1.3.2
scipy 1.11.3
semantic-version 2.10.0
sentencepiece 0.1.99
setuptools 68.2.2
six 1.16.0
smmap 5.0.1
sniffio 1.3.0
soupsieve 2.5
starlette 0.26.1
starsessions 1.3.0
sympy 1.12
tb-nightly 2.16.0a20231027
tbb 2021.10.0
tdqm 0.0.1
tensorboard 2.11.2
tensorboard-data-server 0.7.2
tensorboard-plugin-wit 1.8.1
threadpoolctl 3.2.0
tifffile 2023.9.26
timm 0.9.2
tokenizers 0.13.3
tomesd 0.1.3
tomli 2.0.1
toolz 0.12.0
torch 2.0.1a0+gitf520939
torch-tb-profiler 0.4.0
torchaudio 2.0.1+3b40834
torchdata 0.6.1+e1feeb2
torchdiffeq 0.2.3
torchmetrics 1.2.0
torchsde 0.2.5
torchtext 0.15.2a0+4571036
torchvision 0.15.1a0+42759b1
tqdm 4.66.1
traitlets 5.12.0
trampoline 0.1.2
transformers 4.30.2
types-python-dateutil 2.8.19.14
typing_extensions 4.8.0
tzdata 2023.3
urllib3 1.26.18
uvicorn 0.23.2
virtualenv 20.24.6
wcwidth 0.2.8
websocket-client 1.6.4
websockets 11.0.3
Werkzeug 3.0.1
wheel 0.41.2
wrapt 1.15.0
xxhash 3.4.1
yamllint 1.32.0
yapf 0.40.2
yarl 1.9.2
zipp 3.17.0
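
And here is roughly how the GPU Migration kit is wired in; this is a minimal sketch of the documented import-based usage, not my exact changes (those are in the diff linked at the end of the thread):

# Importing the migration shim early lets it remap CUDA-targeted calls
# (device strings, torch.cuda.*) onto the HPU backend.
import habana_frameworks.torch.gpu_migration  # noqa: F401 (import has side effects)
import torch

model = torch.nn.Linear(8, 8).to("cuda")  # transparently placed on the HPU
x = torch.randn(1, 8, device="cuda")      # likewise mapped to the HPU
print(model(x).device)                    # expected: hpu:0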

Hi Chris, do you need to run this specific Stable Diffusion model? It would be best to start with the SD model from the Gaudi Model References: https://github.com/HabanaAI/Model-References/tree/master/PyTorch/generative_models/stable-diffusion-v-2-1. You can follow the instructions in the README.

Additionally, there’s a Jupyter notebook from our workshop that demos this more visually: https://github.com/HabanaAI/Gaudi2-Workshop/blob/main/PyTorch-Inference/stable_diffusion_v_2_1.ipynb, along with the associated video to see it running (start at 16:50).

Hi Chris,

Are you using the AUTOMATIC1111/stable-diffusion-webui repository (https://github.com/AUTOMATIC1111/stable-diffusion-webui)? Could you provide a diff/branch with all the code changes you have made, including the GPU migration changes? That would make it easier to reproduce your error on our end. Thank you.

Shiv

If I’m not mistaken, the model is not optimized for Habana; it comes directly from StabilityAI. It’s the configs that are optimized.

I was not aware of these repositories, and they are helpful; however, the goal of this particular project requires the use of A1111. I’m not opposed to implementing some of this code if needed (and if there’s a clear performance benefit), but it’s obviously more convenient to make minimal changes if we can drop in code that shifts the workload to the HPU.

Thanks for the support!

Here’s a diff:

https://github.com/CloudBrigade/stable-diffusion-webui/commit/385309d97b7cb905e637fe17e6dc15f998c5c586