PyTorch Empty Tensor error when running Stable Diffusion on optimum-habana

I am trying to run A1111/stable-diffusion-webui on Gaudi1. I have successfully implemented the GPU Migration Kit and updated the code to push inferencing tasks onto Gaudi hardware (confirmed via hl-smi).

Intermittently, inferencing fails with an empty tensor error. I suspect “empty” here really means null/undefined, not a tensor backed by allocated-but-uninitialized storage such as the one returned by torch.empty().
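
As a quick illustration of the distinction (plain PyTorch, nothing Habana-specific):

import torch

# torch.empty() returns a *defined* tensor backed by allocated storage;
# its values are simply uninitialized (not zeros, and not null).
t = torch.empty(2, 3)
print(t.shape)  # torch.Size([2, 3])

# The bridge error appears to refer to something different: an optional tensor
# argument that holds no value at all (an undefined tensor on the C++ side),
# which in Python typically corresponds to passing None where a tensor is expected.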

Looking at the error, it seems the tensor may not exist in the underlying storage, which may be related to the Habana lazy execution strategy. As I understand it, tensors are copied from the HPU back to the CPU at certain points in the workflow, so this could be a timing issue. It may also be complicated by the way A1111 implements its image pipelines, which do not use StableDiffusionPipeline.
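
If that theory is right, the failure point is consistent with lazy execution: the backtrace below bottoms out in item()/_local_scalar_dense, a host-side read that forces the accumulated lazy graph to execute. A minimal sketch of that behavior, using the standard habana_frameworks API (the mark_step() placement here is illustrative, not from my actual code):

import torch
import habana_frameworks.torch.core as htcore

x = torch.randn(4, 4, device="hpu")

# In lazy mode this only records graph nodes; nothing has executed yet.
y = (x * 2).sum()

# A host-side read like .item() forces the graph to sync and run
# (the SyncTensorsGraph/StepMarker path visible in the trace below).
val = y.item()

# An explicit mark_step() flushes the graph at a chosen point, which can
# help localize which op leaves a tensor undefined.
htcore.mark_step()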

In this use case we are using the DDIM sampler; other samplers produce different errors. None of these errors occur when running A1111 on CUDA, so the problem does appear to be related to running on HPUs.

Python error:

RuntimeError: Empty tensor optional

Habana Framework stack trace:

[23:24:18.362864][PT_BRIDGE ][error][tid:11313] /npu-stack/pytorch-integration/habana_lazy/hpu_lazy_tensors.cpp: 962Empty tensor optionalPrepareInputStack
[23:24:18.367603][PT_BRIDGE ][error][tid:11313] backtrace (up to 30)
[23:24:18.367635][PT_BRIDGE ][error][tid:11313] /usr/lib/habanalabs/libhl_logger.so(hl_logger::v1_0::logStackTrace(std::shared_ptr<hl_logger::Logger> const&, int)+0x5c) [0x7fd38ddb73fc]
[23:24:18.367647][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(void hl_logger::v1_5_inline_fmt_compile::logStacktraceHlLogger::LoggerType(HlLogger::LoggerType, int)+0x93) [0x7fd38d0cec43]
[23:24:18.367672][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(PrepareInputStack(std::vector<habana_lazy::HbLazyTensor, std::allocator<habana_lazy::HbLazyTensor> >, std::vector<int, std::allocator >&, std::vector<habana_lazy::ir::Value, std::allocator<habana_lazy::ir::Value> >&, bool, std::vector<std::shared_ptr<habana_lazy::ir::Node>, std::allocator<std::shared_ptr<habana_lazy::ir::Node> > >, bool)+0xd3f) [0x7fd38d99315f]
[23:24:18.367721][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(habana_lazy::HbLazyTensor::SyncTensorsGraphInternal(std::vector<habana_lazy::HbLazyTensor, std::allocator<habana_lazy::HbLazyTensor> >, std::shared_ptr<habana_lazy::HbLazyFrontEndInfoToBackend>, bool, bool)+0x164e) [0x7fd38d9965ce]
[23:24:18.367750][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(habana_lazy::HbLazyTensor::SyncTensorsGraph(std::vector<habana_lazy::HbLazyTensor, std::allocator<habana_lazy::HbLazyTensor> >, std::shared_ptr<habana_lazy::HbLazyFrontEndInfoToBackend>, bool, bool)+0x41b) [0x7fd38d9986bb]
[23:24:18.367775][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(habana_lazy::HbLazyTensor::SyncLiveTensorsGraph(c10::Device const*, std::shared_ptr<habana_lazy::HbLazyFrontEndInfoToBackend>, std::vector<habana_lazy::HbLazyTensor, std::allocator<habana_lazy::HbLazyTensor> >, bool, bool, std::vector<habana_lazy::HbLazyTensor, std::allocator<habana_lazy::HbLazyTensor> >, std::set<long, std::less, std::allocator >)+0x67e) [0x7fd38d9992ae]
[23:24:18.367800][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(habana_lazy::HbLazyTensor::StepMarker(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::shared_ptr<habana_lazy::HbLazyFrontEndInfoToBackend>, std::vector<habana_lazy::HbLazyTensor, std::allocator<habana_lazy::HbLazyTensor> >, bool, bool, std::vector<habana_lazy::HbLazyTensor, std::allocator<habana_lazy::HbLazyTensor> >, std::set<long, std::less, std::allocator >)+0x94b) [0x7fd38d999f0b]
[23:24:18.367815][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(habana_lazy::local_scalar_dense_hpu(at::Tensor const&)+0x4b9) [0x7fd38d7f47f9]
[23:24:18.367830][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(habana::local_scalar_dense(at::Tensor const&)+0x2e4) [0x7fd38d37bed4]
[23:24:18.367845][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(c10::impl::wrap_kernel_functor_unboxed<c10::impl::detail::WrapFunctionIntoRuntimeFunctor<c10::Scalar ()(at::Tensor const&), c10::Scalar, c10::guts::typelist::typelist<at::Tensor const&> >, c10::Scalar (at::Tensor const&)>::call(c10::OperatorKernel, c10::DispatchKeySet, at::Tensor const&)+0x27) [0x7fd38d3c00f7]
[23:24:18.367860][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so(at::_ops::_local_scalar_dense::redispatch(c10::DispatchKeySet, at::Tensor const&)+0x88) [0x7fd64d9a8db8]
[23:24:18.367871][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so(+0x39405f3) [0x7fd64f3405f3]
[23:24:18.367883][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so(+0x39406d8) [0x7fd64f3406d8]
[23:24:18.367894][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so(at::_ops::_local_scalar_dense::call(at::Tensor const&)+0x138) [0x7fd64da411a8]
[23:24:18.367905][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so(at::native::item(at::Tensor const&)+0x94) [0x7fd64d0fa0a4]
[23:24:18.367916][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so(+0x26a4f95) [0x7fd64e0a4f95]
[23:24:18.367927][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so(at::_ops::item::call(at::Tensor const&)+0x138) [0x7fd64d8ade98]
[23:24:18.367938][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so(+0x3bd5ee) [0x7fd656fbd5ee]
[23:24:18.367953][PT_BRIDGE ][error][tid:11313] python3(+0x15d64e) [0x55f0b248264e]
[23:24:18.367964][PT_BRIDGE ][error][tid:11313] python3(_PyEval_EvalFrameDefault+0x6152) [0x55f0b24738a2]
[23:24:18.367975][PT_BRIDGE ][error][tid:11313] python3(_PyFunction_Vectorcall+0x7c) [0x55f0b248570c]
[23:24:18.367991][PT_BRIDGE ][error][tid:11313] python3(_PyEval_EvalFrameDefault+0x6152) [0x55f0b24738a2]
[23:24:18.368002][PT_BRIDGE ][error][tid:11313] python3(_PyFunction_Vectorcall+0x7c) [0x55f0b248570c]
[23:24:18.368014][PT_BRIDGE ][error][tid:11313] python3(_PyEval_EvalFrameDefault+0x1981) [0x55f0b246f0d1]
[23:24:18.368026][PT_BRIDGE ][error][tid:11313] python3(_PyFunction_Vectorcall+0x7c) [0x55f0b248570c]
[23:24:18.368036][PT_BRIDGE ][error][tid:11313] python3(_PyEval_EvalFrameDefault+0x6bd) [0x55f0b246de0d]
[23:24:18.368048][PT_BRIDGE ][error][tid:11313] python3(_PyFunction_Vectorcall+0x7c) [0x55f0b248570c]
[23:24:18.368062][PT_BRIDGE ][error][tid:11313] python3(_PyEval_EvalFrameDefault+0x6bd) [0x55f0b246de0d]
[23:24:18.368080][PT_BRIDGE ][error][tid:11313] python3(_PyFunction_Vectorcall+0x7c) [0x55f0b248570c]
[23:24:18.368089][PT_BRIDGE ][error][tid:11313] python3(_PyEval_EvalFrameDefault+0x2b71) [0x55f0b24702c1]
*** Error completing request

Hi Chris, can you say more about your setup? I’m told you are running on AWS, so which AMI and SynapseAI version are you using? To be on the latest, you can use the Habana DL base AMI with SynapseAI 1.12.0 and then run your experiment in the Habana PyTorch 1.12.0 Docker image.


The heading says “optimum-habana” but you also mention the GPU Migration Toolkit.

Does your codebase use optimum-habana?

I’m using the official Habana AMI with Ubuntu 22.04; however, I’ve used habana-installer.sh to install the latest drivers and tools, including SynapseAI 1.12.0.

I am not using the Docker image; I’m running everything on the host OS with Python 3.10 and PyTorch 2.0.1, which is supported per the 1.12.0 docs.

Is there some reason why I would need to use a Docker container or PyTorch 1.x to run this workload?

Is there something I’ve potentially missed in the Known Limitations (couldn’t include a third link in this post) which could be the cause of this issue?
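
For what it’s worth, the HPU stack itself looks healthy on this host; a basic smoke test along these lines (standard habana_frameworks API) runs cleanly:

import torch
import habana_frameworks.torch.hpu as hthpu

print(hthpu.is_available())             # expected: True
print(hthpu.device_count())             # number of visible Gaudi devices
print(torch.randn(2, 2, device="hpu"))  # simple end-to-end op on the device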

You are OK; you can run the PyTorch installer on top of the Habana DL AMI. Can you please answer Sayantan’s question: are you installing optimum[habana] in this case?

It may just be best to run pip list and send us the output.

Apologies, I do have optimum-habana installed (I was benchmarking with the Stable Diffusion example script from that repo), but in this case I am using the GPU Migration kit; there’s a sketch of how it’s wired in after the package list below:

~/habanalabs-venv$ python --version
Python 3.10.12
~/habanalabs-venv$ pip list
Package Version
------- -------
absl-py 2.0.0
accelerate 0.21.0
addict 2.4.0
aenum 3.1.15
aiofiles 23.2.1
aiohttp 3.8.6
aiosignal 1.3.1
altair 5.1.2
annotated-types 0.6.0
antlr4-python3-runtime 4.9.3
anyio 3.7.1
apex 0.1
arrow 1.3.0
async-timeout 4.0.3
attrs 23.1.0
av 9.2.0
backoff 2.2.1
basicsr 1.4.2
beautifulsoup4 4.12.2
blendmodes 2022
blessed 1.20.0
boltons 23.0.0
boto3 1.28.73
botocore 1.31.73
cachetools 5.3.2
certifi 2023.7.22
cffi 1.15.1
cfgv 3.4.0
charset-normalizer 3.3.1
clean-fid 0.1.35
click 8.1.7
clip 1.0
cmake 3.27.7
coloredlogs 15.0.1
contextlib2 21.6.0
contourpy 1.1.1
croniter 1.4.1
cycler 0.12.1
datasets 2.14.6
dateutils 0.6.12
deepdiff 6.6.1
deepspeed 0.9.4+hpu.synapse.v1.12.0
Deprecated 1.2.14
deprecation 2.1.0
diffusers 0.21.4
dill 0.3.7
distlib 0.3.7
einops 0.4.1
exceptiongroup 1.1.3
expecttest 0.1.6
facexlib 0.3.0
fastapi 0.94.0
ffmpy 0.3.1
filelock 3.13.0
filterpy 1.4.5
flatbuffers 23.5.26
fonttools 4.43.1
frozenlist 1.4.0
fsspec 2023.10.0
ftfy 6.1.1
future 0.18.3
gdown 4.7.1
gfpgan 1.3.8
gitdb 4.0.11
GitPython 3.1.32
google-auth 2.23.3
google-auth-oauthlib 1.1.0
gradio 3.41.2
gradio_client 0.5.0
grpcio 1.59.0
h11 0.12.0
habana-gpu-migration 1.12.1.10
habana-media-loader 1.12.1.10
habana-pyhlml 1.12.1.10
habana-torch-dataloader 1.12.1.10
habana-torch-plugin 1.12.1.10
hjson 3.1.0
httpcore 0.15.0
httpx 0.24.1
huggingface-hub 0.17.3
humanfriendly 10.0
identify 2.5.30
idna 3.4
imageio 2.31.6
importlib-metadata 6.8.0
importlib-resources 6.1.0
inflection 0.5.1
inquirer 3.1.3
intel-openmp 2023.2.0
itsdangerous 2.1.2
Jinja2 3.1.2
jmespath 1.0.1
joblib 1.3.2
jsonmerge 1.8.0
jsonschema 4.19.1
jsonschema-specifications 2023.7.1
kiwisolver 1.4.5
kornia 0.6.7
lark 1.1.2
lazy_loader 0.3
lightning 2.0.6
lightning-cloud 0.5.44
lightning-habana 1.0.1
lightning-utilities 0.9.0
llvmlite 0.41.1
lmdb 1.4.1
lpips 0.1.4
Markdown 3.5
markdown-it-py 3.0.0
MarkupSafe 2.1.3
matplotlib 3.8.0
mdurl 0.1.2
mkl 2023.1.0
mkl-include 2023.1.0
mpi4py 3.1.4
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.15
networkx 3.2
neural-compressor 2.3.1
ninja 1.11.1.1
nodeenv 1.8.0
numba 0.58.1
numpy 1.23.5
oauthlib 3.2.2
omegaconf 2.2.3
onnx 1.15.0
onnxruntime 1.14.1
open-clip-torch 2.20.0
opencv-python 4.8.1.78
opencv-python-headless 4.8.1.78
optimum 1.13.2
optimum-habana 1.9.0.dev0
optimum-intel 1.11.0
ordered-set 4.1.0
orjson 3.9.10
packaging 23.2
pandas 2.0.1
pathspec 0.11.2
perfetto 0.7.0
piexif 1.1.3
Pillow 9.5.0
Pillow-SIMD 7.0.0.post3
pip 23.3.1
platformdirs 3.11.0
pre-commit 3.3.3
prettytable 3.9.0
protobuf 3.20.3
psutil 5.9.5
py-cpuinfo 9.0.0
pyarrow 13.0.0
pyasn1 0.5.0
pyasn1-modules 0.3.0
pybind11 2.10.4
pycocotools 2.0.7
pycparser 2.21
pydantic 1.10.13
pydantic_core 2.3.0
pydub 0.25.1
Pygments 2.16.1
PyJWT 2.8.0
pynvml 8.0.4
pyparsing 3.1.1
PySocks 1.7.1
python-dateutil 2.8.2
python-editor 1.0.4
python-multipart 0.0.6
pytorch-lightning 1.9.4
pytz 2023.3.post1
PyWavelets 1.4.1
PyYAML 6.0
readchar 4.0.5
realesrgan 0.3.0
referencing 0.30.2
regex 2023.5.5
requests 2.31.0
requests-oauthlib 1.3.1
resize-right 0.0.2
rich 13.6.0
rpds-py 0.10.6
rsa 4.9
s3transfer 0.7.0
safetensors 0.3.1
schema 0.7.5
scikit-image 0.21.0
scikit-learn 1.3.2
scipy 1.11.3
semantic-version 2.10.0
sentencepiece 0.1.99
setuptools 68.2.2
six 1.16.0
smmap 5.0.1
sniffio 1.3.0
soupsieve 2.5
starlette 0.26.1
starsessions 1.3.0
sympy 1.12
tb-nightly 2.16.0a20231027
tbb 2021.10.0
tdqm 0.0.1
tensorboard 2.11.2
tensorboard-data-server 0.7.2
tensorboard-plugin-wit 1.8.1
threadpoolctl 3.2.0
tifffile 2023.9.26
timm 0.9.2
tokenizers 0.13.3
tomesd 0.1.3
tomli 2.0.1
toolz 0.12.0
torch 2.0.1a0+gitf520939
torch-tb-profiler 0.4.0
torchaudio 2.0.1+3b40834
torchdata 0.6.1+e1feeb2
torchdiffeq 0.2.3
torchmetrics 1.2.0
torchsde 0.2.5
torchtext 0.15.2a0+4571036
torchvision 0.15.1a0+42759b1
tqdm 4.66.1
traitlets 5.12.0
trampoline 0.1.2
transformers 4.30.2
types-python-dateutil 2.8.19.14
typing_extensions 4.8.0
tzdata 2023.3
urllib3 1.26.18
uvicorn 0.23.2
virtualenv 20.24.6
wcwidth 0.2.8
websocket-client 1.6.4
websockets 11.0.3
Werkzeug 3.0.1
wheel 0.41.2
wrapt 1.15.0
xxhash 3.4.1
yamllint 1.32.0
yapf 0.40.2
yarl 1.9.2
zipp 3.17.0
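
And here is roughly how the GPU Migration kit is wired in; this is a minimal sketch of the documented import-based usage, not my exact changes (those are in the diff linked at the end of the thread):

# Importing the migration shim early lets it remap CUDA-targeted calls
# (device strings, torch.cuda.*) onto the HPU backend.
import habana_frameworks.torch.gpu_migration  # noqa: F401 (import has side effects)
import torch

model = torch.nn.Linear(8, 8).to("cuda")  # transparently placed on the HPU
x = torch.randn(1, 8, device="cuda")      # likewise mapped to the HPU
print(model(x).device)                    # expected: hpu:0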

Hi Chris, do you need to run this specific Stable Diffusion model? It would be best to start with the SD model from the Gaudi Model References: https://github.com/HabanaAI/Model-References/tree/master/PyTorch/generative_models/stable-diffusion-v-2-1. You can follow the instructions in the README.

Additionally, there’s a Jupyter notebook from our workshop that demos this more visually: https://github.com/HabanaAI/Gaudi2-Workshop/blob/main/PyTorch-Inference/stable_diffusion_v_2_1.ipynb, along with the associated video to see it running (start at 16:50).

Hi Chris,

Are you using the AUTOMATIC1111/stable-diffusion-webui repository (https://github.com/AUTOMATIC1111/stable-diffusion-webui)? Could you provide a diff/branch with all the code changes you have made, including the GPU migration changes? That would make it easier to reproduce your error on our end. Thank you.

Shiv

If I’m not mistaken, the model is not optimized for Habana; it comes directly from StabilityAI. It’s the configs that are optimized.

I was not aware of these repositories, and they are helpful; however, the goal of this particular project requires the use of A1111. I’m not opposed to implementing some of this code if needed (and if there’s a clear performance benefit), but it’s obviously more convenient to make minimal changes if we can drop in code that shifts the workload to the HPU.

Thanks for the support!

Here’s a diff:

https://github.com/CloudBrigade/stable-diffusion-webui/commit/385309d97b7cb905e637fe17e6dc15f998c5c586