I am trying to run A1111/stable-diffusion-webui on Gaudi1. I have successfully set up the GPU Migration Toolkit and updated the code to push inference onto the Gaudi hardware (confirmed by hl-smi); a rough sketch of the setup I am assuming is below.
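For context, this is roughly how the HPU side is wired up. Only the two habana_frameworks imports and the "hpu" device string come from the GPU Migration Toolkit documentation; the Conv2d module is a stand-in for A1111's own model handling, not the actual integration code.

import habana_frameworks.torch.gpu_migration  # noqa: F401  (remaps torch.cuda.* calls onto HPU)
import habana_frameworks.torch.core as htcore
import torch

device = torch.device("hpu")
# Stand-in for the SD UNet: any module pushed to the Gaudi device shows up in hl-smi.
unet = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1).to(device)
latents = torch.randn(1, 4, 64, 64, device=device)
out = unet(latents)
htcore.mark_step()  # in lazy mode this flushes the accumulated graph to the device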
Intermittently, inference fails with an empty-tensor error. I suspect "empty" here really means a missing/null tensor, not an allocated-but-uninitialized tensor such as torch.empty() returns.
Looking at the error, it seems the tensor may not exist in the underlying storage, and the problem may be related to Habana's lazy-mode execution strategy. As I understand it, tensors are copied from the HPU back to the CPU at certain points in the workflow, so this could be a timing/synchronization issue; a sketch of the path I mean follows. It may also be complicated by the way A1111 implements its image pipelines, which does not use diffusers' StableDiffusionPipeline.
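This is the path I believe is involved, matching the backtrace below (local_scalar_dense -> SyncTensorsGraph): any host-side read of an HPU tensor, e.g. .item() or .cpu(), forces the lazy graph to synchronize, and that is where "Empty tensor optional" is raised for me. The tensor name here is hypothetical; it just mimics a scalar read during sampling.

import habana_frameworks.torch.core as htcore
import torch

sigma = torch.tensor(14.6, device="hpu")  # hypothetical per-step scalar living on the HPU
htcore.mark_step()   # explicit sync point; without it, the read below triggers the sync itself
value = sigma.item() # host-side read -> _local_scalar_dense -> SyncTensorsGraph, as in the backtrace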
In this use case we are using the DDIM sampler; other samplers produce different errors. None of these errors occur when running A1111 on CUDA, so the issue does appear to be related to running on HPUs.
Python error:
RuntimeError: Empty tensor optional
Habana Framework stack trace:
[23:24:18.362864][PT_BRIDGE ][error][tid:11313] /npu-stack/pytorch-integration/habana_lazy/hpu_lazy_tensors.cpp: 962Empty tensor optionalPrepareInputStack
[23:24:18.367603][PT_BRIDGE ][error][tid:11313] backtrace (up to 30)
[23:24:18.367635][PT_BRIDGE ][error][tid:11313] /usr/lib/habanalabs/libhl_logger.so(hl_logger::v1_0::logStackTrace(std::shared_ptr<hl_logger::Logger> const&, int)+0x5c) [0x7fd38ddb73fc]
[23:24:18.367647][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(void hl_logger::v1_5_inline_fmt_compile::logStacktraceHlLogger::LoggerType(HlLogger::LoggerType, int)+0x93) [0x7fd38d0cec43]
[23:24:18.367672][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(PrepareInputStack(std::vector<habana_lazy::HbLazyTensor, std::allocator<habana_lazy::HbLazyTensor> >, std::vector<int, std::allocator >&, std::vector<habana_lazy::ir::Value, std::allocator<habana_lazy::ir::Value> >&, bool, std::vector<std::shared_ptr<habana_lazy::ir::Node>, std::allocator<std::shared_ptr<habana_lazy::ir::Node> > >, bool)+0xd3f) [0x7fd38d99315f]
[23:24:18.367721][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(habana_lazy::HbLazyTensor::SyncTensorsGraphInternal(std::vector<habana_lazy::HbLazyTensor, std::allocator<habana_lazy::HbLazyTensor> >, std::shared_ptr<habana_lazy::HbLazyFrontEndInfoToBackend>, bool, bool)+0x164e) [0x7fd38d9965ce]
[23:24:18.367750][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(habana_lazy::HbLazyTensor::SyncTensorsGraph(std::vector<habana_lazy::HbLazyTensor, std::allocator<habana_lazy::HbLazyTensor> >, std::shared_ptr<habana_lazy::HbLazyFrontEndInfoToBackend>, bool, bool)+0x41b) [0x7fd38d9986bb]
[23:24:18.367775][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(habana_lazy::HbLazyTensor::SyncLiveTensorsGraph(c10::Device const*, std::shared_ptr<habana_lazy::HbLazyFrontEndInfoToBackend>, std::vector<habana_lazy::HbLazyTensor, std::allocator<habana_lazy::HbLazyTensor> >, bool, bool, std::vector<habana_lazy::HbLazyTensor, std::allocator<habana_lazy::HbLazyTensor> >, std::set<long, std::less, std::allocator >)+0x67e) [0x7fd38d9992ae]
[23:24:18.367800][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(habana_lazy::HbLazyTensor::StepMarker(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::shared_ptr<habana_lazy::HbLazyFrontEndInfoToBackend>, std::vector<habana_lazy::HbLazyTensor, std::allocator<habana_lazy::HbLazyTensor> >, bool, bool, std::vector<habana_lazy::HbLazyTensor, std::allocator<habana_lazy::HbLazyTensor> >, std::set<long, std::less, std::allocator >)+0x94b) [0x7fd38d999f0b]
[23:24:18.367815][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(habana_lazy::local_scalar_dense_hpu(at::Tensor const&)+0x4b9) [0x7fd38d7f47f9]
[23:24:18.367830][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(habana::local_scalar_dense(at::Tensor const&)+0x2e4) [0x7fd38d37bed4]
[23:24:18.367845][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so(c10::impl::wrap_kernel_functor_unboxed<c10::impl::detail::WrapFunctionIntoRuntimeFunctor<c10::Scalar ()(at::Tensor const&), c10::Scalar, c10::guts::typelist::typelist<at::Tensor const&> >, c10::Scalar (at::Tensor const&)>::call(c10::OperatorKernel, c10::DispatchKeySet, at::Tensor const&)+0x27) [0x7fd38d3c00f7]
[23:24:18.367860][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so(at::_ops::_local_scalar_dense::redispatch(c10::DispatchKeySet, at::Tensor const&)+0x88) [0x7fd64d9a8db8]
[23:24:18.367871][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so(+0x39405f3) [0x7fd64f3405f3]
[23:24:18.367883][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so(+0x39406d8) [0x7fd64f3406d8]
[23:24:18.367894][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so(at::_ops::_local_scalar_dense::call(at::Tensor const&)+0x138) [0x7fd64da411a8]
[23:24:18.367905][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so(at::native::item(at::Tensor const&)+0x94) [0x7fd64d0fa0a4]
[23:24:18.367916][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so(+0x26a4f95) [0x7fd64e0a4f95]
[23:24:18.367927][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so(at::_ops::item::call(at::Tensor const&)+0x138) [0x7fd64d8ade98]
[23:24:18.367938][PT_BRIDGE ][error][tid:11313] /home/ubuntu/habanalabs-venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so(+0x3bd5ee) [0x7fd656fbd5ee]
[23:24:18.367953][PT_BRIDGE ][error][tid:11313] python3(+0x15d64e) [0x55f0b248264e]
[23:24:18.367964][PT_BRIDGE ][error][tid:11313] python3(_PyEval_EvalFrameDefault+0x6152) [0x55f0b24738a2]
[23:24:18.367975][PT_BRIDGE ][error][tid:11313] python3(_PyFunction_Vectorcall+0x7c) [0x55f0b248570c]
[23:24:18.367991][PT_BRIDGE ][error][tid:11313] python3(_PyEval_EvalFrameDefault+0x6152) [0x55f0b24738a2]
[23:24:18.368002][PT_BRIDGE ][error][tid:11313] python3(_PyFunction_Vectorcall+0x7c) [0x55f0b248570c]
[23:24:18.368014][PT_BRIDGE ][error][tid:11313] python3(_PyEval_EvalFrameDefault+0x1981) [0x55f0b246f0d1]
[23:24:18.368026][PT_BRIDGE ][error][tid:11313] python3(_PyFunction_Vectorcall+0x7c) [0x55f0b248570c]
[23:24:18.368036][PT_BRIDGE ][error][tid:11313] python3(_PyEval_EvalFrameDefault+0x6bd) [0x55f0b246de0d]
[23:24:18.368048][PT_BRIDGE ][error][tid:11313] python3(_PyFunction_Vectorcall+0x7c) [0x55f0b248570c]
[23:24:18.368062][PT_BRIDGE ][error][tid:11313] python3(_PyEval_EvalFrameDefault+0x6bd) [0x55f0b246de0d]
[23:24:18.368080][PT_BRIDGE ][error][tid:11313] python3(_PyFunction_Vectorcall+0x7c) [0x55f0b248570c]
[23:24:18.368089][PT_BRIDGE ][error][tid:11313] python3(_PyEval_EvalFrameDefault+0x2b71) [0x55f0b24702c1]
*** Error completing request