I’m getting a strange error when performing an advanced index assignment on an HPU tensor:
RuntimeError: expand(HPUBFloat16Type{[2, 82, 4096]}, size=[265897904, 4096]): the number of sizes provided (2) must be greater or equal to the number of dimensions in the tensor (3)implicit = 0
> /fooformers/fooformers.py(81)add_memories()
79 barf_if_nans(new_memories)
80 self.memory = self.memory.clone()
---> 81 self.memory[helper, slots] = new_memories.detach()
82 self.lru[helper, slots] = torch.maximum(
83 self.lru.mean(dim=1),
(That 265897904 number isn’t consistent between reruns.)
Sizes:
ipdb> p self.memory.size()
torch.Size([2, 1024, 4096])
ipdb> p helper.size()
torch.Size([2, 82])
ipdb> p slots.size()
torch.Size([2, 82])
ipdb> p new_memories.size()
torch.Size([2, 82, 4096])
The indices are all well within bounds:
ipdb> p torch.min(helper)
tensor(0, device='hpu:0')
ipdb> p torch.max(helper)
tensor(1, device='hpu:0')
ipdb> p torch.min(slots)
tensor(0, device='hpu:0')
ipdb> p torch.max(slots)
tensor(81, device='hpu:0')
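To spell out what I expect this assignment to do: helper and slots are both [2, 82], so together they index the first two dimensions of memory, and new_memories ([2, 82, 4096]) should be written row by row into the selected positions. A CPU sketch of the equivalent loop (random placeholder data with the same shapes; the way helper and slots are built here is just a plausible stand-in, my real code produces them elsewhere):

import torch

memory = torch.zeros(2, 1024, 4096)
helper = torch.arange(2).unsqueeze(1).expand(2, 82)                 # hypothetical stand-in for my real helper
slots = torch.stack([torch.randperm(1024)[:82] for _ in range(2)])  # unique slots per row, so no duplicate writes
new_memories = torch.randn(2, 82, 4096)

vectorized = memory.clone()
vectorized[helper, slots] = new_memories                            # the assignment that fails on HPU

looped = memory.clone()
for i in range(helper.size(0)):
    for j in range(helper.size(1)):
        # new_memories[i, j] goes to row (helper[i, j], slots[i, j])
        looped[helper[i, j], slots[i, j]] = new_memories[i, j]

assert torch.equal(vectorized, looped)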
This operation is valid under PyTorch’s advanced-indexing rules, and if I move the same tensors to the CPU, the assignment completes without error:
ipdb> memory_cpu = self.memory.to("cpu")
ipdb> helper_cpu = helper.to("cpu")
ipdb> slots_cpu = slots.to("cpu")
ipdb> new_memories_cpu = new_memories.to("cpu")
ipdb> memory_cpu[helper_cpu, slots_cpu] = new_memories_cpu
# (no error)
When I retry the same assignment on the HPU from ipdb, I get the following error and backtrace:
ipdb> self.memory[helper, slots] = new_memories
*** RuntimeError: [Rank:0] FATAL ERROR :: MODULE:PT_BRIDGE Exception in Launch thread...
Check $HABANA_LOGS/ for details
expand(HPUBFloat16Type{[2, 82, 4096]}, size=[265897904, 4096]): the number of sizes provided (2) must be greater or equal to the number of dimensions in the tensor (3)implicit = 0
Exception raised from AllocateAndAddSynapseNode at /npu-stack/pytorch-integration/habana_kernels/tensor_shape_kernels.cpp:691 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6c (0x7f4e4396166c in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xfa (0x7f4e439169f0 in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10.so)
frame #2: BroadcastOperator::AllocateAndAddSynapseNode(synapse_helpers::graph&, std::vector<c10::IValue, std::allocator<c10::IValue> >&, std::vector<habana::OutputMetaData, std::allocator<habana::OutputMetaData> > const&) + 0x96d (0x7f4ee3f45aed in /usr/local/lib/python3.8/dist-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so)
frame #3: habana::IndexPutOperator::AllocateAndAddSynapseNodeNonBoolIndices(synapse_helpers::graph&, std::vector<c10::IValue, std::allocator<c10::IValue> >&, std::vector<habana::OutputMetaData, std::allocator<habana::OutputMetaData> > const&) + 0xb86 (0x7f4ee3d40356 in /usr/local/lib/python3.8/dist-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so)
frame #4: habana::IndexPutOperator::AllocateAndAddSynapseNode(synapse_helpers::graph&, std::vector<c10::IValue, std::allocator<c10::IValue> >&, std::vector<habana::OutputMetaData, std::allocator<habana::OutputMetaData> > const&) + 0x17b (0x7f4ee3d4885b in /usr/local/lib/python3.8/dist-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so)
frame #5: habana::HabanaLaunchOpPT::BuildSynapseGraph(std::shared_ptr<synapse_helpers::graph>&, bool) + 0x1c6e (0x7f4ede5ac07e in /usr/local/lib/python3.8/dist-packages/habana_frameworks/torch/lib/libhabana_pytorch_backend.so)
frame #6: habana::HabanaLaunchOpPT::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&, std::optional<std::vector<at::Tensor, std::allocator<at::Tensor> > >, bool) + 0x83b (0x7f4ede5badab in /usr/local/lib/python3.8/dist-packages/habana_frameworks/torch/lib/libhabana_pytorch_backend.so)
frame #7: <unknown function> + 0xda0f37 (0x7f4ee3fabf37 in /usr/local/lib/python3.8/dist-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so)
frame #8: habana_lazy::exec::HlExec::Launch(std::vector<c10::IValue, std::allocator<c10::IValue> >&, c10::hpu::HPUStream const&, bool) + 0x82c (0x7f4ee3faef3c in /usr/local/lib/python3.8/dist-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so)
frame #9: LaunchSyncTensorsGraph(LaunchTensorsInfo&&, LaunchEagerInfo&&, LaunchStreamInfo&&) + 0x4b7 (0x7f4ee3f845e7 in /usr/local/lib/python3.8/dist-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so)
frame #10: std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<habana_helpers::ThreadPoolBase<habana_helpers::BlockingQueue>::enqueue<void (&)(LaunchTensorsInfo&&, LaunchEagerInfo&&, LaunchStreamInfo&&), LaunchTensorsInfo, LaunchEagerInfo, LaunchStreamInfo>(void (&)(LaunchTensorsInfo&&, LaunchEagerInfo&&, LaunchStreamInfo&&), LaunchTensorsInfo&&, LaunchEagerInfo&&, LaunchStreamInfo&&)::{lambda()#1}, std::allocator<int>, void ()>::_M_run()::{lambda()#1}, void> >::_M_invoke(std::_Any_data const&) + 0x40 (0x7f4ee3f8b820 in /usr/local/lib/python3.8/dist-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so)
frame #11: std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) + 0x2d (0x7f4ee3cbffad in /usr/local/lib/python3.8/dist-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so)
frame #12: <unknown function> + 0x114df (0x7f4ef4c524df in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #13: std::__future_base::_Task_state<habana_helpers::ThreadPoolBase<habana_helpers::BlockingQueue>::enqueue<void (&)(LaunchTensorsInfo&&, LaunchEagerInfo&&, LaunchStreamInfo&&), LaunchTensorsInfo, LaunchEagerInfo, LaunchStreamInfo>(void (&)(LaunchTensorsInfo&&, LaunchEagerInfo&&, LaunchStreamInfo&&), LaunchTensorsInfo&&, LaunchEagerInfo&&, LaunchStreamInfo&&)::{lambda()#1}, std::allocator<int>, void ()>::_M_run() + 0x10a (0x7f4ee3f8baea in /usr/local/lib/python3.8/dist-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so)
frame #14: habana_helpers::ThreadPoolBase<habana_helpers::BlockingQueue>::executePendingTask(std::packaged_task<void ()>&&) + 0x38 (0x7f4ee3cd0218 in /usr/local/lib/python3.8/dist-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so)
frame #15: habana_helpers::ThreadPoolBase<habana_helpers::BlockingQueue>::main_loop() + 0x124 (0x7f4ee3cd0eb4 in /usr/local/lib/python3.8/dist-packages/habana_frameworks/torch/lib/libhabana_pytorch_plugin.so)
frame #16: <unknown function> + 0xd6df4 (0x7f4ef4972df4 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #17: <unknown function> + 0x8609 (0x7f4ef4c49609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #18: clone + 0x43 (0x7f4ef4d83133 in /lib/x86_64-linux-gnu/libc.so.6)
(There isn’t any useful additional information in $HABANA_LOGS/.)
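For completeness, here is a self-contained sketch of what I believe should reproduce the failure (same shapes and bfloat16 dtype as above, random placeholder data; I haven’t confirmed this minimal version trips the same error outside my model, and the helper/slots construction is again just a stand-in):

import torch
import habana_frameworks.torch.core as htcore   # Habana PyTorch bridge

device = torch.device("hpu")

memory = torch.zeros(2, 1024, 4096, dtype=torch.bfloat16).to(device)
helper = torch.arange(2).unsqueeze(1).expand(2, 82).to(device)                 # stand-in index construction
slots = torch.stack([torch.randperm(1024)[:82] for _ in range(2)]).to(device)
new_memories = torch.randn(2, 82, 4096, dtype=torch.bfloat16).to(device)

memory = memory.clone()
memory[helper, slots] = new_memories    # the failing index_put
htcore.mark_step()                      # flush the lazy-mode graph
print(memory.float().sum().item())      # materialize the result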
This is on a freshly provisioned Intel Developer Cloud Gaudi 2 instance running SynapseAI 1.13.0:
+ ---------------------------------------------------------------------- +
| Version: 1.13.0 |
| Synapse: 6599d95d6 |
| HCL: decb342d |
| MME: 1556117 |
| SCAL: f750a52 |
| Description: HabanaLabs Runtime and GraphCompiler |
| Time: 2024-01-03 07:40:54.874212 |
+ ---------------------------------------------------------------------- +
Is this an issue in the Habana stack, or am I doing something wrong?
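(In the meantime, the stopgap I’m experimenting with is to round-trip just this one assignment through the CPU, along the lines of the sketch below; I haven’t measured what it costs inside the training loop.)

# Hypothetical stopgap inside add_memories(): do only the index_put on CPU.
hpu_device = self.memory.device
memory_cpu = self.memory.to("cpu")
memory_cpu[helper.to("cpu"), slots.to("cpu")] = new_memories.detach().to("cpu")
self.memory = memory_cpu.to(hpu_device)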