Gaudi Torch Cummax

Running torch.cummax on HPU fails. The snippet below works fine on CPU, but raises a RuntimeError when run on HPU.

torch.cummax(torch.range(10,0,-1)[None,:].repeat(5,1).to('hpu'), dim=1)
*** RuntimeError: vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)

On CPU I get the expected results.

torch.cummax(torch.range(10,0,-1)[None,:].repeat(5,1).to('cpu'), dim=1)
torch.return_types.cummax(
values=tensor([[10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10.],
        [10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10.],
        [10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10.],
        [10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10.],
        [10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10.]]),
indices=tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]))
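As a side note, torch.range is deprecated in recent PyTorch releases; an equivalent CPU repro with torch.arange would look like this (a sketch, not part of the original report):

```python
import torch

# Equivalent repro using torch.arange instead of the deprecated torch.range;
# arange(10, -1, -1) produces the same 11 values 10, 9, ..., 0.
x = torch.arange(10, -1, -1, dtype=torch.float32)[None, :].repeat(5, 1)
result = torch.cummax(x, dim=1)
# Every running maximum is 10.0 and first occurs at index 0,
# matching the CPU output shown above.
print(result.values)
print(result.indices)
```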

I’m running this on an AWS DL1 instance with the Habana Gaudi AMI. I’ve also manually installed torch_hpu following the instructions in the HabanaAI/Setup_and_Install GitHub repository.

Hi @SohrabAndaz

Thank you for reporting the issue. We’ll update this thread once it is fixed in a future release. In the meantime, you can fall back to the CPU for the cummax call, along these lines:

x = ... # x is computed on HPU
y = torch.cummax(x.to('cpu'), dim=1) # cummax runs on CPU; y is a (values, indices) namedtuple
values = y.values.to('hpu') # move each result tensor back to HPU individually
indices = y.indices.to('hpu') # so that further computations happen on HPU again
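A self-contained sketch of this fallback (the tensor here lives on CPU so the snippet runs anywhere; on a Gaudi machine, uncomment the final moves back to HPU):

```python
import torch

# A CPU tensor stands in for an HPU tensor so the sketch is runnable anywhere.
x = torch.tensor([[1., 3., 2.], [2., 1., 3.]])

# cummax returns a (values, indices) namedtuple, so each tensor
# must be moved back to the accelerator individually.
values, indices = torch.cummax(x.to('cpu'), dim=1)
# values = values.to('hpu')   # uncomment on a Gaudi machine
# indices = indices.to('hpu')
```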

I can do that… In general, how long should I expect to wait for this fix? Weeks, or months?

-Sohrab Andaz

In case the CPU fallback is too slow, here is another possible workaround that implements cummax using max:

import torch
import habana_frameworks.torch.core as htcore

x = torch.tensor([[1, 3, 2], [2, 1, 3]])
y = torch.cummax(x, 1)  # reference result, computed on CPU

x = x.to('hpu')

# Tile each row n times (n = x.shape[1]) so that row i of each tile
# can be masked down to its first i+1 elements.
rsp = [x.shape[0], x.shape[1], x.shape[1]]
x_tiled = torch.tile(x, [1, x.shape[1]]).reshape(rsp)

# Lower-triangular mask; note that masking with zeros assumes the
# inputs are non-negative. Created on HPU to match x_tiled's device.
tril = torch.tril(torch.ones([x.shape[1], x.shape[1]])).to('hpu')
tril_tiled = torch.tile(tril, [x.shape[0], 1]).reshape(rsp)
mul = x_tiled * tril_tiled
result_replacement = torch.max(mul, 2)  # max over each masked row == cummax

# Move the HPU results to CPU before comparing with the CPU reference.
assert torch.all(result_replacement.values.to('cpu') == y.values)
assert torch.all(result_replacement.indices.to('cpu') == y.indices)
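The same trick can be wrapped in a small, CPU-testable helper (the function name is my own, and the zero-masking again assumes non-negative inputs):

```python
import torch

def cummax_via_max(x):
    # Emulate torch.cummax(x, dim=1) with tiling plus a lower-triangular
    # mask; masking with zeros assumes the inputs are non-negative.
    n = x.shape[1]
    rsp = [x.shape[0], n, n]
    x_tiled = torch.tile(x, [1, n]).reshape(rsp)
    tril = torch.tril(torch.ones(n, n, dtype=x.dtype))
    mul = x_tiled * torch.tile(tril, [x.shape[0], 1]).reshape(rsp)
    return torch.max(mul, 2)  # (values, indices), matching cummax

x = torch.tensor([[1., 3., 2.], [2., 1., 3.]])
ref = torch.cummax(x, 1)
out = cummax_via_max(x)
assert torch.equal(out.values, ref.values)
assert torch.equal(out.indices, ref.indices)
```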

You can get an idea of our past release cadence from the announcements here.