A question about how to use "wrap_in_hpu_graph"

Hi, I created a simple piece of code to understand the usage of the function "wrap_in_hpu_graph"; the code is below:

```
import torch
import habana_frameworks.torch as ht
import habana_frameworks.torch.core as htcore

class A(torch.nn.Module):
    def forward(self,x):
        b=x[:,torch.tensor((1,2,1,2,1,2,2,1,0,0,0,0)),torch.tensor((0,1,2,3,0,1,1,1,1,1,1,1))]
        return b

def foo_1():
    sa=A()
    sa=ht.hpu.wrap_in_hpu_graph(sa)
    sa(torch.arange(30).reshape(2,3,5).to('hpu'))

def foo_2():
    sa=A()
    sa=ht.hpu.wrap_in_hpu_graph(sa)
    sa(torch.arange(30).reshape(2,3,5))
```

When I run foo_1(), it reports the error message "RuntimeError: cpu fallback is not supported during hpu graph capturing".
When I run foo_2(), there is no error message.
So, the question is:

  1. If I use ht.hpu.wrap_in_hpu_graph() to optimize the model, but the input tensor of the model is not moved with .to('hpu'), i.e. the input tensor stays on device='cpu', is this usage allowed?

Posting the code in triple backticks so that indents are visible

```
class A(torch.nn.Module):
    def forward(self,x):
        b=x[:,torch.tensor((1,2,1,2,1,2,2,1,0,0,0,0)),torch.tensor((0,1,2,3,0,1,1,1,1,1,1,1))]
        return b

def foo_1():
    sa=A()
    sa=ht.hpu.wrap_in_hpu_graph(sa)
    sa(torch.arange(30).reshape(2,3,5).to('hpu'))

def foo_2():
    sa=A()
    sa=ht.hpu.wrap_in_hpu_graph(sa)
    sa(torch.arange(30).reshape(2,3,5))

foo_1()
foo_2()
```

@taoshaoyu ,

In general, the input tensor should be moved to the device. Without moving the input tensor to HPU, the operation happens on CPU.
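For reference, here is a minimal sketch of the pattern I would expect to work. I also move the module itself and the index tensors to HPU; the module move only matters once the module has parameters, and moving the indices is just my own precaution against the CPU fallback seen during graph capture on older releases:

```
import torch
import habana_frameworks.torch as ht

class A(torch.nn.Module):
    def forward(self, x):
        # same advanced-indexing op as in the question, with the index
        # tensors placed on HPU to avoid a CPU fallback during capture
        idx0 = torch.tensor((1,2,1,2,1,2,2,1,0,0,0,0)).to('hpu')
        idx1 = torch.tensor((0,1,2,3,0,1,1,1,1,1,1,1)).to('hpu')
        return x[:, idx0, idx1]

sa = A().to('hpu')                    # move the module to the device
sa = ht.hpu.wrap_in_hpu_graph(sa)     # capture/replay as an HPU graph

x = torch.arange(30).reshape(2,3,5).to('hpu')   # the input must be on HPU as well
out = sa(x)
print(out.cpu())
```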

I can only reproduce the issue on release 1.8, but I do not see the issue on 1.9.

Are you using 1.8? If so, can you please move to 1.9 and check whether it works for you?

Thanks

A further note:

The op you are doing in the model itself is an indexing op.

Indexing ops might be dynamic. Here is how you detect dynamic shapes: set GRAPH_VISUALIZATION=1, run multiple steps with the same input shape (we don't want to check for recompiles caused by different input shapes, since wrap_in_hpu_graph can handle input dynamicity), and check whether .graph_dumps keeps growing.
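As a rough illustration of that check, here is a sketch. The exact location and layout of the .graph_dumps folder may differ between releases, so treat the path below as an assumption and adjust it as needed:

```
import glob
import torch
import habana_frameworks.torch as ht

# Launch with the env var set, e.g.:  GRAPH_VISUALIZATION=1 python check_dynamic.py

class A(torch.nn.Module):   # same model as in the question
    def forward(self, x):
        return x[:, torch.tensor((1,2,1,2,1,2,2,1,0,0,0,0)),
                    torch.tensor((0,1,2,3,0,1,1,1,1,1,1,1))]

def num_graph_dumps():
    # count the dump files; adjust the glob if your release writes them elsewhere
    return len(glob.glob('.graph_dumps/*'))

model = A().to('hpu')
x = torch.arange(30).reshape(2,3,5).to('hpu')   # keep the input shape fixed

for step in range(5):
    model(x)
    ht.hpu.synchronize()
    print(f'step {step}: {num_graph_dumps()} graph dumps')

# If the count keeps growing even though the input shape never changes,
# the model itself is recompiling, i.e. it has dynamic ops, and HPU graphs
# are not a good fit.
```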

If your model is dynamic, you should not use HPU graphs. So I would first suggest checking whether there is dynamicity of ops in the model, and only then consider wrapping it in an HPU graph. Note that wrap_in_hpu_graph is able to deal with input dynamicity, so if that is the only kind of dynamicity you have (i.e. no dynamic ops or dynamic control flow), you can wrap in an HPU graph. You can read about removing dynamic ops here, specifically these examples.
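To make the distinction concrete, here is a toy (CPU-only) illustration of the difference between input dynamicity and op dynamicity; the example is mine, not from the links above:

```
import torch

# Input dynamicity: the same ops run on inputs of different shapes.
# wrap_in_hpu_graph can handle this by caching a captured graph per shape.
x1 = torch.randn(4, 8)
x2 = torch.randn(6, 8)

# Op dynamicity: the output shape depends on the *data*, not just the input
# shape, so every step can produce a different shape even with fixed inputs.
def dynamic_op(x):
    return x[x > 0]          # boolean-mask indexing: data-dependent size

print(dynamic_op(x1).shape)  # size varies with the random data
print(dynamic_op(x2).shape)
```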

Also note that you can use HPU graphs for training as well, as detailed here, here and here. For training, the equivalent of wrap_in_hpu_graph is ModuleCacher.
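For completeness, a very rough sketch of the training-side usage. The ModuleCacher call signature below (max_graphs, model=, inplace=) is from memory and is an assumption, so please verify it against the documentation linked above:

```
import torch
import habana_frameworks.torch as ht
import habana_frameworks.torch.core as htcore

model = torch.nn.Linear(16, 4).to('hpu')
# cache the module's captured graphs for training (signature assumed)
model = ht.hpu.ModuleCacher(max_graphs=10)(model=model, inplace=True)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

for _ in range(3):
    x = torch.randn(8, 16).to('hpu')
    y = torch.randn(8, 4).to('hpu')
    optimizer.zero_grad(set_to_none=True)
    loss = loss_fn(model(x), y)
    loss.backward()
    htcore.mark_step()       # flush the accumulated graph in lazy mode
    optimizer.step()
    htcore.mark_step()

print(loss.cpu())
```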