AttributeError: 'HabanaParameterWrapper' object has no attribute 'change_device_placement'

When I try to copy the weights from a layer that is already on HPU into a layer on CPU, this error occurs.
Code:

import os
os.environ['PT_HPU_LAZY_MODE'] = '1'
os.environ['LOG_LEVEL_PT_FALLBACK']='1'
os.environ['PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES']='1'
os.environ['LOG_LEVEL_ALL']='3'
os.environ['ENABLE_CONSOLE']='true'
import habana_frameworks.torch.core as htcore
import torch
with torch.no_grad():

Error:

[05:22:59.397349][SYN_API       ][info ][tid:4AD0] + ---------------------------------------------------------------------- +
[05:22:59.397377][SYN_API       ][info ][tid:4AD0] | Version:            1.17.1                                             |
[05:22:59.397383][SYN_API       ][info ][tid:4AD0] | Synapse:            db1a431                                            |
[05:22:59.397387][SYN_API       ][info ][tid:4AD0] | HCL:                1.17.1-a6d0341                                     |
[05:22:59.397391][SYN_API       ][info ][tid:4AD0] | MME:                f1ec30d                                            |
[05:22:59.397395][SYN_API       ][info ][tid:4AD0] | SCAL:               b05f1cf                                            |
[05:22:59.397400][SYN_API       ][info ][tid:4AD0] | Description:        HabanaLabs Runtime and GraphCompiler               |
[05:22:59.397430][SYN_API       ][info ][tid:4AD0] | Time:               2024-09-23 05:22:59.397404                         |
[05:22:59.397438][SYN_API       ][info ][tid:4AD0] + ---------------------------------------------------------------------- +
[05:22:59.422401][KERNEL_DB             ][warning][tid:4AD0] Failed loading version number from libTPCFuser.so
[05:22:59.426857][KERNEL_DB             ][warning][tid:4AD0] Failed loading version number from libTPCFuser.so
[05:22:59.429324][KERNEL_DB             ][warning][tid:4AD0] Failed loading version number from libTPCFuser.so
[05:23:00.475872][TPC_NODE              ][warning][tid:4AD0] Can't access halReader, setting maxNumOfTPCS to 24 
[05:23:00.477920][TPC_NODE              ][warning][tid:4AD0] Can't access halReader, setting maxNumOfTPCS to 64 
[05:23:00.485751][SYN_API       ][info ][tid:4AD0] synInitialize, status 0[synSuccess]
[05:23:00.525206][SCAL][info ][tid:4AD0] +-------------------------------------------------+
[05:23:00.525230][SCAL][info ][tid:4AD0] SCAL Commit SHA1 = b05f1cf
[05:23:00.525233][SCAL][info ][tid:4AD0] SCAL Build Time = Tue Aug 20 03:15:55 PM IDT 2024
[05:23:00.525238][SCAL][info ][tid:4AD0] SCAL loading config from default.json
[05:23:00.525244][SCAL][info ][tid:4AD0] SCAL config Hash = 0x2c64b209f6e3077f
[05:23:00.525247][SCAL][info ][tid:4AD0] +-------------------------------------------------+
[05:23:02.206154][HCL       ][info ][tid:4AD0] Version:	1.17.1-a6d0341
[05:23:02.206324][HL_GCFG][warning][tid:4AD0] setValue: override BOX_TYPE_ID value that already set from observation
[05:23:02.946268][SYN_API       ][info ][tid:4AD0] synDeviceAcquireByDeviceType, status 0[synSuccess]
[05:23:02.946734][PT_SYNHELPER    ][warning][tid:4AD0] /npu-stack/pytorch-integration/backend/synapse_helpers/mem_hlml.cpp:117	Allocated hlml shared memory0x7f3ddebae000
============================= HABANA PT BRIDGE CONFIGURATION =========================== 
 PT_HPU_LAZY_MODE = 1
 PT_RECIPE_CACHE_PATH = 
 PT_CACHE_FOLDER_DELETE = 0
 PT_HPU_RECIPE_CACHE_CONFIG = 
 PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
 PT_HPU_LAZY_ACC_PAR_MODE = 1
 PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 1
---------------------------: System Configuration :---------------------------
Num CPU Cores : 160
CPU RAM       : 2113407792 KB
------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/code/dummy/test.py", line 12, in <module>
    test_conv.weight.data = conv.weight.data
  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 32, in __setattr__
    return object.__setattr__(self_, name, value)
  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 55, in __torch_function__
    new_args[0].change_device_placement(new_args[1].device)
  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 24, in __getattribute__
    return object.__getattribute__(self_, name)
AttributeError: 'HabanaParameterWrapper' object has no attribute 'change_device_placement'
[05:23:04.770297][SYN_API       ][info ][tid:4AD0] synDeviceRelease, status 0[synSuccess]
[05:23:04.779318][SYN_API       ][info ][tid:4AD0] synDestroy, status 0[synSuccess]

System info:

================================================================================
Environment
================================================================================
Device                               gaudi2 
OS                                   ubuntu 
OS version                           22.04  
Log file                             /var/log/habana_logs/install-2024-09-23-14-25-50.log
Release version                      1.17.0-495
Habanalabs server                    vault.habana.ai
Rewrite installer config             no     
Install type                         validate
Python repo URL                      https://vault.habana.ai/artifactory/api/pypi/gaudi-python/simple
Habanalabs software                  [OK]   
habanalabs-container-runtime=1.17.1-40
habanalabs-dkms=1.17.0-495
habanalabs-firmware=1.17.0-495
habanalabs-firmware-odm=1.17.0-495
habanalabs-firmware-tools=1.17.0-495
habanalabs-graph=1.17.0-495
habanalabs-qual=1.17.0-495
habanalabs-rdma-core=1.17.0-495
habanalabs-thunk=1.17.0-495
================================================================================
System
================================================================================
CPU: 160
Model name: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
MemTotal: 2113407792 kB
Hugepagesize: 2048 kB
================================================================================
OS environment
================================================================================
Package manager                      [apt]  
The required sudo privileges are     [ok]   
Python 3.10                          [OK]   
================================================================================
Basic dependencies
================================================================================
gcc                                  [OK]   
cmake                                [OK]   
lsof                                 [OK]   
curl                                 [OK]   
wget                                 [OK]   
linux-headers-5.15.0-113-generic     [OK]   
ethtool                              [OK]   
libelf-dev                           [OK]   
libbz2-dev                           [OK]   
liblzma-dev                          [OK]   
libibverbs-dev                       [OK]   
librdmacm-dev                        [OK]   
dkms                                 [OK]   
linux-modules-extra-5.15.0-113-generic  [OK]   
================================================================================
PyTorch dependencies
================================================================================
gcc                                  [OK]   
cmake                                [OK]   
lsof                                 [OK]   
curl                                 [OK]   
wget                                 [OK]   
unzip                                [MISSING]
libcurl4                             [OK]   
moreutils                            [MISSING]
iproute2                             [OK]   
libcairo2-dev                        [MISSING]
libglib2.0-dev                       [MISSING]
libselinux1-dev                      [MISSING]
libnuma-dev                          [MISSING]
libpcre2-dev                         [MISSING]
libatlas-base-dev                    [MISSING]
libjpeg-dev                          [MISSING]
liblapack-dev                        [MISSING]
libnuma-dev                          [MISSING]
google-perftools                     [MISSING]
numactl                              [MISSING]
libopenblas-dev                      [MISSING]
Open MPI version 4.1.5 will be install
================================================================================
Installed Habanalabs software
================================================================================
habanalabs-container-runtime=1.17.1-40
habanalabs-dkms=1.17.0-495
habanalabs-firmware=1.17.0-495
habanalabs-firmware-odm=1.17.0-495
habanalabs-firmware-tools=1.17.0-495
habanalabs-graph=1.17.0-495
habanalabs-qual=1.17.0-495
habanalabs-rdma-core=1.17.0-495
habanalabs-thunk=1.17.0-495
================================================================================
Full install log: /var/log/habana_logs/install-2024-09-23-14-25-50.log
================================================================================

The code ends with “with torch.no_grad():”, without the actual body. Could you please post what follows torch.no_grad()?

Sorry, I accidentally posted without the code. Here’s the code snippet to reproduce the error above.

import os
os.environ['PT_HPU_LAZY_MODE'] = '1'
os.environ['LOG_LEVEL_PT_FALLBACK'] = '1'
os.environ['PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES'] = '1'
os.environ['LOG_LEVEL_ALL'] = '3'
os.environ['ENABLE_CONSOLE'] = 'true'
import habana_frameworks.torch.core as htcore
import torch
with torch.no_grad():
        conv = torch.nn.Conv2d(3,64,(32,32)).to('hpu')
        test_conv = torch.nn.Conv2d(3,64,(32,32))
        test_conv.weight.data = conv.weight.data
Running this produces the error output shown above.

I also tried a workaround of copying the weight on CPU and then moving it back to HPU, but it did not work either; see the sketch below.
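The attempt looked roughly like this (a sketch of the idea, not the exact code):

w_cpu = conv.weight.data.to('cpu')   # copy the HPU weight to CPU
test_conv.weight.data = w_cpu        # assign while test_conv is still on CPU
test_conv = test_conv.to('hpu')      # then move the module back to HPU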

Hi Kwanhee,

Would you be able to try changing line 12 from
test_conv.weight.data = conv.weight.data → test_conv.weight = conv.weight
Assigning the Parameter itself avoids reassigning .data through the wrapper’s __setattr__/__torch_function__ path, which appears to be where the traceback above fails.

import os
os.environ['PT_HPU_LAZY_MODE'] = '1'
os.environ['LOG_LEVEL_PT_FALLBACK'] = '1'
os.environ['PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES'] = '1'
os.environ['LOG_LEVEL_ALL'] = '3'
os.environ['ENABLE_CONSOLE'] = 'true'
import habana_frameworks.torch.core as htcore
import torch
with torch.no_grad():
        conv = torch.nn.Conv2d(3,64,(32,32)).to('hpu')
        test_conv = torch.nn.Conv2d(3,64,(32,32))
        test_conv.weight = conv.weight
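One note on why this works (my understanding, based on stock nn.Module semantics rather than Habana internals): assigning the Parameter shares it instead of copying values, so both modules should reference the same tensor on HPU:

print(test_conv.weight is conv.weight)  # expected True: the parameter is shared, not copied
print(test_conv.weight.device)          # expected hpu:0 — test_conv's weight now lives on HPU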


Thanks!

Hi

Can you please describe the use case? This looks like a specific unit test intended to reproduce the issue, but if we understood the use case, we might be able to suggest a workaround.

For example, if you want an nn.Module with two conv layers to share the same weights, you could create the module on CPU, copy the weights from one layer to the other, and then save the state_dict.
Next, create a new module, load the saved state_dict back, and then move the model to HPU.

Roughly, in code:

m0 = Model()  # model on CPU, with conv0 and conv1 layers
m0.conv1.load_state_dict(m0.conv0.state_dict())  # copy conv0's weights into conv1
torch.save(m0.state_dict(), 'm0.pt')

m1 = Model()
m1.load_state_dict(torch.load('m0.pt'))
m1 = m1.to('hpu')

Or you might be trying to do something else; perhaps we can suggest an alternative if we had more context.

It is a specific unit test indeed; I encountered this issue while trying to implement network pruning methods.

Normally, when pruning weights (which are usually on the GPU/HPU), we copy them, compute a pruning metric, and replace the weights with the pruned ones on the fly, roughly as in the sketch below; I was using this code snippet to implement the pruning metric.
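A minimal magnitude-pruning sketch of that flow (the 50% ratio and variable names are illustrative, not from any pruning library):

with torch.no_grad():
    w = conv.weight.data.to('cpu')                    # pull the HPU weights to CPU for the metric
    k = max(1, int(0.5 * w.numel()))                  # illustrative 50% prune ratio
    threshold = w.abs().flatten().kthvalue(k).values  # magnitude threshold
    mask = (w.abs() > threshold).to(w.dtype)          # 1 = keep, 0 = prune
    conv.weight.data.copy_(w * mask)                  # in-place copy_ writes back to the HPU weight without reassigning .data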

I think the above solution will work out.

Thank you for the reply!


Thank you for the reply. It worked out!
