ValueError: invalid type: 'torch.hpu.FloatTensor'

Purvang1 · May 30, 2023, 5:48am

I am trying to train the YOLOX algorithm on Gaudi2, and I get the error above at this operation:
grid = torch.stack((xv, yv), 2).view(1, 1, hsize, wsize, 2).type(dtype)

How can I solve it?


Sayantan_S · May 30, 2023, 5:53am

Thanks for posting.

Can you try:
grid = torch.stack((xv, yv), 2).view(1, 1, hsize, wsize, 2).type(torch.FloatTensor)
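The error shows `.type()` being handed the string 'torch.hpu.FloatTensor', which PyTorch's legacy type mechanism does not recognize. `torch.FloatTensor` is the CPU float tensor type, so `.type(torch.FloatTensor)` avoids the error but also copies the grid to the CPU, which may cause device mismatches later in the model. A device-preserving alternative is to cast with `Tensor.to(device=..., dtype=...)`. The sketch below assumes the grid should match the dtype and device of some reference tensor; `make_grid` and `like` are hypothetical names, not part of YOLOX.

```python
import torch


def make_grid(hsize: int, wsize: int, like: torch.Tensor) -> torch.Tensor:
    """Return a (1, 1, hsize, wsize, 2) grid of (x, y) cell offsets.

    Hypothetical helper: the grid is created with the same device and dtype
    as `like` (e.g. the feature map it will be added to), so no type string
    such as 'torch.hpu.FloatTensor' is involved.
    """
    # indexing="ij": yv[i, j] == i (row index), xv[i, j] == j (column index)
    yv, xv = torch.meshgrid(
        torch.arange(hsize), torch.arange(wsize), indexing="ij"
    )
    grid = torch.stack((xv, yv), 2).view(1, 1, hsize, wsize, 2)
    # .to() keeps the grid on like's device instead of forcing a CPU tensor
    return grid.to(device=like.device, dtype=like.dtype)
```

On Gaudi, `like` would be an HPU tensor, so the grid would stay on the HPU rather than being moved to the CPU.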

Purvang1 · May 30, 2023, 6:42pm

Thanks. That got me past the previous error, but now I am getting the following error:

[1,0]: File "/workspace/yolox/utils/boxes.py", line 102, in bboxes_iou
[1,0]: en = (tl < br).type(torch.FloatTensor).prod(dim=2)
[1,0]: │ │ │ └ <class 'torch.FloatTensor'>
[1,0]: │ │ └ <module 'torch' from '/usr/local/lib/python3.8/dist-packages/torch/__init__.py'>
[1,0]:
[1,0]: RuntimeError: [05-30 15:25:23::141879][R000][41919]FATAL ERROR :: MODULE:SYNHELPER node add failed 1
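The failing line casts a boolean mask with `.type(torch.FloatTensor)`, which produces a CPU tensor while `tl` and `br` presumably live on the HPU. Whether this CPU/HPU mixing is what triggers the SYNHELPER error is not confirmed in this thread. As an illustration only, here is a minimal sketch of a pairwise IoU for boxes in (x1, y1, x2, y2) format that casts with `.to(tl.dtype)` instead, so every intermediate stays on the inputs' device. `bboxes_iou_xyxy` is a hypothetical name; it loosely follows the structure of the xyxy branch of YOLOX's `bboxes_iou` but is not that function.

```python
import torch


def bboxes_iou_xyxy(bboxes_a: torch.Tensor, bboxes_b: torch.Tensor) -> torch.Tensor:
    """Pairwise IoU between (N, 4) and (M, 4) boxes in (x1, y1, x2, y2) format.

    Hypothetical helper; returns an (N, M) tensor on the inputs' device.
    """
    if bboxes_a.shape[1] != 4 or bboxes_b.shape[1] != 4:
        raise IndexError("boxes must have 4 coordinates")
    # Top-left and bottom-right corners of each pairwise intersection: (N, M, 2)
    tl = torch.max(bboxes_a[:, None, :2], bboxes_b[:, :2])
    br = torch.min(bboxes_a[:, None, 2:], bboxes_b[:, 2:])
    area_a = torch.prod(bboxes_a[:, 2:] - bboxes_a[:, :2], 1)
    area_b = torch.prod(bboxes_b[:, 2:] - bboxes_b[:, :2], 1)
    # Cast the boolean overlap mask with .to(dtype): same device as tl,
    # unlike .type(torch.FloatTensor), which always returns a CPU tensor
    en = (tl < br).to(tl.dtype).prod(dim=2)
    area_i = torch.prod(br - tl, 2) * en
    return area_i / (area_a[:, None] + area_b - area_i)
```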

Sayantan_S · May 30, 2023, 6:43pm

Can you point me to the code you are using? We have an enabled YOLOX model here. Also, which SW version are you using?

Thanks

Purvang1 · May 31, 2023, 4:20pm

@Sayantan_S
I am trying to integrate Horovod into the YOLOX code, and I found that the problem appears when I change the optimizer code
from

        self.optimizer = self.exp.get_optimizer(self.args.batch_size, self.args.hpu)

to

        self.optimizer = self.exp.get_optimizer(self.args.batch_size, self.args.hpu)

        self.optimizer = hvd.DistributedOptimizer(self.optimizer, named_parameters=model.named_parameters())

        hvd.broadcast_parameters(model.state_dict(), root_rank=0)
        hvd.broadcast_optimizer_state(self.optimizer, root_rank=0)

I start getting the error:
ValueError: Tensor type torch.hpu.FloatTensor is not supported.

Purvang1 · May 31, 2023, 4:22pm

Also, I am using 8 Gaudi2 devices and the
vault.habana.ai/gaudi-docker/1.10.0/ubuntu20.04/habanalabs/pytorch-installer-2.0.1:latest
Docker image.

Purvang1 · May 31, 2023, 4:22pm

@Sayantan_S

Below is a link to the line where the error happens.

https://github.com/HabanaAI/Model-References/blob/3e4f2339d5240d2a83412a8dfb5898307049f08d/PyTorch/computer_vision/detection/yolox/yolox/utils/boxes.py#LL105C1-L105C46

Sayantan_S · June 1, 2023, 5:31pm

From this post, it seems you have been able to run the YOLOX model?

Purvang1 · June 1, 2023, 6:26pm

No. This error occurs when I try to train with Horovod instead of the PyTorch distributed module. Integrating Horovod into original_yolox and running it on an A100 works fine, but I get the error above when I run on Gaudi2. The post you mentioned uses the yolox_gaudi2 code.

Sayantan_S · June 6, 2023, 5:04pm

We support DDP for scaling in PyTorch, not Horovod. Horovod is supported only on TensorFlow. See here and here.
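Since DDP is the supported path, the Horovod changes shown earlier in the thread (`hvd.DistributedOptimizer`, `hvd.broadcast_parameters`) map roughly onto the sketch below. `DistributedDataParallel` broadcasts rank 0's parameters and buffers when the model is wrapped and averages gradients during `backward()`, so a plain optimizer is used with no wrapper. Optimizer state is not broadcast; each rank builds its optimizer from identical parameters (or, when resuming, should load the same checkpoint). On Gaudi, Habana's documentation initializes the process group with `backend="hccl"` after importing `habana_frameworks.torch.distributed.hccl`; treat that as an assumption to check against your SW version. `init_distributed` and `build_ddp_training` are hypothetical helpers, and SGD is just a placeholder for whatever `get_optimizer` returns.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def init_distributed(backend: str, init_method: str, rank: int, world_size: int) -> None:
    """Join the process group (hypothetical helper).

    Assumption: on Gaudi this would be backend="hccl" after
    `import habana_frameworks.torch.distributed.hccl`; "gloo" works on CPU.
    """
    dist.init_process_group(
        backend=backend, init_method=init_method, rank=rank, world_size=world_size
    )


def build_ddp_training(model: torch.nn.Module, device: torch.device, lr: float):
    """Replace the Horovod wrapping with DDP (hypothetical helper).

    - DDP(...) syncs rank 0's parameters/buffers at construction,
      which covers what hvd.broadcast_parameters did.
    - DDP all-reduces (averages) gradients in backward(), so a plain
      optimizer replaces hvd.DistributedOptimizer.
    """
    ddp_model = DDP(model.to(device))
    # Placeholder optimizer; in YOLOX this would come from exp.get_optimizer
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=lr)
    return ddp_model, optimizer
```

For a quick local check, a single CPU process can call `init_distributed("gloo", "file:///tmp/some_new_file", 0, 1)`, build the model, run one `backward()` and `optimizer.step()`, and then call `dist.destroy_process_group()`.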
