ValueError: invalid type: 'torch.hpu.FloatTensor'

I am trying to train the YOLOX model on Gaudi2. I get the error above at this operation:
grid = torch.stack((xv, yv), 2).view(1, 1, hsize, wsize, 2).type(dtype)

How can I solve it?

Thanks for posting.

Can you try:
grid = torch.stack((xv, yv), 2).view(1, 1, hsize, wsize, 2).type(torch.FloatTensor)
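
One caveat with this workaround: `torch.FloatTensor` is the CPU tensor type, so `.type(torch.FloatTensor)` also copies the result to the CPU. If only the dtype needs to change, `.float()` keeps the tensor on whatever device it already lives on. A minimal sketch, assuming the grid is built as in the line above; the helper name `make_grid` is hypothetical, and the CPU device is used here only so the snippet runs anywhere (on Gaudi the device would be `hpu`):

```python
import torch


def make_grid(hsize: int, wsize: int, device: torch.device) -> torch.Tensor:
    # Hypothetical helper: builds the (x, y) coordinate grid on the given device.
    yv, xv = torch.meshgrid(
        torch.arange(hsize, device=device),
        torch.arange(wsize, device=device),
        indexing="ij",
    )
    # .float() changes only the dtype; the tensor stays on its current device.
    return torch.stack((xv, yv), 2).view(1, 1, hsize, wsize, 2).float()


grid = make_grid(2, 3, torch.device("cpu"))
print(grid.shape, grid.dtype, grid.device)
```

The same substitution would apply to other `.type(torch.FloatTensor)` calls if the intent is a float cast rather than a move to the CPU.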

Thanks. That helped me get past the previous error. Now I am getting the following error.

[1,0]: File "/workspace/yolox/utils/boxes.py", line 102, in bboxes_iou
[1,0]: en = (tl < br).type(torch.FloatTensor).prod(dim=2)
[1,0]: │ │ │ └ <class 'torch.FloatTensor'>
[1,0]: │ │ └ <module 'torch' from '/usr/local/lib/python3.8/dist-packages/torch/__init__.py'>
[1,0]: │ └
[1,0]: └
[1,0]:
[1,0]:RuntimeError: [05-30 15:25:23::141879][R000][41919]FATAL ERROR :: MODULE:SYNHELPER node add failed 1

Can you point me to the code you are using? We have YOLOX enabled here. Also, which SW version do you have?

Thanks

@Sayantan_S
I am trying to integrate Horovod into the YOLOX code, and I found that the error appears when I change the optimizer code
from

        self.optimizer = self.exp.get_optimizer(self.args.batch_size, self.args.hpu)

to

        self.optimizer = self.exp.get_optimizer(self.args.batch_size, self.args.hpu)

        self.optimizer = hvd.DistributedOptimizer(self.optimizer, named_parameters=model.named_parameters())

        hvd.broadcast_parameters(model.state_dict(), root_rank=0)
        hvd.broadcast_optimizer_state(self.optimizer, root_rank=0)

After this change I started getting this error:
ValueError: Tensor type torch.hpu.FloatTensor is not supported.

Also, I am using 8 Gaudi2 devices and this Docker image:
vault.habana.ai/gaudi-docker/1.10.0/ubuntu20.04/habanalabs/pytorch-installer-2.0.1:latest

@Sayantan_S

Below is a link to the line where the error occurs.

https://github.com/HabanaAI/Model-References/blob/3e4f2339d5240d2a83412a8dfb5898307049f08d/PyTorch/computer_vision/detection/yolox/yolox/utils/boxes.py#LL105C1-L105C46

From this post, it seems you have been able to run the yolox model?

No. This error appears when I try to train using Horovod instead of the PyTorch distributed module. Integrating Horovod into original_yolox and running it on an A100 works fine, but I get the error above when I run on Gaudi2. The post you mentioned uses the yolox_gaudi2 code.

We support DDP for scaling in PyTorch, not Horovod. Horovod is supported only with TensorFlow. See here and here.
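
For reference, a minimal sketch of what the Horovod wrapping above might look like with DDP instead. `DistributedDataParallel` broadcasts rank 0's parameters when the model is wrapped and all-reduces gradients during `backward()`, so no separate distributed optimizer or broadcast calls are needed. The function name `train_one_step`, the toy `Linear` model, and the file-based rendezvous are illustrative assumptions, not the YOLOX trainer code. The `gloo` backend is used so the snippet runs on a CPU; on Gaudi the backend is expected to be `hccl` (after importing `habana_frameworks.torch.distributed.hccl`) with the model moved to the `hpu` device, which is not exercised here.

```python
import os
import tempfile

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def train_one_step(init_method: str, backend: str = "gloo",
                   rank: int = 0, world_size: int = 1) -> float:
    # Every rank must pass the same init_method; a launcher such as torchrun
    # or mpirun would normally supply rank and world size.
    dist.init_process_group(backend=backend, init_method=init_method,
                            rank=rank, world_size=world_size)
    try:
        torch.manual_seed(0)
        model = torch.nn.Linear(4, 2)  # toy stand-in for the real model
        # Wrapping syncs parameters from rank 0; gradients are averaged
        # across ranks during backward().
        ddp_model = DDP(model)
        # A plain optimizer is enough; no distributed optimizer wrapper.
        optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)

        loss = ddp_model(torch.randn(8, 4)).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()
    finally:
        dist.destroy_process_group()


# Single-process demonstration using a temporary rendezvous file.
init_file = os.path.join(tempfile.mkdtemp(), "ddp_init")
loss = train_one_step(f"file://{init_file}")
print(f"loss after one step: {loss:.4f}")
```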