Intel Gaudi Developer Community
How can someone monitor Gaudi device load balancing during distributed training with Horovod on multiple Gaudi nodes?
System Setup
FAQ
Greg_S
June 30, 2021, 7:03am
You can use the hl-smi tool on each node to check per-device utilization and memory usage while the training job runs. For details, please see the System Management Tools Guide.
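As a rough illustration of turning hl-smi output into a load-balance check, here is a minimal Python sketch. It is not an official method. The query flags and field names in `HL_SMI_CMD` are assumptions and should be verified against `hl-smi --help` or the System Management Tools Guide for your installed version. The helper names (`parse_utilization`, `imbalance`, `sample_local_node`) are hypothetical, and the parser assumes CSV rows whose first column is the device index and whose second column is a utilization percentage.

```python
import csv
import io
import subprocess

# ASSUMPTION: these flags and field names may differ between hl-smi versions.
# Check `hl-smi --help` before relying on them.
HL_SMI_CMD = ["hl-smi", "-Q", "index,utilization.aip", "-f", "csv"]


def parse_utilization(csv_text):
    """Return {device_index: utilization_percent} from 'index, utilization' CSV rows."""
    util = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue
        index = row[0].strip()
        if not index.isdigit():
            continue  # skip a header row or any non-data line
        value = row[1].strip().rstrip("%").strip()
        try:
            util[int(index)] = float(value)
        except ValueError:
            continue  # skip values such as "N/A"
    return util


def imbalance(util):
    """Spread between the busiest and the idlest device, in percentage points."""
    if not util:
        return 0.0
    return max(util.values()) - min(util.values())


def sample_local_node():
    """Run hl-smi once on this node and return the parsed utilization map."""
    result = subprocess.run(HL_SMI_CMD, capture_output=True, text=True, check=True)
    return parse_utilization(result.stdout)
```

Calling `imbalance(sample_local_node())` periodically on every node taking part in the Horovod job gives one number per node to compare. A consistently large spread suggests that some devices are waiting on others.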