Intel Gaudi Developer Community
How can someone monitor Gaudi device load balancing during distributed training with Horovod on multiple Gaudi nodes?
Greg_S
June 30, 2021, 7:03am
Users can use the hl-smi tool. For details, please see the System Management Tools Guide.