Intel Gaudi Developer Community
How can someone monitor Gaudi device load balancing during distributed training with Horovod on multiple Gaudi nodes?
Greg_S
June 30, 2021, 7:03am
You can use the hl-smi tool, run on each node, to check how load is spread across the Gaudi devices. For details, please see the System Management Tools Guide.
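As a rough illustration, the sketch below polls hl-smi on one node and reports how far apart the per-device utilization figures are. It assumes that hl-smi accepts a CSV query of the form `-Q <fields> -f csv,noheader` and that the field names `index`, `utilization.aip`, and `memory.used` are available on your installation; please confirm both with `hl-smi --help` or the System Management Tools Guide. The helper names (`parse_hl_smi_csv`, `utilization_spread`, `report`) are hypothetical and not part of any Habana API.

```python
import csv
import io
import re
import subprocess

# Assumed field names; verify them against `hl-smi --help` on your system.
QUERY_FIELDS = "index,utilization.aip,memory.used"


def _leading_number(text):
    """Return the leading number in a field such as '87 %' or '30000 MiB'."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)", text)
    if match is None:
        raise ValueError(f"no numeric value in {text!r}")
    return float(match.group(1))


def parse_hl_smi_csv(text):
    """Parse 'index, utilization, memory used' CSV rows into dicts."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if len(record) < 3:
            continue  # skip blank or truncated lines
        try:
            rows.append({
                "index": int(record[0].strip()),
                "util_pct": _leading_number(record[1]),
                "mem_used": _leading_number(record[2]),
            })
        except ValueError:
            continue  # skip rows with non-numeric values such as 'N/A'
    return rows


def utilization_spread(rows):
    """Difference between the busiest and the idlest device, in percent."""
    utils = [row["util_pct"] for row in rows]
    return max(utils) - min(utils) if utils else 0.0


def report():
    """Query the local node once and print a per-device summary."""
    # Assumed invocation; adjust the flags if your hl-smi version differs.
    result = subprocess.run(
        ["hl-smi", "-Q", QUERY_FIELDS, "-f", "csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    rows = parse_hl_smi_csv(result.stdout)
    for row in rows:
        print(f"device {row['index']}: {row['util_pct']:.0f}% busy, "
              f"{row['mem_used']:.0f} memory units used")
    print(f"utilization spread: {utilization_spread(rows):.0f} points")
```

Calling `report()` on each node while the Horovod job is running (for example over ssh) gives one snapshot per node. A consistently large spread would suggest that some ranks are waiting on others, though a single sample can be misleading, so it is worth polling several times.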