What should peak utilization of a core be?

paris · December 22, 2022, 7:16am

I am training a variety of models on an AWS dl1 instance. When I watch hl-smi with the -l 1 flag, so that I see utilization as it changes, the utilization of the single core I am using peaks in the mid-40% range and never surpasses 50%. The training speed roughly matches what I got on a V100. The same thing happens with simple loops where there is no data transfer, so nothing else seems to be holding back performance. This is all with the latest software installed.

Should I be seeing close to 100% utilization on that core in hl-smi, meaning I am only getting about half the performance I could? Or is what I’m seeing expected?

Thanks!

Sayantan_S · January 3, 2023, 6:47pm

Could you mention which model you are running, and whether you are using 1 card or 8 cards?

For performance analysis/optimization, I’d suggest looking at the profiling tools, such as Synapse profiling. Profiling is a more reliable measure of how the device is performing than the hl-smi number.
You can also look at some common performance tips and tricks here.
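
As a rough first check before profiling, it can help to measure throughput directly instead of relying on a utilization percentage. The sketch below is only an illustration: `measure_throughput` and `dummy_step` are hypothetical names, and `dummy_step` is a CPU-only placeholder for one real training iteration. It assumes the step function blocks until the device has finished its work (for example, by reading the loss value back to the host); otherwise the timing only covers how fast work was queued.

```python
import time


def measure_throughput(step_fn, warmup_steps=5, timed_steps=50):
    """Return steps per second for step_fn, excluding warm-up iterations.

    Warm-up steps are run first and not timed, since the first few
    iterations often include one-off costs such as compilation or caching.
    """
    for _ in range(warmup_steps):
        step_fn()

    start = time.perf_counter()
    for _ in range(timed_steps):
        step_fn()
    elapsed = time.perf_counter() - start

    # Guard against a zero reading on very coarse timers.
    return timed_steps / max(elapsed, 1e-12)


def dummy_step():
    # Hypothetical placeholder workload; replace with one training step
    # that waits for the device to finish before returning.
    sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(f"{measure_throughput(dummy_step):.1f} steps/s")
```

Comparing this number across batch sizes, or against the same loop on another device, gives a like-for-like figure that does not depend on how hl-smi computes utilization.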

paris · January 4, 2023, 11:57pm

This is a custom model, basically a bunch of 1D convolutions (kernel size 1 or 5) with skip connections. It is running on a single core.
