I am training a variety of models on a dl1 AWS instance. One thing I noticed is that the utilization reported by hl-smi peaks in the mid-40% range on the single core in use (I run it with the -l 1 flag so I can watch utilization as it changes, and it never surpasses 50%). The training speed roughly matches what I got on a V100. The same thing happens with simple compute loops that involve no data transfer, so it doesn’t seem that anything else is holding back performance. This is all with the latest software installed.
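For reference, the compute-only loops I tested look roughly like the sketch below (a minimal example assuming the SynapseAI PyTorch bridge is installed; the matrix size and iteration count are arbitrary, and nothing is copied between host and device inside the timed loop):

```python
import time
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device

device = torch.device("hpu")

# Two tensors resident on the Gaudi; no host<->device traffic in the loop.
a = torch.randn(4096, 4096).to(device)
b = torch.randn(4096, 4096).to(device)

start = time.time()
for _ in range(1000):
    c = torch.matmul(a, b)  # pure on-device compute
    htcore.mark_step()      # flush the lazy-mode graph so the work is actually submitted

c_host = c.to("cpu")        # single sync at the end so the timing covers all iterations
print(f"elapsed: {time.time() - start:.2f}s")
```

Even with a loop like this, while watching hl-smi -l 1 in another terminal, utilization stays in the mid-40% range.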
Should I be seeing near 100% utilization on the core in use in hl-smi, and am I therefore only getting half the performance I could? Or is what I’m seeing expected?
Thanks!