When to use htcore.mark_step()

Dear community,

I am currently working on a project where I am attempting to convert the YOLOv8 code to run on Habana. I am having difficulty understanding the proper usage of the htcore.mark_step() function. From the documentation and the basic examples provided on the Habana website, it appears that this function is called after the following actions (a rough sketch of my current understanding follows the list):

  • loss.backward()
  • optimizer.step()
  • Computing the loss
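
For concreteness, here is roughly the pattern I have pieced together so far (a sketch only; the names are placeholders from my own code, and I am not sure every call here is actually needed):

import habana_frameworks.torch.core as htcore

loss = compute_loss(pred, targets)  # after computing the loss
htcore.mark_step()
loss.backward()                     # after the backward pass
htcore.mark_step()
optimizer.step()                    # after the optimizer step
htcore.mark_step()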

I have also been referencing a patch for YOLOv5 created by some of your engineers on GitHub (applied patch by FrancescoSaverioZuppichini · Pull Request #10827 · ultralytics/yolov5) to try to understand the logic behind the APIs. However, I have noticed that the mark_step function is called at seemingly arbitrary points throughout the code, with little to no comments or explanation provided.

Furthermore, I am unsure of the purpose of the opt.run_lazy_mode variable and when it is necessary to call htcore.mark_step().

I would greatly appreciate any guidance or clarification on when and why one should call htcore.mark_step().

Thank you for your time and assistance.

Best regards,

Fra

Thanks for posting. Please refer to this doc for the places where mark_step should be placed.

Here is a longer answer for some more context:
You can run PyTorch on HPUs in “lazy” or “eager” mode. In eager mode, each op is executed as soon as it is encountered. This mode is meant for debugging and is not very performant. For performance, we need “lazy” mode, where ops are accumulated until some event triggers execution, at which point they are compiled and run as a graph. Examples of events that trigger execution are mark_step and printing out or otherwise using the actual value of a tensor.
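
As a minimal sketch of lazy execution (assuming a working Habana PyTorch setup running in lazy mode, which is the default):

import torch
import habana_frameworks.torch.core as htcore

device = torch.device("hpu")

x = torch.randn(64, 64, device=device)  # queued in lazy mode, not executed yet
y = torch.relu(x @ x)                    # still queued; the graph keeps growing

htcore.mark_step()                       # compiles and runs the accumulated graph

print(y[0, 0].item())                    # reading a value would also have forced execution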

So that is the use of mark_step: it triggers execution of all the ops collected so far. You can add mark_step almost anywhere in the code, but adding too many of them breaks the graph up, preventing effective optimization over the whole, bigger graph. However, there are some places where mark_step is compulsory, which are mentioned in the document linked above.

Other than the compulsory places where we need to put mark_step, we may put it in other places for certain optimization purposes, such as in this example.

Thanks for your reply. I am familiar with the doc section about mark_step, but it seems that is not really all you need to do.

From your reply, it looks like when not using eager mode, mark_step has to be placed almost arbitrarily, without following any guide or logic. So how do I know where to place it to develop efficiently on Habana? Is there any internal doc that has not been shared with the public yet?

Thanks a lot

Regarding this: “the doc section about mark_step, but it seems that is not really all you need to do”:
That is all you need to do (mark_steps in those 2 places, after loss.backward() and optimizer.step()), and you should get a running model that might perform just fine (especially if it is a static model). In past releases there was a requirement to add a mark_step after data transfer, but we don't have that anymore, so in some older code you might see a mark_step after inp_data.to(device).
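
As a minimal sketch of those 2 compulsory placements (the model, dataloader, and loss here are placeholders, not YOLO-specific):

import torch
import habana_frameworks.torch.core as htcore

device = torch.device("hpu")
model = torch.nn.Linear(10, 2).to(device)        # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()

for inputs, targets in dataloader:               # placeholder dataloader
    inputs, targets = inputs.to(device), targets.to(device)
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    htcore.mark_step()                           # compulsory: after loss.backward()
    optimizer.step()
    htcore.mark_step()                           # compulsory: after optimizer.step()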

You might see other mark_steps in many places in some models because those models have been optimized even further. One example of such an optimization is the link I provided in my last answer.

Looking at the YOLOv5 example specifically, you will notice a mark_step between these 2 lines in train.py (shown in place below):

pred = model(imgs) 
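htcore.mark_step()  # the graph is cut here, before the dynamic-shape loss section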
loss, loss_items = compute_loss(...)

This is because the loss section has dynamic shapes, and we break it off from the main model for the reasons/optimizations mentioned in the link. In fact, in this case we run compute_loss on the CPU (refer to the run_loss_cpu variable), as sketched below.
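
As a hedged sketch of that pattern (the exact flag and variable names in the patch may differ; here run_loss_cpu is assumed to be a boolean, and pred a list of tensors, as in YOLO training mode):

pred = model(imgs)                     # static forward pass stays on the HPU
htcore.mark_step()                     # cut the graph before the dynamic-shape section
if run_loss_cpu:
    pred = [p.cpu() for p in pred]     # move predictions to the host
    loss, loss_items = compute_loss(pred, targets.cpu())
else:
    loss, loss_items = compute_loss(pred, targets)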

In short:

  1. We need mark_step in the 2 places mentioned in the doc.
  2. Older versions of the code may have a mark_step after data transfer (which we don't need now).
  3. There may be more mark_steps in a model, but those are added for optimization purposes (especially related to dynamic shapes). The dynamic shapes doc provides more context for that.

Thanks, copy that. That makes a lot of sense. My model is running now 🙂