RuntimeError: [Rank:0] FATAL ERROR :: MODULE:PT_BRIDGE Exception in Lowering thread

I’m getting the same error:

RuntimeError: [Rank:0] FATAL ERROR :: MODULE:PT_BRIDGE Exception in Lowering thread...
[Rank:0] FATAL ERROR :: MODULE:PT_EAGER HabanaLaunchOpPT Run returned exception....
Graph compile failed. synStatus=synStatus 26 [Generic failure].
[Rank:0] Habana exception raised from compile at graph.cpp:599
[Rank:0] Habana exception raised from LaunchRecipe at graph_exec.cpp:558

In my case, I tried the following combinations:

  1. PT_HPU_LAZY_MODE=0, torch.compile, and model.eval: fails with the above error
  2. torch.compile and model.eval: fails with "hpu_backend is not available"
  3. PT_HPU_LAZY_MODE=0 and model.eval: works fine
  4. model.eval only: works fine
  5. PT_HPU_LAZY_MODE=0, torch.compile, and model.train: fails with the above error (graph_exec.cpp:558)
  6. torch.compile and model.train: fails with "Invalid backend"
  7. PT_HPU_LAZY_MODE=0 and model.train: works fine
  8. model.train only: works fine

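So I think this is basically an issue with torch.compile on this specific model: every run that uses torch.compile fails, and every run without it works, in both eval and train mode.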
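
To make that reasoning checkable, here is a minimal Python sketch that tabulates the eight runs and looks for a factor whose value alone predicts pass or fail. It only encodes the outcomes reported above. The names RESULTS, FACTORS and decisive_factors are made up for illustration, and the outcome strings are shorthand labels rather than exact log output.

```python
# Outcomes as reported in the list above, keyed by
# (PT_HPU_LAZY_MODE=0 set?, torch.compile used?, model mode).
# The outcome strings are shorthand labels, not exact log output.
RESULTS = {
    (True,  True,  "eval"):  "lowering-thread error",
    (False, True,  "eval"):  "hpu_backend is not available",
    (True,  False, "eval"):  "ok",
    (False, False, "eval"):  "ok",
    (True,  True,  "train"): "lowering-thread error",
    (False, True,  "train"): "Invalid backend",
    (True,  False, "train"): "ok",
    (False, False, "train"): "ok",
}

# Hypothetical factor names, in the same order as the key tuple.
FACTORS = ("lazy_mode_0", "torch_compile", "mode")


def decisive_factors(results):
    """Return the factors for which each value gives only passes or only failures."""
    decisive = []
    for index, name in enumerate(FACTORS):
        failed_by_value = {}
        for key, outcome in results.items():
            failed_by_value.setdefault(key[index], set()).add(outcome != "ok")
        # Decisive: no value of this factor is seen with both a pass and a failure.
        if all(len(flags) == 1 for flags in failed_by_value.values()):
            decisive.append(name)
    return decisive


print(decisive_factors(RESULTS))
```

With the outcomes above, only torch_compile comes out as decisive. Neither the lazy-mode setting nor eval versus train separates the passing runs from the failing ones.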

Here’s the config:

============================= HABANA PT BRIDGE CONFIGURATION ===========================
 PT_HPU_LAZY_MODE = 1
 PT_RECIPE_CACHE_PATH =
 PT_CACHE_FOLDER_DELETE = 0
 PT_HPU_RECIPE_CACHE_CONFIG =
 PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
 PT_HPU_LAZY_ACC_PAR_MODE = 1
 PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
 PT_HPU_EAGER_PIPELINE_ENABLE = 1
 PT_HPU_EAGER_COLLECTIVE_PIPELINE_ENABLE = 1
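
Note that this dump reports PT_HPU_LAZY_MODE = 1, so it presumably comes from one of the runs where the variable was not overridden. For switching between the two settings reproducibly, a minimal sketch, assuming the variable is passed through the process environment; env_for_case is a hypothetical helper, not part of any Habana package:

```python
import os


def env_for_case(lazy_mode_0, base=None):
    """Build an environment for one test case without touching the caller's."""
    env = dict(os.environ if base is None else base)
    if lazy_mode_0:
        env["PT_HPU_LAZY_MODE"] = "0"
    else:
        # Leave the variable unset so the bridge falls back to its default.
        env.pop("PT_HPU_LAZY_MODE", None)
    return env
```

The returned dict can be passed as env= to subprocess.run when launching the model script (for example a hypothetical run_model.py), so that each case starts in a fresh process with a known setting.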