I’m getting the same error:
RuntimeError: [Rank:0] FATAL ERROR :: MODULE:PT_BRIDGE Exception in Lowering thread...
[Rank:0] FATAL ERROR :: MODULE:PT_EAGER HabanaLaunchOpPT Run returned exception....
Graph compile failed. synStatus=synStatus 26 [Generic failure].
[Rank:0] Habana exception raised from compile at graph.cpp:599
[Rank:0] Habana exception raised from LaunchRecipe at graph_exec.cpp:558
In my case, the following configurations were attempted:
- PT_HPU_LAZY_MODE=0 + torch.compile + model.eval(): fails with the above error
- torch.compile + model.eval() (lazy mode): fails with "hpu_backend is not available"
- PT_HPU_LAZY_MODE=0 + model.eval(): works fine
- model.eval() only: works fine
- PT_HPU_LAZY_MODE=0 + torch.compile + model.train(): fails with the above error (graph_exec.cpp:558)
- torch.compile + model.train() (lazy mode): fails with "Invalid backend"
- PT_HPU_LAZY_MODE=0 + model.train(): works fine
- model.train() only: works fine
So it looks like the issue is specifically with torch.compile on this particular model.
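For reference, this is roughly how the failing combination is set up (a minimal sketch; `MyModel` and `sample_input` are placeholders for my actual model and data, and `habana_frameworks` is assumed to be installed):

```python
# Minimal sketch of the failing case: PT_HPU_LAZY_MODE=0 + torch.compile + model.eval()
# "MyModel" and "sample_input" are placeholders for the real model and input.
import os
os.environ["PT_HPU_LAZY_MODE"] = "0"   # eager mode; must be set before the Habana bridge is imported

import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device and hpu_backend

model = MyModel().to("hpu")            # placeholder for the actual model
model.eval()
compiled_model = torch.compile(model, backend="hpu_backend")

with torch.no_grad():
    out = compiled_model(sample_input.to("hpu"))  # crashes here with synStatus 26 / graph.cpp:599
```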
Here’s the config:
============================= HABANA PT BRIDGE CONFIGURATION ===========================
PT_HPU_LAZY_MODE = 1
PT_RECIPE_CACHE_PATH =
PT_CACHE_FOLDER_DELETE = 0
PT_HPU_RECIPE_CACHE_CONFIG =
PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
PT_HPU_LAZY_ACC_PAR_MODE = 1
PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
PT_HPU_EAGER_PIPELINE_ENABLE = 1
PT_HPU_EAGER_COLLECTIVE_PIPELINE_ENABLE = 1