What is --enforce-eager

Rohit_Behera · July 18, 2024, 6:38pm

I am running vLLM inference on Gaudi 2 with --model meta-llama/Meta-Llama-3-8B-Instruct --dtype float16 --max-num-seqs 2048 --block-size 128. The model isn't loading without the --enforce-eager flag. What does --enforce-eager do?

Sayantan_S · July 18, 2024, 8:33pm

“the model isnt loading” … Can you describe this in more detail? For example, is it crashing, hanging, or producing poor results?

Sayantan_S · July 30, 2024, 6:38pm

Although the name --enforce-eager suggests that it switches between lazy and eager mode, it actually controls whether HPU graphs are used. This interpretation is in line with the original purpose of the flag, which is to control whether CUDA graphs are used, as mentioned here.

You can see it in use here

Please check the second point of this section. HPU graphs might take more memory, so I suspect your model runs out of memory when HPU graphs are enabled, but is able to run when --enforce-eager disables them.
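
To make the comparison concrete, here is a minimal sketch that builds the two command lines, with and without --enforce-eager, from the flags quoted in the question. The helper name build_vllm_command is hypothetical, and the entry point vllm.entrypoints.openai.api_server is an assumption, since the original post does not say how vLLM was launched.

```python
import shlex


def build_vllm_command(enforce_eager: bool) -> list[str]:
    """Assemble a vLLM server command line from the flags quoted in the question.

    Hypothetical helper for illustration only; it is not part of vLLM.
    """
    cmd = [
        "python", "-m", "vllm.entrypoints.openai.api_server",  # assumed entry point
        "--model", "meta-llama/Meta-Llama-3-8B-Instruct",
        "--dtype", "float16",
        "--max-num-seqs", "2048",
        "--block-size", "128",
    ]
    if enforce_eager:
        # Ask vLLM to skip graph capture (HPU graphs on Gaudi, CUDA graphs on GPUs).
        cmd.append("--enforce-eager")
    return cmd


if __name__ == "__main__":
    # Print both variants so they can be run one after the other and compared.
    for flag in (False, True):
        print(shlex.join(build_vllm_command(flag)))
```

Running both variants and comparing device memory usage at load time is one way to check whether the failure without --enforce-eager is an out-of-memory condition. If the model only fails with graphs enabled, lowering --max-num-seqs may also be worth trying, though that is a guess and not something confirmed in this thread.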

Sayantan_S · July 30, 2024, 10:17pm

The meaning of the flag might change over time. For example, there was a recent change here, whose description includes a more detailed table of how enforce_eager is used.
