Issue running Llama2 pretraining using Megatron-DeepSpeed

I am following the documentation from the Megatron-DeepSpeed fork to run Llama2 7B FP8 training, and I encounter this error:
AssertionError: allreduce_gradients() is not valid when bfloat+pipeline_parallelism is enabled.

Can you please provide the command line and the release (1.16?) you are using?

I am copy-pasting the command from the Megatron-DeepSpeed repo. This is the command I use:
HL_LLAMA_MODEL_SIZE=7 HL_NUM_NODES=1 HL_PP=1 HL_TP=1 HL_DP=8 HL_CKP_ACT=2 HL_SEQ_LEN=4096 HL_ZERO_STAGE=1 HL_USE_FAST_SOFTMAX=1 HL_MICRO_BATCH=1 HL_GRAD_ACCUM_DTYPE=bf16 HL_USE_TRANSFORMER_ENGINE=1 HL_USE_CACHE_FP8_WEIGHT_FWD=1 HL_USE_CACHE_FP8_WEIGHT=1 scripts/run_llama.sh
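
For context, my understanding is that these HL_* variables end up being translated into a DeepSpeed config roughly along the following lines. This is a sketch I reconstructed from the flags, not the exact JSON the run script writes; the key names are standard DeepSpeed config fields, but the mapping from the HL_* variables is my assumption:

# Rough reconstruction (my assumption) of the DeepSpeed config implied by the HL_* flags above.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,         # HL_MICRO_BATCH=1
    "bf16": {"enabled": True},                   # bf16 training, which the assertion message refers to
    "zero_optimization": {"stage": 1},           # HL_ZERO_STAGE=1
    "data_types": {"grad_accum_dtype": "bf16"},  # HL_GRAD_ACCUM_DTYPE=bf16
}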

I am using release 1.16.2.