About the Training category
|
|
0
|
888
|
December 21, 2020
|
RuntimeError: [Rank:0] FATAL ERROR :: MODULE:PT_BRIDGE Exception in Lowering thread
|
|
3
|
200
|
March 23, 2025
|
RuntimeError: Input sizes must be equal when doing loss.backward() during the training of a GNN
|
|
3
|
72
|
March 20, 2025
|
Activation checkpointing modules with kwargs in forward
|
|
1
|
46
|
January 19, 2025
|
AttributeError: module 'habana_frameworks.torch.hpu' has no attribute 'wrap_in_hpu_graph'
|
|
4
|
77
|
January 19, 2025
|
HCCL fails to connect across two nodes with a simple script
|
|
1
|
66
|
January 2, 2025
|
Training of torch.nn.Embedding failed: loss not decreasing
|
|
2
|
40
|
January 2, 2025
|
Transferring kNN results from CPU to HPU breaks back propagation
|
|
0
|
38
|
December 3, 2024
|
Synapse detected a device critical error that requires a restart. [Compute or dma timeout]
|
|
0
|
74
|
November 12, 2024
|
NotImplementedError: Could not run 'aten::_sparse_coo_tensor_with_dims_and_tensors' with arguments from the 'SparseHPU' backend
|
|
1
|
166
|
November 12, 2024
|
hl_qual resnet50 training fails when loading resnet50DecoderConfig.ini on Gaudi2D
|
|
0
|
50
|
November 12, 2024
|
GCNConv fails with normalization
|
|
0
|
71
|
November 5, 2024
|
AttributeError: 'HabanaParameterWrapper' object has no attribute 'change_device_placement'
|
|
6
|
107
|
October 23, 2024
|
AttributeError: 'HabanaParameterWrapper' object has no attribute 'change_device_placement'
|
|
1
|
53
|
September 24, 2024
|
VITS training got RuntimeError: MKL FFT doesn't support tensors of type: BFloat16
|
|
4
|
189
|
August 23, 2024
|
Issue running Llama2 pretraining using megatron deepspeed
|
|
2
|
114
|
August 1, 2024
|
PReLU RuntimeError for inputs with more than 1 dimension
|
|
3
|
94
|
July 23, 2024
|
Problem training on a local dataset using Llama2 Fine-Tuning with Low-Rank Adaptations (LoRA) on Intel® Gaudi®2 AI Accelerator
|
|
3
|
97
|
July 22, 2024
|
Problem with training llama-3-70b with deepspeed
|
|
1
|
182
|
July 18, 2024
|
hpu_backend not found with torch.compile
|
|
2
|
231
|
July 11, 2024
|
Can HPU use torch.nn.utils.weight_norm or spectral_norm?
|
|
0
|
109
|
June 7, 2024
|
PyTorch complex datatype
|
|
1
|
125
|
May 28, 2024
|
Does HPU support complex datatypes in torch?
|
|
1
|
167
|
May 28, 2024
|
Why is there no hello-world level tutorial for using the Gaudi chip?
|
|
3
|
322
|
April 23, 2024
|
Does Gaudi support CUDA?
|
|
2
|
2328
|
April 23, 2024
|
RuntimeError: No backend type associated with device type cpu
|
|
2
|
1142
|
April 19, 2024
|
Gaudi1 HPU doesn't support long?
|
|
11
|
301
|
April 4, 2024
|
SyncBatchNorm Error
|
|
5
|
317
|
March 21, 2024
|
Model.to device failed: "RuntimeError: synStatus=8 [Device not found] Device acquire failed."
|
|
3
|
582
|
March 13, 2024
|
Synapse detected a device critical error
|
|
3
|
793
|
December 21, 2023
|