Synapse detected a device critical error

the result of execution with LOG_LEVEL_ALL=4 ENABLE_CONSOLE=true is below:

$ python3 example.py 
2023-12-19 07:52:03.667422: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-12-19 07:52:03.708135: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[2023-12-19 07:52:05.648] [SYN_API] [info] [tid:125] + ---------------------------------------------------------------------- +
[2023-12-19 07:52:05.648] [SYN_API] [info] [tid:125] | Version:            1.11.0                                             |
[2023-12-19 07:52:05.648] [SYN_API] [info] [tid:125] | Synapse:            2402698                                            |
[2023-12-19 07:52:05.648] [SYN_API] [info] [tid:125] | HCL:                ab04ac9                                            |
[2023-12-19 07:52:05.648] [SYN_API] [info] [tid:125] | MME:                57aebd3                                            |
[2023-12-19 07:52:05.648] [SYN_API] [info] [tid:125] | SCAL:               832fb38                                            |
[2023-12-19 07:52:05.648] [SYN_API] [info] [tid:125] | Description:        HabanaLabs Runtime and GraphCompiler               |
[2023-12-19 07:52:05.648] [SYN_API] [info] [tid:125] | Time:               2023-12-19 07:52:05.648777                         |
[2023-12-19 07:52:05.648] [SYN_API] [info] [tid:125] + ---------------------------------------------------------------------- +
2023-12-19 07:52:06.826336: I /home/jenkins/workspace/cdsoftwarebuilder/create-tensorflow-module---bpt-d/repos/tensorflow-training/habana_device/habana_device_binding_iface.cpp:62] Found TensorFlow library with SHA256: 6313068adb101b8805221c224fdfe66b3a10bc9d044139aa1c7ea305b8b9f0b4
[2023-12-19 07:52:07.530] [HCL] [info] [tid:125] Version:	1.11.0-ab04ac9
[2023-12-19 07:52:08.966] [SYN_API] [critical] [tid:125] DFA detected, see separate file for details device_fail_analysis_log.txt
[2023-12-19 07:52:08.966] [SYN_API] [error] [tid:125] DFA detected, see separate file for details device_fail_analysis_log.txt
[2023-12-19 07:52:08.967] [DFA_API_INFO] [info] [tid:125] tid 293 #DFA begin 1702972328966532898
[2023-12-19 07:52:09.468] [DFA_API_INFO] [debug] [tid:125] =================================== STREAM SYNC HISTORY (chronological order) START ====================================
[2023-12-19 07:52:09.468] [DFA_API_INFO] [debug] [tid:125]                    type                  : stream : physical     : physical :   target (wait)   : function/line
[2023-12-19 07:52:09.468] [DFA_API_INFO] [debug] [tid:125]                                          : ID     : StreamOffset : QueuesId :                   :              
[2023-12-19 07:52:09.468] [DFA_API_INFO] [debug] [tid:125] ------------------------------------------------------------------------------------------------------------------------
[07:52:05.623739][SYN_API     ][debug][tid:125] synInitialize: SYN_API_CALL 
[07:52:05.624183][SYN_API     ][debug][tid:125] shim layer initialized successfully
[07:52:05.648712][SYN_API     ][info ][tid:125] + ---------------------------------------------------------------------- +
[07:52:05.648733][SYN_API     ][info ][tid:125] | Version:            1.11.0                                             |
[07:52:05.648744][SYN_API     ][info ][tid:125] | Synapse:            2402698                                            |
[07:52:05.648752][SYN_API     ][info ][tid:125] | HCL:                ab04ac9                                            |
[07:52:05.648759][SYN_API     ][info ][tid:125] | MME:                57aebd3                                            |
[07:52:05.648767][SYN_API     ][info ][tid:125] | SCAL:               832fb38                                            |
[07:52:05.648776][SYN_API     ][info ][tid:125] | Description:        HabanaLabs Runtime and GraphCompiler               |
[07:52:05.648853][SYN_API     ][info ][tid:125] | Time:               2023-12-19 07:52:05.648839                         |
[07:52:05.648862][SYN_API     ][info ][tid:125] + ---------------------------------------------------------------------- +
[07:52:06.712313][SYN_API     ][debug][tid:125] synConfigurationSet: SYN_API_CALL 
[07:52:06.712326][SYN_API     ][debug][tid:125] synConfigurationSet: SYN_API_CALL 
[07:52:06.712329][SYN_API     ][debug][tid:125] synConfigurationSet: SYN_API_CALL 
[07:52:06.712333][SYN_API     ][debug][tid:125] synConfigurationSet: SYN_API_CALL 
[07:52:06.712344][SYN_API     ][debug][tid:125] synDeviceGetCount: SYN_API_CALL 
[07:52:06.713017][SYN_API     ][debug][tid:125] synDestroy: SYN_API_CALL 
[07:52:06.716599][SYN_API     ][debug][tid:125] dfa: setting DfaPhase::NONE
[07:52:06.720834][SYN_API     ][info ][tid:125] Synapse API's enter/exit counters:
Function: synConfigurationSet                               , Entry counter:          4, Exit counter:          4
Function: synDeviceGetCount                                 , Entry counter:          1, Exit counter:          1

[07:52:07.062282][SYN_API     ][debug][tid:125] synInitialize: SYN_API_CALL 
[07:52:07.062335][SYN_API     ][debug][tid:125] shim layer initialized successfully
[07:52:07.088469][SYN_API     ][debug][tid:125] synConfigurationSet: SYN_API_CALL 
[07:52:07.088478][SYN_API     ][debug][tid:125] synConfigurationSet: SYN_API_CALL 
[07:52:07.088479][SYN_API     ][debug][tid:125] synConfigurationSet: SYN_API_CALL 
[07:52:07.088483][SYN_API     ][debug][tid:125] synConfigurationSet: SYN_API_CALL 
[07:52:07.088496][SYN_API     ][debug][tid:125] synDeviceGetCountByDeviceType: SYN_API_CALL deviceType 2[synDeviceGaudi]
[07:52:07.089299][SYN_API     ][debug][tid:125] synDeviceAcquireByDeviceType: SYN_API_CALL deviceType 2[synDeviceGaudi]
[07:52:07.514636][SYN_PROGRESS][trace][tid:125] 4[INTERNAL_STREAM_TYPE_DMA_DOWN_SYNAPSE] :      4 :            1 :        1 :        2          : memCpyAsync/800
[07:52:07.526315][SYN_PROGRESS][trace][tid:125] 4[INTERNAL_STREAM_TYPE_DMA_DOWN_SYNAPSE] :      4 :            1 :        1 :        7          : memCpyAsync/800
[07:52:07.530246][HCL_API     ][info ][tid:125] hcclInitDevice: 
[07:52:07.530380][SYN_API     ][debug][tid:125] synDeviceGetInfo: SYN_API_CALL deviceId 0
[07:52:07.530547][HCL_API     ][info ][tid:125] initDevice - CPU pinning Done
[07:52:07.530566][HCL_API     ][info ][tid:125] HclApi::initDevice HclApi::createHcl device type = 2[synDeviceGaudi]
[07:52:07.992682][SYN_API     ][debug][tid:125] synDeviceMalloc: SYN_API_CALL deviceId 0 size 0x1e000000 requestedAddress 0x0 flags 0x0
[07:52:08.005359][HCL_API     ][info ][tid:125] HCL_Init_Device - Done
[07:52:08.022187][SYN_API     ][debug][tid:125] synStreamCreateGeneric: SYN_API_CALL deviceId 0  flags 0x0
[07:52:08.022248][SYN_API     ][debug][tid:125] synStreamCreateGeneric: SYN_API_CALL deviceId 0  flags 0x0
[07:52:08.022269][SYN_API     ][debug][tid:125] synStreamCreateGeneric: SYN_API_CALL deviceId 0  flags 0x0
[07:52:08.022350][SYN_API     ][debug][tid:125] synStreamCreateGeneric: SYN_API_CALL deviceId 0  flags 0x0
[07:52:08.022356][SYN_API     ][debug][tid:125] synDeviceGetMemoryInfo: SYN_API_CALL deviceId 0
[07:52:08.022415][SYN_API     ][debug][tid:125] synDeviceMalloc: SYN_API_CALL deviceId 0 size 0x782a3e000 requestedAddress 0x0 flags 0x0
[07:52:08.457146][SYN_API     ][debug][tid:125] synDeviceGetAttribute: SYN_API_CALL querySize 1 deviceId 0
[07:52:08.457155][SYN_API     ][debug][tid:125] synDeviceGetMemoryInfo: SYN_API_CALL deviceId 0
[07:52:08.460408][SYN_API     ][debug][tid:125] synHostMap: SYN_API_CALL deviceId 0 size 0x80000000 buffer 0x7ffe0fe00000
[07:52:08.966517][SYN_API     ][critical][tid:125] DFA detected, see separate file for details device_fail_analysis_log.txt
[07:52:08.966531][SYN_API     ][error][tid:125] DFA detected, see separate file for details device_fail_analysis_log.txt
[07:52:08.967260][SYN_API     ][info ][tid:125] dfa: sending DfaPhase::STARTED
[07:52:08.967262][SYN_API     ][debug][tid:125] setting dfaPhase to 1
[07:52:08.967283][HCL_API     ][info ][tid:125] hcclDfaUpdateState: update Dfa phase to 1[STARTED]
[07:52:08.967285][HCL_API     ][info ][tid:125] Dfa state change. Old 0[NONE] new 1[STARTED]
[07:52:09.469232][DFA_API_INFO][debug][tid:125] =============================================== STREAM SYNC HISTORY END ================================================
Synapse detected a device critical error that requires a restart. Killing process in 5 seconds (hl: 6) 07:52:08 [please check log files for dfa cause]
Killed

And details of the device_fail_analysis_log is below:

[07:52:08.466238][SYN_DEV_FAIL][error][tid:125] hl thunk API failed, errCode memcopyIoctlFailed
[07:52:08.966538][SYN_DEV_FAIL][info ][tid:125] tid 293 #DFA begin 1702972328966532898
[07:52:08.967192][SYN_DEV_FAIL][error][tid:125] 


======================================== DFA triggered on the following errors =========================================

An hl-thunk call failed
Most likely device was reset. Please check dmesg.
Instead, you can check a copy of the dmesg: dmesg_copy.txt

========================================================================================================================



[07:52:08.967216][SYN_DEV_FAIL][trace][tid:125] ===================================================== Failure Info =====================================================
[07:52:08.967224][SYN_DEV_FAIL][trace][tid:125] --- Errors detected ---
[07:52:08.967248][SYN_DEV_FAIL][error][tid:125] #DFA reason: memcopyIoctlFailed   (code 0x200)
[07:52:08.967254][SYN_DEV_FAIL][error][tid:125] 

[07:52:08.967294][SYN_DEV_FAIL][info ][tid:125] + ---------------------------------------------------------------------- +
[07:52:08.967302][SYN_DEV_FAIL][info ][tid:125] | Version:            1.11.0                                             |
[07:52:08.967310][SYN_DEV_FAIL][info ][tid:125] | Synapse:            2402698                                            |
[07:52:08.967317][SYN_DEV_FAIL][info ][tid:125] | HCL:                ab04ac9                                            |
[07:52:08.967323][SYN_DEV_FAIL][info ][tid:125] | MME:                57aebd3                                            |
[07:52:08.967329][SYN_DEV_FAIL][info ][tid:125] | SCAL:               832fb38                                            |
[07:52:08.967336][SYN_DEV_FAIL][info ][tid:125] | Description:        Habana Labs Device failure analysis                |
[07:52:08.967384][SYN_DEV_FAIL][info ][tid:125] | Time:               2023-12-19 07:52:08.967340                         |
[07:52:08.967391][SYN_DEV_FAIL][info ][tid:125] + ---------------------------------------------------------------------- +
[07:52:08.967398][SYN_DEV_FAIL][info ][tid:125] Failure occurred on device:
[07:52:08.967403][SYN_DEV_FAIL][info ][tid:125]      #device name:     hl6
[07:52:08.967407][SYN_DEV_FAIL][info ][tid:125]      Moudle index:       0
[07:52:08.967413][SYN_DEV_FAIL][info ][tid:125]      fd compute/control: 16/15
[07:52:08.967418][SYN_DEV_FAIL][info ][tid:125]      #global rank Id:    ---
[07:52:08.967424][SYN_DEV_FAIL][info ][tid:125] #device type GAUDI
[07:52:08.967429][SYN_DEV_FAIL][info ][tid:125] #is simulator  No
[07:52:08.967447][SYN_DEV_FAIL][info ][tid:125] #acquire time: 2023-12-19 07:52:07.110819
[07:52:08.967452][SYN_DEV_FAIL][info ][tid:125] pci addr:      0000:34:00.0
[07:52:08.967461][SYN_DEV_FAIL][trace][tid:125] ==================================================== Engines Status ====================================================
[07:52:08.967719][SYN_DEV_FAIL][trace][tid:125] actualSize of engine dump 1720