Our Habana Gaudi system (a Supermicro server) shows a load average of ~12 while idle.
I’ve seen similar behaviour with the Xilinx XRT drivers for the U250 and U280; the issue was resolved after I reported it.
I suspect something similar is at play with the Habana kernel driver. Is it using polling rather than interrupts?
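For anyone who wants to reproduce the observation: on Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable tasks (state R), so a high idle load often comes from kernel threads sitting in D state. Below is a minimal sketch that tallies R and D tasks by name from /proc. It only shows which tasks contribute to the load figure; whether any of them belong to the habanalabs driver is something the output would have to show, not something I have confirmed.

```python
import glob
from collections import Counter


def parse_stat(stat_line):
    """Return (comm, state) from one /proc/<pid>/task/<tid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    left, _, right = stat_line.rpartition(")")
    comm = left.split("(", 1)[1]
    state = right.split()[0]
    return comm, state


def load_contributors(stat_lines):
    """Count tasks per (name, state) that are runnable (R) or uninterruptible (D)."""
    counts = Counter()
    for line in stat_lines:
        if "(" not in line or ")" not in line:
            continue  # skip empty or truncated reads
        comm, state = parse_stat(line)
        if state in ("R", "D"):
            counts[(comm, state)] += 1
    return counts


def read_all_stat_lines():
    """Read the stat file of every thread currently visible in /proc."""
    lines = []
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as f:
                lines.append(f.read())
        except OSError:
            continue  # the task exited while we were scanning
    return lines


def report():
    """Print the current load average and the tasks that count towards it."""
    with open("/proc/loadavg") as f:
        print("loadavg:", f.read().strip())
    for (comm, state), n in load_contributors(read_all_stat_lines()).most_common():
        print(f"{n:4d}  {state}  {comm}")
```

Calling `report()` a few times on the idle host should show whether the same task names keep appearing in D state. A single snapshot can miss short-lived states, so repeated runs are more telling than one.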
OS and kernel: Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-124-generic x86_64)
ii habanalabs-container-runtime 1.6.0-439 amd64 Habana Labs container runtime. Provides a modified version of runc allowing users to run GPU enabled containers.
ii habanalabs-dkms 1.6.0-439 all habanalabs driver in DKMS format.
ii habanalabs-firmware 1.6.0-439 amd64 Firmware package for Habana Labs processing accelerators
ii habanalabs-firmware-odm 1.1.0-614 amd64 Firmware ODM package for Habana Labs processing accelerators
ii habanalabs-firmware-tools 1.6.0-439 amd64 Habanalabs firmware tools package
ii habanalabs-graph 1.6.0-439 amd64 habanalabs graph compiler
ii habanalabs-qual 1.6.0-439 amd64 This package contains Habanalabs qualification package. It designed to assist server vendors to qualify their Goya based server on the production line.
ii habanalabs-thunk 1.6.0-439 all habanalabs thunk
ii habanatools 1.6.0-439 amd64 Habana Labs tools package
[ 5.686464] Kernel command line: BOOT_IMAGE=images/default-habana-image/vmlinuz initrd=images/default-habana-image/initrd console=tty0 console=ttyS0,115200n8 rd.blacklist=nouveau ip=10.128.0.33:10.128.0.2:10.128.1.254:255.255.254.0 BOOTIF=01-b8-ce-f6-ad-84-8c
[ 120.721964] habanalabs_en: loading driver, version: 1.6.0-3c06a7c
[ 121.670217] habanalabs: loading driver, version: 1.6.0-3c06a7c
[ 121.670421] habanalabs 0000:34:00.0: habanalabs device found [1da3:1000] (rev 1)
[ 121.670522] habanalabs 0000:34:00.0: enabling device (0140 -> 0142)
[ 121.670546] habanalabs 0000:34:00.0: PCI INT A: no GSI - using ISA IRQ 11
[ 121.673718] habanalabs 0000:1a:00.0: habanalabs device found [1da3:1000] (rev 1)
[ 121.673808] habanalabs 0000:1a:00.0: enabling device (0140 -> 0142)
[ 121.673828] habanalabs 0000:1a:00.0: PCI INT A: no GSI - using ISA IRQ 11
[ 121.673954] habanalabs 0000:33:00.0: habanalabs device found [1da3:1000] (rev 1)
[ 121.674032] habanalabs 0000:33:00.0: enabling device (0140 -> 0142)
[ 121.674053] habanalabs 0000:33:00.0: PCI INT A: no GSI - using ISA IRQ 11
[ 121.674367] habanalabs 0000:19:00.0: habanalabs device found [1da3:1000] (rev 1)
[ 121.674452] habanalabs 0000:19:00.0: enabling device (0140 -> 0142)
[ 121.674465] habanalabs 0000:19:00.0: PCI INT A: no GSI - using ISA IRQ 11
[ 121.677694] habanalabs 0000:b3:00.0: habanalabs device found [1da3:1000] (rev 1)
[ 121.677794] habanalabs 0000:b3:00.0: enabling device (0140 -> 0142)
[ 121.677819] habanalabs 0000:b3:00.0: PCI INT A: no GSI - using ISA IRQ 11
[ 121.682745] habanalabs 0000:b4:00.0: habanalabs device found [1da3:1000] (rev 1)
[ 121.682833] habanalabs 0000:b4:00.0: enabling device (0140 -> 0142)
[ 121.682851] habanalabs 0000:b4:00.0: PCI INT A: no GSI - using ISA IRQ 11
[ 121.682933] habanalabs 0000:cd:00.0: habanalabs device found [1da3:1000] (rev 1)
[ 121.683015] habanalabs 0000:cd:00.0: enabling device (0140 -> 0142)
[ 121.683038] habanalabs 0000:cd:00.0: PCI INT A: no GSI - using ISA IRQ 11
[ 121.683331] habanalabs 0000:cc:00.0: habanalabs device found [1da3:1000] (rev 1)
[ 121.683413] habanalabs 0000:cc:00.0: enabling device (0140 -> 0142)
[ 121.683425] habanalabs 0000:cc:00.0: PCI INT A: no GSI - using ISA IRQ 11
[ 121.782654] habanalabs hl3: Loading firmware to device, may take some time...
[ 121.782666] habanalabs hl0: Loading firmware to device, may take some time...
[ 121.783004] habanalabs hl2: Loading firmware to device, may take some time...
[ 121.783086] habanalabs hl1: Loading firmware to device, may take some time...
[ 121.810415] habanalabs hl7: Loading firmware to device, may take some time...
[ 121.823809] habanalabs hl6: Loading firmware to device, may take some time...
[ 121.823819] habanalabs hl5: Loading firmware to device, may take some time...
[ 121.823828] habanalabs hl4: Loading firmware to device, may take some time...
[ 121.849780] habanalabs hl0: BTL version 81608d8d
[ 121.849782] habanalabs hl0: preboot version 32.3.5-sec-4
[ 121.870246] habanalabs hl3: BTL version 81608d8d
[ 121.870248] habanalabs hl3: preboot version 32.3.5-sec-4
[ 121.881785] habanalabs hl7: BTL version 81608d8d
[ 121.881787] habanalabs hl7: preboot version 32.3.5-sec-4
[ 121.891063] habanalabs hl2: BTL version 81608d8d
[ 121.891065] habanalabs hl2: preboot version 32.3.5-sec-4
[ 121.909783] habanalabs hl5: BTL version 81608d8d
[ 121.909785] habanalabs hl5: preboot version 32.3.5-sec-4
[ 121.911549] habanalabs hl1: BTL version 81608d8d
[ 121.911551] habanalabs hl1: preboot version 32.3.5-sec-4
[ 121.930248] habanalabs hl6: BTL version 81608d8d
[ 121.930250] habanalabs hl6: preboot version 32.3.5-sec-4
[ 121.951108] habanalabs hl4: BTL version 81608d8d
[ 121.951110] habanalabs hl4: preboot version 32.3.5-sec-4
[ 129.903679] habanalabs hl1: boot-fit version 32.6.3-sec-4
[ 129.904363] habanalabs hl3: boot-fit version 32.6.3-sec-4
[ 129.905045] habanalabs hl0: boot-fit version 32.6.3-sec-4
[ 129.905732] habanalabs hl2: boot-fit version 32.6.3-sec-4
[ 129.937375] habanalabs hl6: boot-fit version 32.6.3-sec-4
[ 129.938319] habanalabs hl5: boot-fit version 32.6.3-sec-4
[ 129.939261] habanalabs hl7: boot-fit version 32.6.3-sec-4
[ 129.939946] habanalabs hl4: boot-fit version 32.6.3-sec-4
[ 131.080082] habanalabs hl1: Successfully loaded firmware to device
[ 131.080951] habanalabs hl3: Successfully loaded firmware to device
[ 131.081790] habanalabs hl0: Successfully loaded firmware to device
[ 131.082615] habanalabs hl2: Successfully loaded firmware to device
[ 131.109653] habanalabs hl5: Successfully loaded firmware to device
[ 131.110521] habanalabs hl4: Successfully loaded firmware to device
[ 131.117644] habanalabs hl7: Successfully loaded firmware to device
[ 131.118487] habanalabs hl6: Successfully loaded firmware to device
[ 133.669654] habanalabs hl3: Linux version 32.6.3-sec-4
[ 133.686671] habanalabs hl0: Linux version 32.6.3-sec-4
[ 133.703671] habanalabs hl6: Linux version 32.6.3-sec-4
[ 133.705658] habanalabs hl5: Linux version 32.6.3-sec-4
[ 133.709182] habanalabs hl1: Linux version 32.6.3-sec-4
[ 133.716062] habanalabs hl2: Linux version 32.6.3-sec-4
[ 133.722654] habanalabs hl4: Linux version 32.6.3-sec-4
[ 133.733648] habanalabs hl3: Found GAUDI device with 32GB DRAM
[ 133.738658] habanalabs hl7: Linux version 32.6.3-sec-4
[ 133.742187] habanalabs hl0: Found GAUDI device with 32GB DRAM
[ 133.758654] habanalabs hl1: Found GAUDI device with 32GB DRAM
[ 133.759191] habanalabs hl5: Found GAUDI device with 32GB DRAM
[ 133.762667] habanalabs hl6: Found GAUDI device with 32GB DRAM
[ 133.773649] habanalabs hl4: Found GAUDI device with 32GB DRAM
[ 133.780678] habanalabs hl2: Found GAUDI device with 32GB DRAM
[ 133.799666] habanalabs hl7: Found GAUDI device with 32GB DRAM
[ 134.858468] habanalabs 0000:34:00.0 enp52s0d1: renamed from eth0
[ 134.871814] habanalabs hl0: hwmon3: add sensors information
[ 134.871815] habanalabs hl0: Successfully added device to habanalabs driver
[ 134.902462] habanalabs 0000:34:00.0 enp52s0d8: renamed from eth1
[ 134.941984] habanalabs 0000:34:00.0 enp52s0d9: renamed from eth2
[ 134.974676] habanalabs 0000:cd:00.0 enp205s0d1: renamed from eth0
[ 134.982797] habanalabs hl1: hwmon4: add sensors information
[ 134.982798] habanalabs hl1: Successfully added device to habanalabs driver
[ 134.991859] habanalabs hl3: hwmon5: add sensors information
[ 134.991860] habanalabs hl3: Successfully added device to habanalabs driver
[ 134.993757] habanalabs hl6: hwmon6: add sensors information
[ 134.993759] habanalabs hl6: Successfully added device to habanalabs driver
[ 134.998297] habanalabs 0000:1a:00.0 ens2d1: renamed from eth4
[ 135.038029] habanalabs 0000:19:00.0 ens1d1: renamed from eth3
[ 135.061860] habanalabs 0000:cd:00.0 enp205s0d8: renamed from eth5
[ 135.093850] habanalabs 0000:19:00.0 ens1d8: renamed from eth2
[ 135.134469] habanalabs 0000:1a:00.0 ens2d9: renamed from eth10
[ 135.143561] habanalabs hl5: hwmon7: add sensors information
[ 135.143562] habanalabs hl5: Successfully added device to habanalabs driver
[ 135.147801] habanalabs hl2: hwmon8: add sensors information
[ 135.147803] habanalabs hl2: Successfully added device to habanalabs driver
[ 135.165983] habanalabs 0000:33:00.0 enp51s0d8: renamed from eth11
[ 135.193971] habanalabs 0000:b4:00.0 enp180s0d1: renamed from eth1
[ 135.206746] habanalabs hl4: hwmon9: add sensors information
[ 135.206747] habanalabs hl4: Successfully added device to habanalabs driver
[ 135.225925] habanalabs 0000:cd:00.0 enp205s0d9: renamed from eth8
[ 135.273777] habanalabs 0000:1a:00.0 ens2d8: renamed from eth7
[ 135.301850] habanalabs 0000:19:00.0 ens1d9: renamed from eth9
[ 135.333766] habanalabs 0000:33:00.0 enp51s0d1: renamed from eth0
[ 135.366012] habanalabs 0000:33:00.0 enp51s0d9: renamed from eth2
[ 135.397989] habanalabs 0000:b4:00.0 enp180s0d9: renamed from eth3
[ 135.408708] habanalabs hl7: hwmon10: add sensors information
[ 135.408709] habanalabs hl7: Successfully added device to habanalabs driver
[ 135.433801] habanalabs 0000:cc:00.0 enp204s0d1: renamed from eth13
[ 135.469783] habanalabs 0000:b4:00.0 enp180s0d8: renamed from eth12
[ 135.501891] habanalabs 0000:cc:00.0 enp204s0d9: renamed from eth0
[ 135.537728] habanalabs 0000:cc:00.0 enp204s0d8: renamed from eth1
[ 135.573969] habanalabs 0000:b3:00.0 enp179s0d1: renamed from eth6
[ 135.613747] habanalabs 0000:b3:00.0 enp179s0d9: renamed from eth5
[ 135.657835] habanalabs 0000:b3:00.0 enp179s0d8: renamed from eth4
[ 137.065649] habanalabs hl0: link up, port 3
[ 137.069649] habanalabs hl4: link up, port 3
[ 137.129651] habanalabs hl2: link up, port 3
[ 137.129656] habanalabs hl2: link up, port 6
[ 137.161654] habanalabs hl5: link up, port 5
[ 137.193648] habanalabs hl5: link up, port 6
[ 137.225632] habanalabs hl5: link up, port 3
[ 137.257647] habanalabs hl2: link up, port 5
[ 137.353633] habanalabs hl6: link up, port 5
[ 137.353636] habanalabs hl4: link up, port 5
[ 137.385644] habanalabs hl7: link up, port 5
[ 137.421642] habanalabs hl1: link up, port 4
[ 137.421647] habanalabs hl3: link up, port 5
[ 137.453642] habanalabs hl3: link up, port 4
[ 137.485642] habanalabs hl1: link up, port 5
[ 137.485644] habanalabs hl5: link up, port 4
[ 137.485649] habanalabs hl0: link up, port 5
[ 137.485651] habanalabs hl4: link up, port 6
[ 137.517643] habanalabs hl4: link up, port 4
[ 137.549640] habanalabs hl0: link up, port 6
[ 137.613641] habanalabs hl0: link up, port 2
[ 137.645641] habanalabs hl1: link up, port 3
[ 137.645644] habanalabs hl7: link up, port 3
[ 137.673639] habanalabs hl1: link up, port 0
[ 137.677643] habanalabs hl5: link up, port 0
[ 137.705640] habanalabs hl6: link up, port 3
[ 137.801643] habanalabs hl1: link up, port 2
[ 137.805630] habanalabs hl5: link up, port 7
[ 137.805633] habanalabs hl6: link up, port 0
[ 137.833642] habanalabs hl7: link up, port 6
[ 137.833656] habanalabs hl3: link up, port 3
[ 137.833661] habanalabs hl3: link up, port 6
[ 137.865652] habanalabs hl0: link up, port 7
[ 137.897634] habanalabs hl1: link up, port 6
[ 137.929647] habanalabs hl3: link up, port 7
[ 137.929652] habanalabs hl2: link up, port 4
[ 137.961640] habanalabs hl1: link up, port 7
[ 137.961644] habanalabs hl4: link up, port 2
[ 137.961649] habanalabs hl6: link up, port 6
[ 137.993685] habanalabs hl2: link up, port 7
[ 137.993688] habanalabs hl2: link up, port 0
[ 138.029639] habanalabs hl3: link up, port 2
[ 138.121637] habanalabs hl0: link up, port 0
[ 138.125642] habanalabs hl7: link up, port 7
[ 138.217641] habanalabs hl7: link up, port 4
[ 138.253642] habanalabs hl2: link up, port 2
[ 138.281636] habanalabs hl4: link up, port 0
[ 138.345642] habanalabs hl7: link up, port 2
[ 138.377672] habanalabs hl3: link up, port 0
[ 138.537671] habanalabs hl6: link up, port 2
[ 138.569672] habanalabs hl7: link up, port 0
[ 138.601671] habanalabs hl6: link up, port 4
[ 138.601674] habanalabs hl5: link up, port 2
[ 138.857650] habanalabs hl0: link up, port 4
[ 138.921647] habanalabs hl6: link up, port 7
[ 138.921654] habanalabs hl4: link up, port 7
Best regards,
Tor