Questions regarding the architecture about Habana Gaudi

Hello, I want to understand the architecture of Habana Gaudi, and I have read the documents (Welcome to Habana® Gaudi® v1.7 Documentation — Gaudi Documentation). I have some questions:

Question 1: I read in the documentation that the Habana Gaudi TPC contains Load, SPU, VPU and Store. For each TPC, how many VPUs and SPUs?

Question 2: How many units/PEs are in the GEMM engine (MME)? Does the MME have an architecture similar to GPU tensor cores? I find there is not much information about the MME's design.

Question 3: Does TPC-LLVM offload GEMM operations (like convolutions and fully-connected layers) onto the GEMM engine during compilation, or is there a separate compiler for the MME? The name "TPC-LLVM" suggests it is only for the TPC rather than the MME. Can someone clarify?

Question 4: Will the computations run in GEMM and TPC in parallel (overlapped)?

Question 1: I read in the documentation that the Habana Gaudi TPC contains Load, SPU, VPU and Store. For each TPC, how many VPUs and SPUs?
(Answer): In Gaudi, each TPC has one VPU, which can operate in one of three modes. (A VPE, Vector Processing Element, applies the same instruction at the same time (SIMD) to different values.)
(1) 64 VPEs of 32-bit element width (int32/float32)
(2) 128 VPEs of 16-bit element width (bf16/fp16/int16)
(3) 256 VPEs of 8-bit element width (fp8/int8)
Each TPC also has one SPU. The SPU is equivalent in arithmetic capability to a single VPU lane (a VPE). There are minor differences in instruction support, but the overall capability is similar.
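The three modes above all describe the same fixed-width vector datapath. As a quick sanity check (assuming a 2048-bit vector register, implied by 64 lanes x 32 bits), the lane count for each mode is just the vector width divided by the element width:

```python
# Sketch, not vendor code: derive the VPE lane counts for each mode
# from an assumed 2048-bit vector width (64 lanes x 32 bits).
VECTOR_WIDTH_BITS = 64 * 32  # 2048, implied by the 32-bit mode above

def vpe_lanes(element_bits):
    """Number of VPEs (SIMD lanes) for a given element width."""
    return VECTOR_WIDTH_BITS // element_bits

for bits, types in [(32, "int32/float32"), (16, "bf16/fp16/int16"), (8, "fp8/int8")]:
    print(f"{bits:2d}-bit ({types}): {vpe_lanes(bits)} VPEs")
```

This reproduces the 64/128/256 figures listed above.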

Question 2: How many units/PEs are in the GEMM engine (MME)? Does the MME have an architecture similar to GPU tensor cores? I find there is not much information about the MME's design.
(Answer): The Gaudi MME has a different architecture than a GPU. The MME has 4 EU (Execution Unit) cores and can produce 4096 fp32 MACs/cycle or 16,384 bf16 MACs/cycle.
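As a hedged back-of-envelope calculation (counting one MAC as 2 FLOPs, and using the 1.95 GHz Gaudi1 MME clock quoted elsewhere in this thread), those MAC rates translate to peak throughput as follows:

```python
# Back-of-envelope only: peak MME throughput from the MACs/cycle figures above.
# Assumptions: 1 MAC = 2 FLOPs; Gaudi1 MME clock of 1.95 GHz.
CLOCK_HZ = 1.95e9

def peak_tflops(macs_per_cycle):
    """Peak throughput in TFLOPS at the assumed clock."""
    return macs_per_cycle * 2 * CLOCK_HZ / 1e12

print(f"fp32: {peak_tflops(4096):.1f} TFLOPS")   # ~16.0
print(f"bf16: {peak_tflops(16384):.1f} TFLOPS")  # ~63.9
```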

Question 3: Does TPC-LLVM offload GEMM operations (like convolutions and fully-connected layers) onto the GEMM engine during compilation, or is there a separate compiler for the MME? The name "TPC-LLVM" suggests it is only for the TPC rather than the MME. Can someone clarify?
(Answer): TPC-LLVM is solely for TPC applications; it can compile all kinds of ops, including conv2d and non-linear ops. But most matrix multiplications are done on the MME, through our Synapse API and its graph compiler.

Question 4: Will the computations run in GEMM and TPC in parallel (overlapped)?
(Answer): Yes. The TPC and MME are independent hardware cores; they operate independently and can compute in parallel.


Thank you so much for the information!

Hello! I am trying to understand the architecture of the GEMM engine.

Question 1: For each execution unit core, what is the size of an instruction?

Question 2: How can users configure the GEMM engine? I see in this document (Gaudi Architecture — Gaudi Documentation) that the GEMM is configurable. Could you please provide more details about how this configurability works?

Question 3: Is the graph compiler code for GEMM engine available? I mean, is there a GEMM engine compiler API that I can directly use to compile code?

Thank you!

Question 1: For each execution unit core, what is the size of an instruction?
(Answer): The GEMM engine (MME) is different from the TPC. The MME is purpose-built for matrix multiplication; each EU's geometry is 64x64 for 16-bit data and 32x32 for 32-bit data.
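A quick consistency check: multiplying these per-EU geometries by the 4 EUs mentioned earlier in the thread reproduces the MACs/cycle figures quoted above:

```python
# Sanity check, using only numbers quoted in this thread:
# 4 EUs, each 64x64 at 16-bit and 32x32 at 32-bit.
NUM_EUS = 4

bf16_macs = NUM_EUS * 64 * 64   # per-EU 64x64 geometry at 16-bit
fp32_macs = NUM_EUS * 32 * 32   # per-EU 32x32 geometry at 32-bit
print(bf16_macs, fp32_macs)  # 16384 4096
```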
Question 2: How can users configure the GEMM engine? I see in this document (Gaudi Architecture — Gaudi Documentation) that the GEMM is configurable. Could you please provide more details about how this configurability works?
(Answer): "Configurable" means that the Synapse API can configure the MME to perform all kinds of computations. It is not user-facing: users can't interact with the MME directly.
Question 3: Is the graph compiler code for GEMM engine available? I mean, is there a GEMM engine compiler API that I can directly use to compile code?
(Answer): Only the Synapse API (graph compiler) can interact with the MME, not the user. This is different from the TPC, which is like a DSP processor: users can write kernel code for it directly in the TPC-C language.

Thank you so much for the helpful information!

Question 1: Could you please provide me with the MME clock speed?

Question 2: Does "configurable MME" mean that the graph compiler can configure which operations run on the MME and which on the TPC?

Question 3: How is the sparse matrix multiplication (SpMM) optimized on the MME?

Question 4: I observed that fp16 is not supported for the GEMM operation. Could you please tell me the architectural difference between NVIDIA GPU Tensor cores and the MME EU cores?

Thank you!

Question 1: Could you please provide me with the MME clock speed?
(Answer): For Gaudi1, the MME clock is 1.95 GHz.
Question 2: Does "configurable MME" mean that the graph compiler can configure which operations run on the MME and which on the TPC?
(Answer): That is correct. Most matrix multiplications go to the MME, and non-linear ops go to the TPC.
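Purely as an illustration (this is not Synapse code, and the op names here are made up), the partitioning described above amounts to a dispatch rule like:

```python
# Illustrative sketch only, NOT the Synapse graph compiler: routing
# matmul-like ops to the MME and non-linear/elementwise ops to the TPCs.
MME_OPS = {"matmul", "conv2d", "fully_connected"}  # hypothetical op names

def assign_engine(op_name):
    """Pick an engine for an op, in the spirit of the answer above."""
    return "MME" if op_name in MME_OPS else "TPC"

graph = ["conv2d", "relu", "matmul", "softmax"]
print([(op, assign_engine(op)) for op in graph])
```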
Question 3: How is the sparse matrix multiplication (SpMM) optimized on the MME?
(Answer): Our MME doesn't support SpMM. A TPC kernel could do the job instead.
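Since the MME has no native SpMM path, a custom TPC kernel would have to implement the sparse loop itself. A minimal CSR sparse-times-dense sketch of that loop, written here in plain Python rather than TPC-C:

```python
# Sketch of the loop structure a custom SpMM kernel would implement.
# A is sparse in CSR form (indptr, indices, data); B ("dense") is a
# list of rows; returns the dense product A @ B.
def csr_spmm(indptr, indices, data, dense, n_cols):
    n_rows = len(indptr) - 1
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        # iterate only over the nonzeros of row i
        for k in range(indptr[i], indptr[i + 1]):
            j, a = indices[k], data[k]
            for c in range(n_cols):
                out[i][c] += a * dense[j][c]
    return out

# A = [[1, 0], [0, 2]] in CSR form, B = [[3, 4], [5, 6]]
print(csr_spmm([0, 1, 2], [0, 1], [1.0, 2.0], [[3.0, 4.0], [5.0, 6.0]], 2))
```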

Question 4: I observed that fp16 is not supported for the GEMM operation. Could you please tell me the architectural difference between NVIDIA GPU Tensor cores and the MME EU cores?
(Answer): There are fundamental differences in both architecture and programming model. For example, the Habana MME EU geometry is substantially larger than a tensor core's. On GPUs, the compute core invokes the tensor cores directly, while on Habana the MME is invoked by a command buffer from the host.
Thanks.


Hello, we have some questions regarding programming on Habana Gaudi1.

  1. Is there any low-level API (in C or C++) for us to call the MME? Or do we have to call a PyTorch API like torch.mm to enable operations on the MME?
  2. Can the TPC directly interact with the MME (for example, directly call the MME from the TPC)?
  3. If there is no low-level API for calling the MME, how can we map desired operations onto the MME? For example, on Nvidia GPUs there are specific APIs for calling the tensor cores. We would like to try similar APIs.
  4. We found source code that programs the MME (hl-thunk/gaudi_mme_conv.c at 77a59c35d284d2f987c7266e7db7f6d6bd08568b · HabanaAI/hl-thunk · GitHub), but we do not understand its logic. How should we set up the MME if we want to write a function that runs on it?

Thank you!

  1. Is there any low-level API (in C or C++) for us to call the MME? Or do we have to call a PyTorch API like torch.mm to enable operations on the MME?
    (Answer): We don't expose a low-level API to call the MME. The way to use the MME is through a framework, e.g. torch.mm.
  2. Can the TPC directly interact with the MME (for example, directly call the MME from the TPC)?
    (Answer): No, the TPC can't call the MME directly. The graph compiler in Synapse directs operations to either the MME or the TPC.
  3. If there is no low-level API for calling the MME, how can we map desired operations onto the MME? For example, on Nvidia GPUs there are specific APIs for calling the tensor cores. We would like to try similar APIs.
    (Answer): The Gaudi architecture is different from Nvidia GPUs: users can't interact with the MME directly, and MME-related operations are controlled by Synapse.
  4. We found source code that programs the MME, but we do not understand its logic. How should we set up the MME if we want to write a function that runs on it?
    (Answer): All the exposed APIs are documented here: APIs — Gaudi Documentation.
    That code is for hl-thunk test purposes; we do not recommend using it as a template.

Thanks