Why is there no hello-world level tutorial for using the Gaudi chip?

I’ve never used Gaudi-series accelerators before, and I’d like to see what they can do.

… and by “what they can do”, I don’t mean “how I can use them to do AI with PyTorch”.

I want to start by writing a simple program with a bit of code which runs on the Gaudi processor - using its computational primitives, or something which easily translates to a few of those primitives. For a GPU, this might be something like launching a kernel which reads some data from GPU-global memory, performs arithmetic on it (e.g. elementwise addition of vectors), and writes the result back. For something like Gaudi - perhaps a single multiplication of a pair of small matrices? Or even something simpler. I don’t want to simulate any neural network layers, no language models large or small; and no complex multi-layered frameworks with virtual environments - just the simplest you can imagine. Bottom floor.

I couldn’t find something like that, although, granted, I have not spent many hours looking. Could one of the kind forum denizens direct me to something relevant?

Quoting from this link:
The compute architecture is heterogeneous and includes two compute engines – a Matrix Multiplication Engine (MME) and a fully programmable Tensor Processor Core (TPC) cluster. The MME is responsible for doing all operations which can be lowered to Matrix Multiplication (fully connected layers, convolutions, batched-GEMM) while the TPC, a VLIW SIMD processor tailor-made for deep learning operations, is used to accelerate everything else.

So, to work with Gaudi, you can use PyTorch. But you are looking for finer-grained control. You can program the TPC using these resources:

Right now as far as I can tell, there is no way to program both MME/TPC directly. For a flavor of what the synapse APIs look like, you can look at this simple test code, which creates and launches a Synapse graph:

The APIs look something like:

However, this may be an old, out-of-date, and unsupported version of Synapse.

First - thank you for these references; I will scrutinize them carefully when I’m back at work after the holiday :slight_smile: - I hope they prove useful.

I’ll already ask a couple of clarifying questions, though…

For a flavor of what the synapse APIs look like

That sentence is the first mention of these APIs. What are they supposed to let us do? Are they how we control, schedule work on, or program the MME?

Right now as far as I can tell, there is no way to program both MME/TPC directly

So, the “custom kernels” are a way to program the TPCs directly, right? And are you saying that I can’t program the MME at all myself, or just that I can’t synchronize the work of the TPCs and the MME?

Synapse APIs:
The synapse_api.h file that I provided has each function’s signature and a brief description of what it does. It covers stream, event, data-copy, query, tensor, node, and graph-creation APIs.
You can create, compile, and launch a graph using these APIs. The graph is compiled, and the work is allocated and scheduled across the TPC and MME.

Yes, custom kernels are a way to program the TPCs directly. You cannot program the MME directly, nor control the work distribution between the TPC and MME.