Out of order addressing

jia3xu · March 13, 2023, 4:43pm

Brief: we are writing a TPC kernel and want to use a vector of indices as array subscripts into a tensor, so that we can read the tensor's values out of order.

Details:
We want to port this function (a helper function in our CUDA code) to a TPC kernel.
The relevant snippet is shown below:

float3 xyz_unit = apply_contraction(
    xyz, roi_min, roi_max, type);
int idx = grid_idx_at(xyz_unit, grid_res);
return grid_value[idx];

This runs in a single CUDA thread, but we want it to run in parallel in a TPC kernel. To do that we need to change the type of idx to int64, which cannot be used as a subscript of an array or tensor.
My question: is there any function or mechanism that lets us fetch a vector of values using an int64 vector as address offsets that are NOT contiguous?

zzhang37 · March 20, 2023, 8:11pm

We can't do out-of-order addressing in a single kernel, but it works if the subscripts are stored to a tensor and then loaded back in a subsequent kernel.
Basically, in the first kernel you save the indices to a tensor. Then, in the second kernel, use gen_addr to create the index coordinates and load the tensor using the coordinates you just created, for example:

__global__ int* idx_coord_ptr_1 = gen_addr(idx_coord_1, indices_tensor);
in_coord_1[1] = s_i32_ld_g(idx_coord_ptr_1);
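
To illustrate the data flow only, here is a minimal host-side sketch in plain C (not TPC-C). It models the two kernels as two passes over ordinary arrays: the first pass writes the indices to a buffer, and the second pass reads each index back as a scalar and uses it to address the values. The function names compute_indices and gather_values, the x-fastest index layout, and the int32 index type are assumptions made for this illustration; they are not part of the TPC API or of the original CUDA code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pass 1 (models the first kernel): compute one flat index per element and
 * store it in an index buffer instead of dereferencing it immediately.
 * The x-fastest layout is an assumption for illustration; the real
 * grid_idx_at from the question may compute the index differently. */
static void compute_indices(const int32_t *ix, const int32_t *iy,
                            const int32_t *iz,
                            int32_t res_x, int32_t res_y,
                            int32_t *indices, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        indices[i] = ix[i] + res_x * (iy[i] + res_y * iz[i]);
    }
}

/* Pass 2 (models the second kernel): load each stored index back as a
 * scalar and use it to address the value buffer (a gather). */
static void gather_values(const float *grid_value, size_t grid_size,
                          const int32_t *indices, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        int32_t idx = indices[i];
        /* Guard against indices that fall outside the value buffer. */
        assert(idx >= 0 && (size_t)idx < grid_size);
        out[i] = grid_value[idx];
    }
}
```

In an actual TPC port, the second pass would roughly correspond to the gen_addr / s_i32_ld_g pattern above, with each loaded index written into a coordinate of the data tensor before the load.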

You can find more details in sparse_lengths_sum_bf16_2D_f32_embed.c in the HabanaAI/Habana_Custom_Kernel repository on GitHub.
Thanks
