Brief: we are writing a TPC kernel and want to use a vector of indices as the subscripts into a tensor, so that we can read the tensor's values out of order (a gather).
Details:
We want to port this function (a sub-function in our CUDA code) to a TPC kernel.
The function snippet is shown below:
float3 xyz_unit = apply_contraction(
xyz, roi_min, roi_max, type);
int idx = grid_idx_at(xyz_unit, grid_res);
return grid_value[idx];
In CUDA this runs in a single thread, so idx is a scalar int. In the TPC kernel we want to process many elements in parallel, so idx has to become the vector type int64, and a vector cannot be used as a subscript of an array or tensor.
My question: is there any function or mechanism that returns a vector of values when given an int64 vector of address offsets that are NOT contiguous?
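To make the desired behaviour concrete, here is a minimal host-side sketch in plain C++ of the per-lane semantics I am after. It is not TPC code and uses no TPC intrinsics. The name gather_reference and the use of std::vector are assumptions made only for illustration, and the indices are assumed to be 32-bit integers, one per vector lane.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference for the gather we want: out[lane] = grid_value[idx[lane]].
// gather_reference is a hypothetical helper name, not a TPC intrinsic.
std::vector<float> gather_reference(const std::vector<float>& grid_value,
                                    const std::vector<std::int32_t>& idx)
{
    std::vector<float> out(idx.size());
    for (std::size_t lane = 0; lane < idx.size(); ++lane) {
        // Each lane carries its own offset; offsets need not be contiguous
        // or sorted, and the same offset may appear in several lanes.
        assert(idx[lane] >= 0);
        const std::size_t offset = static_cast<std::size_t>(idx[lane]);
        assert(offset < grid_value.size());
        out[lane] = grid_value[offset];
    }
    return out;
}
```

For example, with grid_value = {10, 11, 12, 13, 14} and idx = {4, 0, 2, 2}, the result is {14, 10, 12, 12}. What I am looking for is a way to get the same result in the TPC kernel for a whole vector of indices at once, instead of looping over lanes.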