We are writing a TPC kernel, and the logic looks like this:
float64 t_mid;
float64 far;
while (t_mid < far) {
    // doing computation
}
Clearly, the code above cannot compile, because t_mid < far compares two vectors and does not produce a scalar loop condition. We want a way to stop the while loop once every element of the t_mid vector is >= the corresponding element of the far vector.
We can’t convert a vector to a scalar. To implement logic like yours, we use predicates.
Almost all of our intrinsics take a predicate, which means an instruction can be executed or skipped depending on that predicate. With a vector predicate, this decision is made per element.
For example, you can first build a condition vector from your two vectors, t_mid and far:
float64 conds = v_f32_sel_less_f32_b(t_mid, far, 0, 1); // "less" comparison; the leq variant compares less-or-equal
Depending on the values of t_mid and far, each element of conds will be either 0 or 1: 0 where t_mid < far, and 1 otherwise.
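To make the per-element behavior concrete, here is a plain-C model of the select step. It is not TPC-C and does not call the real intrinsic: LANES and sel_less_model are hypothetical names, plain float arrays stand in for float64, and it assumes v_f32_sel_less_f32_b(a, b, c, d) picks c in lanes where a < b and d elsewhere.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 64 /* one float32 lane per element of a TPC float64 vector */

/* Hypothetical model of the select-less step (not the real intrinsic),
 * assuming out[i] = (a[i] < b[i]) ? on_less : on_not_less.
 * With on_less = 0 and on_not_less = 1, out[i] is 0 where
 * t_mid[i] < far[i] and 1 elsewhere. */
void sel_less_model(const float a[LANES], const float b[LANES],
                    float on_less, float on_not_less, float out[LANES]) {
    for (size_t i = 0; i < LANES; ++i) {
        out[i] = (a[i] < b[i]) ? on_less : on_not_less;
    }
}
```

For example, with a[i] = i and b[i] = 32, lanes 0 to 31 get 0 and lanes 32 to 63 get 1, since 32 < 32 is false.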
Then turn conds into a bool256 predicate by comparing it with 0:
float64 zero = 0.0f;
bool256 pred = from_bool64(v_f32_cmp_eq_b(conds, zero)); // TRUE where conds == 0, i.e. where t_mid < far
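A similar plain-C model of the compare step shows why the comparison is against 0: the predicate ends up TRUE exactly in the lanes where t_mid < far, i.e. the lanes that still need work. The name cmp_eq_model is hypothetical, and a bool array stands in for the bool256/bool64 predicate; the from_bool64/to_bool64 packing is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LANES 64 /* one float32 lane per element of a TPC float64 vector */

/* Hypothetical model of the compare step: pred[i] is true where
 * conds[i] == value. With value = 0 and conds from the select step
 * (0 where t_mid < far, 1 elsewhere), pred marks the lanes that are
 * still inside the loop. */
void cmp_eq_model(const float conds[LANES], float value, bool pred[LANES]) {
    for (size_t i = 0; i < LANES; ++i) {
        pred[i] = (conds[i] == value);
    }
}
```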
After that, you can perform operations such as add, mov, and mul under this predicate, so that elements where t_mid has already reached far are left unchanged:
out = v_f32_mov_vb(a, 0, out, to_bool64(pred), 0); // where pred is TRUE, the element of out is set to a; where FALSE, it keeps its original value
out = v_f32_add_vb(a, b, 0, out, to_bool64(pred), 0); // where pred is TRUE, the element of out is set to a + b; where FALSE, it keeps its original value
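The pieces can be put together in a host-side C model of the loop. This is a sketch under stated assumptions, not TPC-C: mov_pred_model, add_pred_model, march_model, LANES, and MAX_ITERS are hypothetical names, and the step vector stands in for "doing computation". Because the kernel cannot reduce the predicate to a scalar loop condition, the sketch runs a fixed, hypothetical MAX_ITERS bound and relies on the predicate to freeze lanes that have finished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LANES 64     /* one float32 lane per element of a TPC float64 vector */
#define MAX_ITERS 16 /* hypothetical cap: there is no scalar "all lanes done" test */

/* Hypothetical model of v_f32_mov_vb: lanes where pred is true get a[i];
 * the other lanes keep their previous value in out. */
void mov_pred_model(const float a[LANES], const bool pred[LANES],
                    float out[LANES]) {
    for (size_t i = 0; i < LANES; ++i) {
        if (pred[i]) {
            out[i] = a[i];
        }
    }
}

/* Hypothetical model of v_f32_add_vb: lanes where pred is true get
 * a[i] + b[i]; the other lanes keep their previous value in out. */
void add_pred_model(const float a[LANES], const float b[LANES],
                    const bool pred[LANES], float out[LANES]) {
    for (size_t i = 0; i < LANES; ++i) {
        if (pred[i]) {
            out[i] = a[i] + b[i];
        }
    }
}

/* Hypothetical loop: advance t_mid by step, but only in lanes where
 * t_mid < far, so lanes that are done stop changing. */
void march_model(float t_mid[LANES], const float far[LANES],
                 const float step[LANES]) {
    bool pred[LANES];
    for (int iter = 0; iter < MAX_ITERS; ++iter) {
        /* Rebuild the per-lane predicate on every iteration. */
        for (size_t i = 0; i < LANES; ++i) {
            pred[i] = t_mid[i] < far[i];
        }
        add_pred_model(t_mid, step, pred, t_mid);
    }
    /* In this model, check that MAX_ITERS was large enough for every lane. */
    for (size_t i = 0; i < LANES; ++i) {
        assert(!(t_mid[i] < far[i]));
    }
}
```

The design choice here is to trade a few wasted iterations for a loop bound that needs no vector-to-scalar reduction: MAX_ITERS has to be at least the largest number of steps any lane needs, and lanes that finish early simply stop changing.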
See the TPC Intrinsics Guide in the Gaudi documentation for the full list of intrinsics.