From the profiling figure, we find that the execution order of operations on the TPC and MME is not determined by code order when the operations have no data dependency. I think the graph compiler tries to improve MME utilization, so it changes the execution order of operations.
Question 1:
What rules does the graph compiler use to determine the execution order of operations?
Question 2:
Are there programming guidelines users can follow to improve the utilization of both the MME and the TPC?