How the graph compiler determines the execution order of operations on TPC and MME

From the profiling figure, we find that the execution order of operations on TPC and MME is not determined by code order if the operations have data dependency. I think the graph compiler changes the execution order of operations to improve MME utilization.

Question 1:
What rules does the graph compiler use to determine the execution order of operations?

Question 2:
Are there programming guidelines users can follow to improve the utilization of both MME and TPC?

The rules the graph compiler uses for MME/TPC ordering are complex and might change from release to release. There are no knobs in user (PyTorch) code to control this.
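
For readers wondering why reordering is possible at all, here is a generic toy sketch. It is not the graph compiler's actual algorithm, which is internal. It only illustrates that a scheduler is free to move an independent op ahead of one that is still waiting on its input. `Op`, `toy_schedule`, and the op names are hypothetical and invented for this illustration.

```python
from collections import namedtuple

# Hypothetical op record: name, the engine it runs on, and the ops it depends on.
Op = namedtuple("Op", ["name", "engine", "deps"])

def toy_schedule(ops):
    """Greedy list scheduling: each step issues at most one ready op per engine."""
    done, steps = set(), []
    pending = list(ops)  # code order is used only as the tie-breaker
    while pending:
        issued = {}
        for op in pending:
            ready = all(d in done for d in op.deps)
            if ready and op.engine not in issued:
                issued[op.engine] = op
        if not issued:
            raise ValueError("dependency cycle or missing producer")
        for op in issued.values():
            pending.remove(op)
        done.update(op.name for op in issued.values())
        steps.append({engine: op.name for engine, op in issued.items()})
    return steps

# Ops listed in "code order"; relu_a depends on matmul_a, the others are independent.
program = [
    Op("matmul_a", "MME", []),
    Op("relu_a", "TPC", ["matmul_a"]),
    Op("add_b", "TPC", []),
    Op("matmul_c", "MME", []),
]

for i, step in enumerate(toy_schedule(program)):
    print(i, step)
```

In this toy run, `add_b` is issued on the TPC in the first step, ahead of `relu_a`, even though `relu_a` comes first in code order. `relu_a` has to wait for `matmul_a`, and moving `add_b` forward keeps both engines busy.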

Firstly, thank you so much for your reply.

Is there any general rule that the current version of the graph compiler uses to determine the execution order of operations?

These are internal Synapse graph compiler implementation details and are not open source.