Also, came up with more useful benchmarks that actually measure the
enqueue/dequeue operations for various simple workflows. For retained
graphs, time-of-flight from submission to execution is ~1us per job,
indicating job granularity should be >20us for retained jobs. For
dynamic jobs, where we need to pay the cost of allocation, a granularity
of ~100+ us may be advised.
Signed-off-by: Jeremy Ong <jcong@amazon.com>