Fix output_dims in Compute() and add threads_load and threads_compute

2 jobs for dev-add_compute ran in 6 minutes and 5 seconds, using 0 compute credits, and were queued for 17 minutes and 17 seconds.