Fixes mistakes in the GPU routine, but now results in a CUDA out-of-memory error

4 jobs for master in 8 seconds, using 0 compute credits, queued for 3 seconds